Documentation

Getting started

Verify your AI agent's security using the OASB-1 benchmark. OASB (Open Agent Security Benchmark) provides a standardized framework for assessing AI agent security posture.

Quick start

Run the benchmark with a single command. Requires Node.js 18+.

Terminal
npx hackmyagent secure --benchmark oasb-1

Maturity levels

Choose the level appropriate for your deployment stage.

L1 Essential

Baseline security for development and prototypes. Every AI agent should meet these requirements.

L2 Standard

Production-ready security for agents handling sensitive data. Recommended for most deployments.

L3 Hardened

Maximum security for regulated industries, financial services, and high-value targets.

Specify level
npx hackmyagent secure --benchmark oasb-1 --level L2

Output formats

Export results for integration with your tools.

Format   Flag       Use case
Text     -f text    Terminal output (default)
JSON     -f json    Programmatic access, dashboards
SARIF    -f sarif   GitHub Security tab, IDE integration
HTML     -f html    Shareable reports

Export to file
npx hackmyagent secure -b oasb-1 -f sarif -o results.sarif
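
To export every file-based format in one pass, a loop over the documented -f values is one option. The sketch below only prints the commands (a dry run). The export_cmd helper and the results.<format> file names are illustrative assumptions, not part of the tool.

```shell
# Dry-run sketch: print one export command per file-based format.
# export_cmd and the results.<format> file names are hypothetical choices.
export_cmd() {
  printf 'npx hackmyagent secure -b oasb-1 -f %s -o results.%s\n' "$1" "$1"
}

for fmt in json sarif html; do
  export_cmd "$fmt"
done
```

Once the printed commands look right, piping them to sh runs them in order.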

CI/CD integration

Add security verification to your deployment pipeline.

GitHub Actions

.github/workflows/security.yml
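name: OASB Security Benchmark

on: [push, pull_request]

# upload-sarif needs write access to code scanning results
permissions:
  contents: read
  security-events: write

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Fails the job when the score is below 80
      - name: Run OASB-1 Benchmark
        run: |
          npx hackmyagent secure \
            --benchmark oasb-1 \
            --format sarif \
            --output results.sarif \
            --fail-below 80

      # Runs even if the benchmark step fails, so the report still reaches
      # the Security tab (assumes results.sarif is written before exit)
      - name: Upload to GitHub Security
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif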

Fail threshold

Use --fail-below to set a minimum compliance score (0-100). The command exits with code 1 if the score is below the threshold.

Require 80% compliance
npx hackmyagent secure -b oasb-1 --fail-below 80
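
In a shell script outside GitHub Actions, the same gate can be built on the exit code alone. The following is a minimal sketch: run_gate is a hypothetical helper, and it treats any non-zero exit code as a failed gate.

```shell
# run_gate (hypothetical helper): run the given command, report the outcome,
# and pass the command's exit code through to the caller.
run_gate() {
  if "$@"; then
    echo "gate passed"
  else
    code=$?
    echo "gate failed (exit $code)"
    return "$code"
  fi
}

# Example (not run here):
#   run_gate npx hackmyagent secure -b oasb-1 --fail-below 80
```

Because the helper returns the original exit code, a calling script or pipeline step still stops on failure.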

Sample output

Example result for a compliant L1 agent.

OASB-1 Security Benchmark v1.0.0
════════════════════════════════════════════════════════════

Target:     https://api.example.com/v1/agent
Level:      L1 (Essential)
Rating:     COMPLIANT
Score:      100% (14/14 controls passed)

Category Results
────────────────────────────────────────────────────────────
  [PASS] Identity & Provenance     2/2   100%
  [PASS] Authorization & Access    2/2   100%
  [PASS] Input Security            3/3   100%
  [PASS] Output Security           2/2   100%
  [PASS] Credential Management     2/2   100%
  [PASS] Supply Chain              2/2   100%
  [PASS] Isolation                 1/1   100%

Recommendation: Ready for L2 assessment
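
The score line reads as passed controls divided by total controls. The sketch below reproduces that arithmetic to show how a result would compare against a --fail-below threshold. The helper names are hypothetical, and rounding down to a whole percent is an assumption, since the tool's rounding is not documented here.

```shell
# score_pct PASSED TOTAL: whole-number percentage of controls passed.
# Integer division rounds down; this rounding is an assumption.
score_pct() {
  echo $(( $1 * 100 / $2 ))
}

# meets_threshold PASSED TOTAL THRESHOLD: succeeds when the score is at or
# above the threshold, mirroring "exit 1 if score below threshold".
meets_threshold() {
  [ "$(score_pct "$1" "$2")" -ge "$3" ]
}

score_pct 14 14   # prints 100
score_pct 12 14   # prints 85
score_pct 11 14   # prints 78
```

Under these assumptions, 12 of 14 controls would clear a threshold of 80, while 11 of 14 would not.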

CLI reference

Option            Description
-b, --benchmark   Benchmark to run (e.g., oasb-1)
-l, --level       Maturity level: L1, L2, or L3
-f, --format      Output format: text, json, sarif, html
-o, --output      Write output to file
--fail-below      Exit 1 if score below threshold (0-100)
-v, --verbose     Show detailed output

Resources