Documentation

Getting started

OASB (Open Agent Security Benchmark) provides a standardized framework for assessing the security posture of AI agents. Use the OASB-1 benchmark to verify your agent's security.

Quick start

Run the benchmark with a single command. Requires Node.js 18+.

Terminal

npx hackmyagent secure --benchmark oasb-1

Maturity levels

Choose the level appropriate for your deployment stage.

L1 (Essential)

Baseline security for development and prototypes. Every AI agent should meet these requirements.

L2 (Standard)

Production-ready security for agents handling sensitive data. Recommended for most deployments.

L3 (Hardened)

Maximum security for regulated industries, financial services, and high-value targets.

Specify level

npx hackmyagent secure --benchmark oasb-1 --level L2

Output formats

Export results for integration with your tools.

Format   Flag       Use case
Text     -f text    Terminal output (default)
JSON     -f json    Programmatic access, dashboards
SARIF    -f sarif   GitHub Security tab, IDE integration
HTML     -f html    Shareable reports

Export to file

npx hackmyagent secure -b oasb-1 -f sarif -o results.sarif
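SARIF is a standard JSON-based format (SARIF 2.1.0 is the version GitHub code scanning accepts), so an exported file can be inspected with ordinary JSON tooling. The following is a minimal Python sketch, assuming the exported file follows the standard runs[].results[] layout; summarize_sarif and summarize_sarif_log are hypothetical helper names, and which rule IDs and severity levels hackmyagent actually emits is not documented here.

```python
import json
import sys
from collections import Counter


def summarize_sarif_log(log):
    """Count results per severity level across all runs of a parsed SARIF log.

    Assumes only the standard SARIF 2.1.0 layout (runs[].results[].level);
    nothing here is specific to hackmyagent's output.
    """
    counts = Counter()
    for run in log.get("runs") or []:
        # "results" may be absent or null in SARIF, so fall back to an empty list.
        for result in run.get("results") or []:
            # An absent "level" usually means "warning"; the spec also lets
            # "kind" or the rule's default configuration change that, which
            # this sketch deliberately ignores.
            counts[result.get("level", "warning")] += 1
    return counts


def summarize_sarif(path):
    """Load a SARIF file from disk and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_sarif_log(json.load(fh))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_sarif.py results.sarif
    for level, count in sorted(summarize_sarif(sys.argv[1]).items()):
        print(f"{level}: {count}")
```

Keeping the parsing (summarize_sarif_log) separate from file I/O makes the counting logic easy to reuse, for example on a SARIF document fetched from a CI artifact.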

CI/CD integration

Add security verification to your deployment pipeline.

GitHub Actions

.github/workflows/security.yml

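name: OASB Security Benchmark

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # required by upload-sarif
    steps:
      - uses: actions/checkout@v4

      - name: Run OASB-1 Benchmark
        run: |
          npx hackmyagent secure \
            --benchmark oasb-1 \
            --format sarif \
            --output results.sarif \
            --fail-below 80

      - name: Upload to GitHub Security
        # Upload even when the previous step fails the threshold,
        # so the findings still reach the Security tab.
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif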

Fail threshold

Use --fail-below to set a minimum compliance score (0-100). The command exits with code 1 if the score falls below this threshold.

Require 80% compliance

npx hackmyagent secure -b oasb-1 --fail-below 80
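If the score is the percentage of controls passed, a threshold translates directly into a number of tolerable failures. For the 14 controls in the L1 sample output below, --fail-below 80 would allow at most two failed controls: 12/14 is about 85.7%, while 11/14 is about 78.6%. The following is a minimal Python sketch of that rule, assuming the score is passed / total * 100 with no rounding and that a score exactly equal to the threshold passes. benchmark_score, exit_code and max_failures_allowed are hypothetical names, not part of hackmyagent.

```python
def benchmark_score(passed, total):
    """Percentage of controls passed (assumed definition of the score)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def exit_code(passed, total, fail_below):
    """Mirror the documented rule: exit 1 when the score is below the threshold."""
    if not 0 <= fail_below <= 100:
        raise ValueError("fail_below must be between 0 and 100")
    return 1 if benchmark_score(passed, total) < fail_below else 0


def max_failures_allowed(total, fail_below):
    """Largest number of failed controls that still meets the threshold.

    Zero failures always scores 100%, so at least 0 is always returned.
    """
    return max(
        failed
        for failed in range(total + 1)
        if exit_code(total - failed, total, fail_below) == 0
    )


print(max_failures_allowed(14, 80))  # 2
```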

Sample output

Example result for a compliant L1 agent.

OASB-1 Security Benchmark v1.0.0
════════════════════════════════════════════════════════════

Target:     https://api.example.com/v1/agent
Level:      L1 (Essential)
Rating:     COMPLIANT
Score:      100% (14/14 controls passed)

Category Results
────────────────────────────────────────────────────────────
  [PASS] Identity & Provenance      2/2   100%
  [PASS] Authorization & Access     2/2   100%
  [PASS] Input Security            3/3   100%
  [PASS] Output Security           2/2   100%
  [PASS] Credential Management     2/2   100%
  [PASS] Supply Chain              2/2   100%
  [PASS] Isolation                 1/1   100%

Recommendation: Ready for L2 assessment
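The category counts in this sample add up to the overall score: 2 + 2 + 3 + 2 + 2 + 2 + 1 = 14 controls, all passed. The following is a minimal Python sketch that recovers those counts from text output, assuming the line layout shown above stays stable. parse_categories and overall are hypothetical helper names, and status values other than PASS are not documented here. For automation, the JSON format is the safer choice.

```python
import re

# Assumed line layout, copied from the sample report, e.g.
#   "  [PASS] Input Security            3/3   100%"
# This is not a documented format; prefer `-f json` for automation.
CATEGORY_LINE = re.compile(
    r"^\s*\[(?P<status>[A-Z]+)\]\s+(?P<name>.+?)\s+"
    r"(?P<passed>\d+)/(?P<total>\d+)\s+(?P<pct>\d+)%\s*$"
)


def parse_categories(report_text):
    """Return (status, category, passed, total) for each category line."""
    rows = []
    for line in report_text.splitlines():
        match = CATEGORY_LINE.match(line)
        if match:
            rows.append((match["status"], match["name"],
                         int(match["passed"]), int(match["total"])))
    return rows


def overall(rows):
    """Sum per-category counts into the overall (passed, total) pair."""
    return sum(r[2] for r in rows), sum(r[3] for r in rows)
```

Lines such as "Score:" or "Level:" do not match the pattern, so the whole report can be passed in unchanged.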

CLI reference

Option             Description
-b, --benchmark    Benchmark to run (e.g., oasb-1)
-l, --level        Maturity level: L1, L2, or L3
-f, --format       Output format: text, json, sarif, html
-o, --output       Write output to file
--fail-below       Exit 1 if score below threshold (0-100)
-v, --verbose      Show detailed output
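The options above map directly onto an argument list, which makes the CLI straightforward to drive from other tooling. The following is a minimal Python sketch, assuming npx is on PATH and Node.js 18+ is installed. build_command and run_benchmark are hypothetical wrapper names, and the validation only mirrors the values listed in this reference.

```python
import shlex
import subprocess

# Values taken from the CLI reference above.
LEVELS = {"L1", "L2", "L3"}
FORMATS = {"text", "json", "sarif", "html"}


def build_command(benchmark="oasb-1", level=None, fmt=None, output=None,
                  fail_below=None, verbose=False):
    """Assemble an argv list using only the documented options."""
    cmd = ["npx", "hackmyagent", "secure", "--benchmark", benchmark]
    if level is not None:
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r}")
        cmd += ["--level", level]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown format {fmt!r}")
        cmd += ["--format", fmt]
    if output is not None:
        cmd += ["--output", output]
    if fail_below is not None:
        if not 0 <= fail_below <= 100:
            raise ValueError("fail_below must be between 0 and 100")
        cmd += ["--fail-below", str(fail_below)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_benchmark(**options):
    """Run the CLI and return its exit code.

    Per the docs, 1 means the score fell below --fail-below; other
    failures may also produce a nonzero code.
    """
    cmd = build_command(**options)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example, run_benchmark(level="L2", fmt="sarif", output="results.sarif", fail_below=80) runs the same check as the GitHub Actions step above, at the L2 level.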

Resources

HackMyAgent: Security scanner CLI for AI agents

DVAA: Vulnerable agents for testing

Internet-Wide Scan: 97K hosts scanned, 1,594 vulnerable