Open Agent Security Benchmark
The standard for AI agent security
Three measurement systems. OASB-1 checks infrastructure security — 46 controls, 3 maturity levels. OASB-2 checks behavioral governance — 72 controls, 9 domains, 4 agent tiers. OASB Eval tests whether security products detect real attacks — 182 scenarios, 10 MITRE ATLAS techniques.
npx hackmyagent secure --benchmark oasb-1
Three measurement systems
One benchmark, three specifications
Check agent compliance
CIS Benchmarks for AI agents. 46 controls across 10 categories with L1/L2/L3 maturity levels. Answers: “Is your agent secure?”
Govern agent behavior
Behavioral governance for AI agents. 72 controls across 9 domains with 4 agent tiers. Answers: “Does your agent behave correctly?”
Evaluate security products
MITRE ATT&CK Evaluations for AI agent security products. 182 attack scenarios across 10 MITRE ATLAS techniques. Answers: “Does your EDR catch this?”
Protects against
Common AI agent vulnerabilities
Prompt injection
Malicious inputs that override system instructions, causing agents to ignore safety rules or execute unintended actions.
Jailbreaking
Techniques that bypass model guardrails through roleplay, encoding tricks, or adversarial prompts.
Data exfiltration
Extracting sensitive information through crafted queries, side channels, or manipulated tool outputs.
Credential theft
Exposing API keys, tokens, or secrets through prompt leaks, logs, or unprotected storage.
Tool abuse
Misusing agent capabilities beyond their intended scope, such as file access, network calls, or code execution.
Context poisoning
Manipulating conversation history or RAG sources to influence future agent behavior.
Internet-wide scan data
The current state of AI agent security
HackMyAgent scanned the public internet for exposed AI agent infrastructure. The results informed which OASB controls matter most.
97,127
Hosts discovered
11,192
Hosts scanned
1,594
Vulnerable
1,190
CLAUDE.md exposed
645
MCP tools exposed
5,042
Outdated endpoints
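To put the counts above in proportion, here is a minimal sketch that converts them to percentages. It assumes each finding count is a subset of the 11,192 scanned hosts, which the page does not state; the "MCP tools" and "Outdated endpoints" figures in particular may count tools or endpoints rather than hosts. The `share` helper and the constant names are illustrative.

```python
# Figures copied from the scan summary above.
HOSTS_DISCOVERED = 97_127
HOSTS_SCANNED = 11_192
FINDINGS = {
    "Vulnerable": 1_594,
    "CLAUDE.md exposed": 1_190,
    "MCP tools exposed": 645,
    "Outdated endpoints": 5_042,
}


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Fraction of discovered hosts that were actually scanned.
print(f"Scanned: {share(HOSTS_SCANNED, HOSTS_DISCOVERED):.1f}% of discovered hosts")

# Assumption: every finding is measured against the scanned hosts.
for label, count in FINDINGS.items():
    print(f"{label}: {share(count, HOSTS_SCANNED):.1f}% of scanned hosts")
```

Under that assumption, roughly one in seven scanned hosts was flagged as vulnerable.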
Maturity levels
Choose your security posture
L1: Baseline security every agent should have. Suited to development and prototypes.
L2: For production agents handling sensitive data. Fits most deployments.
L3: For regulated industries and high-security environments.
Security controls
46 controls across 10 categories
Open-source toolkit
Meet the requirements with open-source tools
Every OASB control maps to a free, open-source tool. Scan, fix, and verify compliance without vendor lock-in.
OpenA2A CLI
Unified security CLI that orchestrates all OpenA2A tools. Scan, protect, guard, and verify agents from a single command.
npx opena2a
HackMyAgent
Security scanner and red team toolkit. 147 checks, 55 attack payloads, auto-fix with rollback. Includes ARP runtime protection and OASB benchmarking.
npx hackmyagent secure
Secretless AI
Keeps secrets out of AI context windows. PreToolUse hooks block credential access across Claude Code, Cursor, and Copilot.
npx secretless-ai init
AIM
Agent Identity Management -- cryptographic identity, capability enforcement, audit logging, and trust scoring for AI agents.
pip install aim-sdk
Browser Guard
Chrome extension that detects and controls AI agents operating in the browser. Four-layer detection, delegation engine, and session timeline.
Chrome Web Store
DVAA
Damn Vulnerable AI Agent -- 10 intentionally vulnerable agents for training. Practice all 10 OASB categories in a safe environment.
docker run -p 3000:3000 opena2a/damn-vulnerable-ai-agent
Registry
Agent trust registry for the OpenA2A ecosystem. Discover, verify, and track trust scores for AI agents and security tools.
registry.opena2a.org
Verify your agent's security
Run the benchmark against your AI agent. Read the docs for CI/CD integration.
npx hackmyagent secure --benchmark oasb-1