Mar 23 | OASB v0.3.0 -- product-agnostic adapter interface, arp-guard vs llm-guard benchmark

Open Agent Security Benchmark

97,127 hosts discovered.
One benchmark emerged.

OASB is the open standard for AI agent security — 46 controls, 3 maturity levels, built from real-world data.

$ npx hackmyagent secure --benchmark oasb-1

View the 46 controls

L1 Essential

L2 Standard

L3 Hardened

Internet-wide scan data

The current state of AI agent security

HackMyAgent scanned the public internet for exposed AI agent infrastructure. The results informed which OASB controls matter most.

Hosts discovered: 97,127
Hosts scanned: 11,192
Vulnerable: 1,594
CLAUDE.md exposed: 1,190
MCP tools exposed: 645
Outdated endpoints: 5,042
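To put the headline counts in proportion, the figures above can be turned into rates. This is a minimal sketch: it assumes the "Vulnerable" count is a subset of the scanned hosts, which the page does not state explicitly, and the variable names are illustrative. The remaining counts are left out because their denominators are not given.

```python
# Counts copied from the scan summary above.
hosts_discovered = 97_127
hosts_scanned = 11_192
hosts_vulnerable = 1_594

# Share of discovered hosts that were actually scanned.
scan_coverage = hosts_scanned / hosts_discovered

# Share of scanned hosts flagged vulnerable
# (assumption: vulnerable hosts are counted among scanned hosts).
vulnerable_rate = hosts_vulnerable / hosts_scanned

print(f"Scan coverage:   {scan_coverage:.1%}")    # 11.5%
print(f"Vulnerable rate: {vulnerable_rate:.1%}")  # 14.2%
```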

Read the full research report

Three measurement systems

One benchmark, three specifications

OASB-1: Compliance

Check agent compliance

CIS Benchmarks for AI agents. 46 controls across 10 categories with L1/L2/L3 maturity levels. Answers: “Is your agent secure?”

Tests: 46 controls, L1/L2/L3

Analogous to: CIS Benchmarks

Audience: Agent developers, compliance teams

View OASB-1 specification

OASB-2: Governance

Govern agent behavior

Behavioral governance for AI agents. 72 controls across 9 domains with 4 agent tiers. Answers: “Does your agent behave correctly?”

Tests: 72 controls, 4 tiers

Analogous to: SOC 2 Trust Principles

Audience: Agent builders, governance teams

View OASB-2 specification

OASB Eval: Evaluation

Evaluate security tools

MITRE ATT&CK Evaluations for AI agent security tools. 222 attack scenarios across 10 MITRE ATLAS techniques. Answers: “Does your EDR catch this?”

Tests: 222 attack scenarios

Analogous to: MITRE ATT&CK Evaluations

Audience: Security tool vendors, evaluators

Explore OASB Eval

Security controls

46 controls across 10 categories

Identity & Auth: 4 controls
Authorization: 4 controls
Input Security: 4 controls
Output Security: 3 controls
Credentials: 4 controls
Supply Chain: 4 controls
Isolation: 3 controls
Memory & Context: 3 controls
Monitoring & Ops: 3 controls
Agent-to-Agent: 14 controls
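The per-category counts above can be checked against the stated total of 46 controls in 10 categories. The following is a small sketch; the dictionary name is illustrative and is not part of any OASB tool.

```python
# Control counts per category, copied from the list above.
controls_per_category = {
    "Identity & Auth": 4,
    "Authorization": 4,
    "Input Security": 4,
    "Output Security": 3,
    "Credentials": 4,
    "Supply Chain": 4,
    "Isolation": 3,
    "Memory & Context": 3,
    "Monitoring & Ops": 3,
    "Agent-to-Agent": 14,
}

total_controls = sum(controls_per_category.values())
print(f"{len(controls_per_category)} categories, {total_controls} controls")
```

Agent-to-Agent alone accounts for 14 of the 46 controls, which makes it the largest category by a wide margin.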

View all 46 controls

Open-source toolkit

Every control maps to a free tool

Scan, fix, and verify compliance without vendor lock-in. All tools available at opena2a.org.

HackMyAgent

238 security checks + attack simulation

Input, Output, Supply Chain, Memory, Operations

npx hackmyagent secure

Secretless AI

Credential protection for AI tools

Credential Protection

npx secretless-ai init

AIM

Cryptographic identity and trust scoring

Identity, Authorization, Agent-to-Agent, Monitoring

opena2a identity create

Browser Guard

Detect and control browser-based AI agents

Detection, Browser Security

Chrome Web Store

DVAA

Vulnerable AI agent for security training

All 10 categories

docker compose up

OpenA2A CLI

Orchestrates all tools from one command

All 10 categories

npx opena2a-cli review

Scanner Leaderboard

89.2% F1. Verified accuracy for skill scanners.

4,245 ground-truth labeled samples. 9 attack categories. While other scanners report flag rates from 3.8% to 41.9% with 0.12% consensus, OASB provides verified precision and recall.

#	Scanner	F1	Precision	Recall	FPR
1	NanoMind TME v0.5.0	89.2%	88.4%	90.0%	0.82%
2	HMA Full Pipeline	81.3%	68.5%	100%	3.20%
3	HMA Static (regex)	67.5%	99.3%	51.1%	0.03%

Full leaderboard and methodology

Dataset: v2.0 | April 2026
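The F1 column in the leaderboard is consistent with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes it from the published precision and recall values. It assumes the leaderboard uses this standard definition; the function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (scanner, precision, recall) copied from the leaderboard table.
leaderboard = [
    ("NanoMind TME v0.5.0", 0.884, 0.900),
    ("HMA Full Pipeline", 0.685, 1.000),
    ("HMA Static (regex)", 0.993, 0.511),
]

f1_by_scanner = {name: f1_score(p, r) for name, p, r in leaderboard}

for name, f1 in f1_by_scanner.items():
    print(f"{name}: F1 = {f1:.1%}")
```

The recomputed values round to 89.2%, 81.3%, and 67.5%, matching the table. The regex scanner shows why F1 is reported alongside precision: its 99.3% precision comes with only 51.1% recall, which pulls its F1 well below the other two entries.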

Your security team will ask what standard you are using.

Send them here.

OASB Eval

Verify your agent's security

Run the benchmark against your AI agent. Read the docs for CI/CD integration.

$ npx hackmyagent secure --benchmark oasb-1
