Product Evaluation Suite
OASB Eval
MITRE ATT&CK Evaluations, but for AI agent security products.
182 standardized attack scenarios across 10 MITRE ATLAS techniques. Test whether runtime guards, EDRs, and security products detect real AI agent attacks — process spawning, network exfiltration, filesystem manipulation, and multi-step campaigns.
Different tools, different jobs
OASB Eval vs HackMyAgent
OASB Eval evaluates security products. HackMyAgent pentests agents. They complement each other but serve different audiences.
| | OASB Eval | HackMyAgent |
|---|---|---|
| Purpose | Evaluate security products | Pentest AI agents |
| Tests | "Does your EDR catch this?" | "Is your agent leaking?" |
| Audience | Security vendors, evaluators | Agent developers, red teams |
| Analogous to | MITRE ATT&CK Evaluations | OWASP ZAP / Burp Suite |
182 attack scenarios
Test categories
Each category targets a specific detection surface. Tests range from basic OS-level operations to complex multi-step attack campaigns.
| Category | Scenarios |
|---|---|
| Multi-step | 33 |
| Filesystem | 28 |
| Process | 25 |
| Intelligence | 21 |
| Network | 20 |
| Enforcement | 18 |
| Real OS | 14 |
| App hooks | 14 |
| Baseline | 13 |

182 total scenarios across 9 categories
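As a rough illustration of the surfaces these categories probe, the Process, Filesystem, and Network scenarios reduce to OS-level primitives like the ones below. This is a hypothetical sketch — the paths and commands are stand-ins, not the suite's actual test code:

```shell
# Hypothetical stand-ins for category primitives (not the suite's real tests)
sh -c 'echo child-process-spawned'     # Process: spawn a child shell
echo sentinel > /tmp/oasb_probe.txt    # Filesystem: write under /tmp
cat /tmp/oasb_probe.txt                # Filesystem: read it back
rm /tmp/oasb_probe.txt                 # cleanup
# Network: an outbound request a runtime guard should observe, e.g.
#   curl -s https://example.com >/dev/null
```

Each primitive is something a runtime guard or EDR is expected to log or block; the scenarios vary the context around these operations.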
Framework alignment
MITRE ATLAS coverage
All 182 scenarios map to 10 MITRE ATLAS techniques, providing standardized coverage across known AI/ML attack vectors.
| Technique | ATLAS ID | Description |
|---|---|---|
| Reconnaissance | AML.T0013 | Discover ML model artifacts, configurations, and deployment details |
| Resource Development | AML.T0017 | Develop adversarial ML capabilities and infrastructure |
| Initial Access | AML.T0019 | Gain initial entry to ML systems via prompt injection, API abuse |
| ML Attack Staging | AML.T0040 | Prepare and stage attacks against ML models and pipelines |
| Execution | AML.T0041 | Execute adversarial actions through model inference, tool calls |
| Persistence | AML.T0042 | Maintain access via poisoned models, backdoored pipelines |
| Privilege Escalation | AML.T0043 | Escalate from model context to system-level access |
| Defense Evasion | AML.T0044 | Evade detection through adversarial examples, model manipulation |
| Exfiltration | AML.T0024 | Extract training data, model weights, or sensitive context |
| Impact | AML.T0029 | Denial of service, model degradation, integrity compromise |
Get started
Run the evaluation
Clone the repository, install dependencies, and run the full test suite against your security product.
```shell
# Clone and install
git clone https://github.com/opena2a-org/oasb.git
cd oasb
npm install

# Run the full evaluation suite
npm test

# Run a specific category
npm test -- --grep "process"
npm test -- --grep "network"
npm test -- --grep "multi-step"
```
Transparency
Known detection gaps
No security product catches everything. OASB Eval is designed to surface these gaps transparently so vendors and users can make informed decisions.
Multi-step campaigns
Attacks that chain multiple benign operations into malicious sequences are difficult to detect at the individual step level.
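The pattern can be sketched with ordinary shell commands. Each step below is benign in isolation, and only the sequence is suspicious — which is exactly what defeats per-step detection (paths are illustrative, not from the suite):

```shell
# Each step is benign alone; the chain stages data for exfiltration.
mkdir -p /tmp/oasb_chain                          # 1. create a staging directory
echo "collected data" > /tmp/oasb_chain/loot.txt  # 2. gather content
tar -czf /tmp/oasb_chain.tgz -C /tmp oasb_chain   # 3. compress for transfer
ls -l /tmp/oasb_chain.tgz                         # 4. artifact ready to exfiltrate
rm -r /tmp/oasb_chain /tmp/oasb_chain.tgz         # cleanup
```

Catching this requires correlating events across steps rather than scoring each command on its own.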
Application-level hooks
Runtime monitors operating at the OS level may miss attacks that exploit application-layer APIs and SDKs.
Encrypted exfiltration
Data exfiltration over encrypted channels (HTTPS, DNS-over-HTTPS) often bypasses network-level monitors.
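To see why this is hard at the network layer, consider the encoding step of DNS-based exfiltration. The sketch below only prints the lookups a real attack would issue — nothing is sent, and the attacker domain is made up:

```shell
# Encoding step of DNS exfiltration (sketch only — no traffic generated).
# A real attack would resolve each label as a subdomain of an attacker domain;
# real tools typically use base32/hex, since base64's '+', '/' are not DNS-safe.
secret="api_key=abc123"
printf '%s' "$secret" | base64 | tr -d '=' | fold -w 63 | \
  while read -r label; do
    echo "would resolve: $label.exfil.example.com"
  done
```

To a network monitor, each resulting query looks like an ordinary DNS lookup — and over DNS-over-HTTPS even the query contents are hidden inside TLS.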
Living-off-the-land
Attacks using legitimate system tools (curl, python, node) are inherently harder to distinguish from normal operations.
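A minimal sketch of the idea, using only tools already present on a typical host (the file path is illustrative):

```shell
# 'Living off the land': no dropped binaries — only legitimate interpreters.
# The same staging could be done with curl or node; python3 is used here.
python3 -c "open('/tmp/oasb_lotl.txt', 'w').write('staged payload')"
cat /tmp/oasb_lotl.txt   # a guard sees only python3 and cat doing file I/O
rm /tmp/oasb_lotl.txt
```

Nothing here is anomalous by itself, which is why these scenarios stress behavioral context rather than binary reputation.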
Run the evaluation
Test your security product against 182 attack scenarios. View the repository for setup instructions.
git clone ... && npm test