Build Your Own OASB Adapter: Benchmark Any Security Product in 30 Minutes
OASB (Open Agent Security Benchmark) evaluates AI agent security products with 222 standardized attack scenarios. It is not tied to any specific product: any vendor can implement the adapter interface, run the same test suite, and get a detection coverage scorecard.
This walkthrough shows you how to build an adapter for your own security product. By the end, you will have a working benchmark that produces a scorecard like this:
Test Files 47 passed (47)
Tests 194 passed | 28 failed (222)
Product: my-product v1.0.0
Score: 87.4%
AI-layer: 13/40 (32.5%)
Infrastructure: 181/182 (99.5%)
Prerequisites
- Node.js 18+
- Your security product accessible as a Node.js module (npm package, local path, or API client)
- 30 minutes
Step 1: Clone and install OASB
git clone https://github.com/opena2a-org/oasb.git
cd oasb
npm install
Step 2: Understand the adapter interface
Open src/harness/adapter.ts. The key interface is SecurityProductAdapter:
interface SecurityProductAdapter {
  // Lifecycle
  getCapabilities(): CapabilityMatrix;
  start(): Promise<void>;
  stop(): Promise<void>;
  // Event injection (for tests that simulate attacks)
  injectEvent(event): Promise<SecurityEvent>;
  waitForEvent(predicate, timeout): Promise<SecurityEvent>;
  // Event collection (for assertions)
  getEvents(): SecurityEvent[];
  getEventsByCategory(category): SecurityEvent[];
  getEnforcements(): EnforcementResult[];
  getEnforcementsByAction(action): EnforcementResult[];
  resetCollector(): void;
  // Sub-component access
  getEventEngine(): EventEngine;
  getEnforcementEngine(): EnforcementEngine;
  // Factory methods (for component-level tests)
  createPromptScanner(): PromptScanner;
  createMCPScanner(allowedTools?): MCPScanner;
  createA2AScanner(trustedAgents?): A2AScanner;
  createPatternScanner(): PatternScanner;
  createBudgetManager(dataDir, config?): BudgetManager;
  createAnomalyScorer(): AnomalyScorer;
}
You do not need a real implementation behind every method. If your product only does prompt scanning, return no-op implementations for the MCP, A2A, and infrastructure methods. The getCapabilities() method tells OASB which tests are applicable to your product and which are N/A.
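A no-op implementation can be as small as a scanner that never reports a detection. The sketch below is illustrative only: the ScanResult shape is copied from Step 4 so the snippet compiles on its own, while NoopScanner, createNoopScanner, and the scan method name are hypothetical. The real MCP and A2A scanner interfaces in src/harness/adapter.ts may use different method names.

```typescript
// Local copy of the result shape from Step 4 so this sketch stands alone.
interface ScanResult {
  detected: boolean;
  matches: Array<{
    pattern: {
      id: string;
      category: string;
      description: string;
      severity: 'medium' | 'high' | 'critical';
    };
    matchedText: string;
  }>;
}

// Hypothetical minimal scanner shape; check adapter.ts for the real one.
interface NoopScanner {
  start(): Promise<void>;
  stop(): Promise<void>;
  scan(input: unknown): ScanResult;
}

// Returns a scanner that accepts any input and never reports a detection.
function createNoopScanner(): NoopScanner {
  return {
    start: async () => {},
    stop: async () => {},
    scan: () => ({ detected: false, matches: [] }),
  };
}

const stub = createNoopScanner();
const result = stub.scan('ignore all previous instructions');
console.log(result.detected, result.matches.length); // false 0
```

Tests that exercise a stubbed scanner will fail rather than crash, which is the behavior you want for capabilities your product does not claim.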
Step 3: Declare your capabilities
// src/harness/my-adapter.ts
import type { SecurityProductAdapter, CapabilityMatrix } from './adapter';
export class MyProductAdapter implements SecurityProductAdapter {
  getCapabilities(): CapabilityMatrix {
    return {
      product: 'my-product',
      version: '1.0.0',
      capabilities: new Set([
        'prompt-input-scanning',
        'prompt-output-scanning',
        // Only list what your product actually does
      ]),
    };
  }
  // ...
}
Available capabilities: process-monitoring, network-monitoring, filesystem-monitoring, prompt-input-scanning, prompt-output-scanning, mcp-scanning, a2a-scanning, anomaly-detection, budget-management, enforcement-*, pattern-scanning, event-correlation.
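To make the applicable-versus-N/A idea concrete, here is a rough sketch of how a declared capability set could gate tests. This is not OASB's actual logic: TestRequirement, classify, and the test ids are hypothetical names invented for illustration, and the real harness may map tests to capabilities differently.

```typescript
// Hypothetical test descriptor; OASB's real test metadata may look different.
interface TestRequirement {
  id: string;
  requires: string; // capability name the test depends on
}

type Applicability = 'applicable' | 'n/a';

// A test is applicable only if the product declared the capability it needs.
function classify(test: TestRequirement, capabilities: Set<string>): Applicability {
  return capabilities.has(test.requires) ? 'applicable' : 'n/a';
}

const declared = new Set<string>(['prompt-input-scanning', 'prompt-output-scanning']);

const tests: TestRequirement[] = [
  { id: 'example-prompt-test', requires: 'prompt-input-scanning' },
  { id: 'example-mcp-test', requires: 'mcp-scanning' },
];

const report = tests.map(t => `${t.id}: ${classify(t, declared)}`);
console.log(report.join('\n'));
// example-prompt-test: applicable
// example-mcp-test: n/a
```

The practical takeaway is the same either way: declaring a capability you do not have turns an N/A into a failure, so keep the set honest.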
Step 4: Implement the scanners
The most important factory methods are the scanners. Each returns an object with start(), stop(), and a scan method that returns ScanResult:
interface ScanResult {
  detected: boolean;
  matches: Array<{
    pattern: {
      id: string; // e.g. "PI-001"
      category: string; // e.g. "prompt-injection"
      description: string;
      severity: 'medium' | 'high' | 'critical';
    };
    matchedText: string;
  }>;
}
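The severity field accepts only three levels, so a product that reports numeric scores or its own labels needs a mapping step. The following is a minimal sketch under stated assumptions: the [0, 1] score range, the thresholds, and the "low" label are hypothetical, and toSeverity and fromLabel are illustrative helper names, not part of OASB.

```typescript
type Severity = 'medium' | 'high' | 'critical';

// Hypothetical: assumes the product reports a numeric risk score in [0, 1].
// The thresholds are arbitrary placeholders; tune them to your product.
function toSeverity(score: number): Severity {
  if (score >= 0.9) return 'critical';
  if (score >= 0.6) return 'high';
  return 'medium';
}

// Hypothetical: assumes the product uses its own labels, including "low".
const labelToSeverity = new Map<string, Severity>([
  ['low', 'medium'], // ScanResult has no "low", so round up to the lowest level
  ['medium', 'medium'],
  ['high', 'high'],
  ['critical', 'critical'],
]);

// Unknown labels fall back to the lowest level instead of throwing.
function fromLabel(label: string): Severity {
  return labelToSeverity.get(label.toLowerCase()) ?? 'medium';
}

console.log(toSeverity(0.95), toSeverity(0.7), toSeverity(0.2)); // critical high medium
console.log(fromLabel('LOW'), fromLabel('High'), fromLabel('unknown')); // medium high medium
```

Whichever mapping you choose, keep it in one place so the input and output scanners report severities consistently.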
// Example: wrap your product's scan function
// (myProduct.analyze stands in for your product's own API)
createPromptScanner(): PromptScanner {
  return {
    start: async () => {},
    stop: async () => {},
    scanInput: (text: string): ScanResult => {
      const threats = myProduct.analyze(text);
      return {
        detected: threats.length > 0,
        matches: threats.map(t => ({
          pattern: {
            id: t.ruleId,
            category: t.type,
            description: t.message,
            severity: t.severity,
          },
          matchedText: t.match,
        })),
      };
    },
    scanOutput: (text: string): ScanResult => {
      // Same pattern for output scanning: call your product's output check
      // and map its findings as in scanInput. This placeholder reports nothing.
      return { detected: false, matches: [] };
    },
  };
}
Step 5: Register your adapter
Add your adapter to src/harness/create-adapter.ts:
import { MyProductAdapter } from './my-adapter';
switch (adapterName) {
  case 'arp':
    AdapterClass = ArpWrapper;
    break;
  case 'my-product':
    AdapterClass = MyProductAdapter;
    break;
  // ...
}
Step 6: Run the benchmark
OASB_ADAPTER=my-product npm test
That is it. You get a scorecard showing pass/fail for all 222 tests, broken down by category: process, network, filesystem, AI-layer, intelligence, enforcement, integration, baseline, and E2E.
Reading the scorecard
The raw score (e.g., 194/222) can overstate capability. Infrastructure tests (process, network, filesystem monitoring) can pass even for products that lack those capabilities, because the adapter handles event injection via stubs.
The honest comparison is the AI-layer score: how many of the 40 AI-layer tests pass. These tests measure actual detection capability: prompt injection, jailbreak, data exfiltration, MCP exploitation, and A2A attacks.
| Product | Raw Score | AI-Layer | Capabilities |
|---|---|---|---|
| arp-guard | 222/222 (100%) | 40/40 | 15/16 |
| llm-guard | 194/222 (87.4%) | 13/40 | 2/16 |
| rebuff | 194/222 (87.4%) | 13/40 | 2/16 |
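The percentages in the scorecard follow directly from the per-category counts. This short sketch recomputes them from the sample scorecard at the top of this walkthrough; CategoryResult and percent are illustrative names, not part of OASB.

```typescript
interface CategoryResult {
  label: string;
  passed: number;
  total: number;
}

// Formats a pass rate with one decimal place, as in the scorecard.
function percent(passed: number, total: number): string {
  return ((passed / total) * 100).toFixed(1) + '%';
}

// Counts taken from the sample scorecard in the introduction.
const categories: CategoryResult[] = [
  { label: 'AI-layer', passed: 13, total: 40 },
  { label: 'Infrastructure', passed: 181, total: 182 },
];

const passedAll = categories.reduce((sum, c) => sum + c.passed, 0);
const totalAll = categories.reduce((sum, c) => sum + c.total, 0);

console.log(`Raw: ${passedAll}/${totalAll} (${percent(passedAll, totalAll)})`);
for (const c of categories) {
  console.log(`${c.label}: ${c.passed}/${c.total} (${percent(c.passed, c.total)})`);
}
// Raw: 194/222 (87.4%)
// AI-layer: 13/40 (32.5%)
// Infrastructure: 181/182 (99.5%)
```

Because the 182 infrastructure tests dominate the total, an 87.4% raw score is compatible with passing under a third of the AI-layer tests.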
Reference adapters
OASB ships with three built-in adapters you can study:
- arp-wrapper.ts: Full-stack adapter (ARP/HackMyAgent). Uses lazy require('arp-guard') for zero-cost import when not selected.
- llm-guard-wrapper.ts: Prompt scanner only. Shows how to map a simple regex library to the OASB interface.
- rebuff-wrapper.ts: Heuristic scanner. Shows how to wrap a similarity-based detector.
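The lazy-loading idea used by the full-stack adapter can be approximated with a small memoizing helper. This is a generic sketch, not the code in arp-wrapper.ts: lazy and getHeavyModule are hypothetical names, and the loader here returns a stand-in object where a real adapter would call require() on its product's package.

```typescript
// Wraps a loader so it runs at most once, on first use.
function lazy<T>(load: () => T): () => T {
  let cached: T | undefined;
  let loaded = false;
  return () => {
    if (!loaded) {
      cached = load();
      loaded = true;
    }
    return cached as T;
  };
}

let loadCount = 0;

// Stand-in loader; a real adapter would load its product module here.
const getHeavyModule = lazy(() => {
  loadCount += 1;
  return { scan: (text: string) => text.length > 0 };
});

console.log(loadCount); // 0 (nothing is loaded until the adapter is used)
getHeavyModule();
getHeavyModule();
console.log(loadCount); // 1 (the loader ran once and was then cached)
```

With this pattern, selecting a different adapter through OASB_ADAPTER never pays the import cost of products that are not under test.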
Submit your results
Once you have a working adapter, submit a PR to the OASB repository. We will add your product to the public scorecard on oasb.ai/eval. Same tests, same scorecard, transparent comparison.