Build Your Own OASB Adapter: Benchmark Any Security Product in 30 Minutes

OpenA2A Team|
oasbadapterbenchmarktutorial

OASB (Open Agent Security Benchmark) evaluates AI agent security products with 222 standardized attack scenarios. But it is not tied to any specific product. Any vendor can implement the adapter interface, run the same test suite, and get a detection coverage scorecard.

This walkthrough shows you how to build an adapter for your own security product. By the end, you will have a working benchmark that produces a scorecard like this:

Test Files  47 passed (47)
     Tests  194 passed | 28 failed (222)

Product:     my-product v1.0.0
Score:       87.4%
AI-layer:    13/40 (32.5%)
Infrastructure: 181/182 (99.5%)

Prerequisites

  • Node.js 18+
  • Your security product accessible as a Node.js module (npm package, local path, or API client)
  • 30 minutes

Step 1: Clone and install OASB

git clone https://github.com/opena2a-org/oasb.git
cd oasb
npm install

Step 2: Understand the adapter interface

Open src/harness/adapter.ts. The key interface is SecurityProductAdapter:

interface SecurityProductAdapter {
  // Lifecycle
  getCapabilities(): CapabilityMatrix;
  start(): Promise<void>;
  stop(): Promise<void>;

  // Event injection (for tests that simulate attacks)
  injectEvent(event): Promise<SecurityEvent>;
  waitForEvent(predicate, timeout): Promise<SecurityEvent>;

  // Event collection (for assertions)
  getEvents(): SecurityEvent[];
  getEventsByCategory(category): SecurityEvent[];
  getEnforcements(): EnforcementResult[];
  getEnforcementsByAction(action): EnforcementResult[];
  resetCollector(): void;

  // Sub-component access
  getEventEngine(): EventEngine;
  getEnforcementEngine(): EnforcementEngine;

  // Factory methods (for component-level tests)
  createPromptScanner(): PromptScanner;
  createMCPScanner(allowedTools?): MCPScanner;
  createA2AScanner(trustedAgents?): A2AScanner;
  createPatternScanner(): PatternScanner;
  createBudgetManager(dataDir, config?): BudgetManager;
  createAnomalyScorer(): AnomalyScorer;
}

You do not need to implement every method. If your product only does prompt scanning, return no-op implementations for MCP, A2A, and infrastructure methods. The getCapabilities() method tells OASB which tests are applicable vs N/A.

Step 3: Declare your capabilities

// my-adapter.ts
import type { SecurityProductAdapter, CapabilityMatrix } from './adapter';

export class MyProductAdapter implements SecurityProductAdapter {
  getCapabilities(): CapabilityMatrix {
    return {
      product: 'my-product',
      version: '1.0.0',
      capabilities: new Set([
        'prompt-input-scanning',
        'prompt-output-scanning',
        // Only list what your product actually does
      ]),
    };
  }
  // ...
}

Available capabilities: process-monitoring, network-monitoring, filesystem-monitoring, prompt-input-scanning, prompt-output-scanning, mcp-scanning, a2a-scanning, anomaly-detection, budget-management, enforcement-*, pattern-scanning, event-correlation.

Step 4: Implement the scanners

The most important factory methods are the scanners. Each returns an object with start(), stop(), and a scan method that returns ScanResult:

interface ScanResult {
  detected: boolean;
  matches: Array<{
    pattern: {
      id: string;        // e.g. "PI-001"
      category: string;  // e.g. "prompt-injection"
      description: string;
      severity: 'medium' | 'high' | 'critical';
    };
    matchedText: string;
  }>;
}

// Example: wrap your product's scan function
createPromptScanner(): PromptScanner {
  return {
    start: async () => {},
    stop: async () => {},
    scanInput: (text: string): ScanResult => {
      const threats = myProduct.analyze(text);
      return {
        detected: threats.length > 0,
        matches: threats.map(t => ({
          pattern: {
            id: t.ruleId,
            category: t.type,
            description: t.message,
            severity: t.severity,
          },
          matchedText: t.match,
        })),
      };
    },
    scanOutput: (text: string): ScanResult => {
      // Same pattern for output scanning
    },
  };
}

Step 5: Register your adapter

Add your adapter to src/harness/create-adapter.ts:

import { MyProductAdapter } from './my-adapter';

switch (adapterName) {
  case 'arp':
    AdapterClass = ArpWrapper;
    break;
  case 'my-product':
    AdapterClass = MyProductAdapter;
    break;
  // ...
}

Step 6: Run the benchmark

OASB_ADAPTER=my-product npm test

That is it. You get a scorecard showing pass/fail for all 222 tests, broken down by category: process, network, filesystem, AI-layer, intelligence, enforcement, integration, baseline, and E2E.

Reading the scorecard

The raw score (e.g., 194/222) can overstate capability. Infrastructure tests (process, network, filesystem monitoring) pass for products that lack those capabilities because the adapter handles event injection via stubs.

The honest comparison is the AI-layer score — how many of the 40 AI-layer tests pass. This tests actual detection capability: prompt injection, jailbreak, data exfiltration, MCP exploitation, and A2A attacks.

ProductRaw ScoreAI-LayerCapabilities
arp-guard222/222 (100%)40/4015/16
llm-guard194/222 (87.4%)13/402/16
rebuff194/222 (87.4%)13/402/16

Reference adapters

OASB ships with three built-in adapters you can study:

  • arp-wrapper.ts — Full-stack adapter (ARP/HackMyAgent). Uses lazy require('arp-guard') for zero-cost import when not selected.
  • llm-guard-wrapper.ts — Prompt scanner only. Shows how to map a simple regex library to the OASB interface.
  • rebuff-wrapper.ts — Heuristic scanner. Shows how to wrap a similarity-based detector.

Submit your results

Once you have a working adapter, submit a PR to the OASB repository. We will add your product to the public scorecard on oasb.ai/eval. Same tests, same scorecard, transparent comparison.