
A native stream-json Claude adapter; ACP is no longer the only rich path

Claude can be driven as a Tracing-capable Harness directly over the Claude Code CLI's --output-format stream-json, with no ACP and no Node bridge. The new claude-code-stream harness runs claude -p --output-format stream-json --verbose --dangerously-skip-permissions and translates the first-party streamed events (assistant tool_use blocks, tool_results, final usage/cost) into the framework's Trace. This revisits ADR 0003's "ACP as the single rich adapter", not by removing ACP but by demoting it from the only rich layer to one of several rich adapters behind the Trace contract.

Status

accepted (revisits ADR 0003)

Context

ADR 0003 routed every Tracing/Interaction-capable agent through one ACP adapter, including Claude via the claude-agent-acp npm bridge. In practice ACP is a lowest-common-denominator protocol that some agents implement awkwardly or unstably (BENCHMARK.md already records droid raising an "internal ACP error"), and for Claude it is strictly less direct than the vendor's own interfaces. ADR 0003 itself anticipated this: it listed "native claude-agent-sdk for Claude" as the max-fidelity alternative and noted a native adapter "may be added later without changing the Trace or Interaction contracts."

The key observation is that ACP bundles two capabilities the framework already models separately: Tracing (observe tool calls / tokens / cost) and Interaction (answer the agent's mid-run permission requests — the bidirectional part). Every Case in the suite uses interaction: auto-approve. Auto-approval needs no protocol — it is a launch flag. True bidirectional interaction is needed only by scripted / llm-based / manual policies, which only observed-droid exercises.

So for the common case (autonomous run + observe the result), ACP's bidirectional channel is unnecessary. What is needed is (a) headless autonomy and (b) a Trace, both of which the Claude CLI provides first-party.

Decision

Add claude-code-stream:

  • Autonomy without a protocol: --dangerously-skip-permissions runs Claude fully headless, so no Interaction channel is required (Capabilities(tracing=True, interaction=False)). The runner already degrades an auto-approve Interaction request to "use the harness default" with a benign warning, and the harness default here is approve, so the existing trace-grading cases run unchanged.
  • Observation from stream-json — the streamed events are replayed into the Trace (message / thought / tool_call / tool_result / usage / stop), with Claude tool names mapped to portable Tool Kinds. The final result event fills output / cost / tokens.
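
The translation step described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: TraceEvent, TOOL_KINDS, and translate are hypothetical names, the Tool Kind values are placeholders, and the JSON field names (message.content, tool_use_id, total_cost_usd, usage, result) are assumptions about the shape of the CLI's stream-json lines that should be checked against the CLI version in use.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator

# Hypothetical mapping from Claude tool names to portable Tool Kinds.
# The kind values are placeholders, not the framework's real vocabulary.
TOOL_KINDS: Dict[str, str] = {
    "Read": "read",
    "Write": "edit",
    "Edit": "edit",
    "Bash": "execute",
    "Grep": "search",
    "Glob": "search",
}


@dataclass
class TraceEvent:
    """One entry in a (hypothetical) Trace."""
    kind: str  # message / thought / tool_call / tool_result / usage / stop
    data: Dict[str, Any] = field(default_factory=dict)


def translate(lines: Iterable[str]) -> Iterator[TraceEvent]:
    """Replay newline-delimited stream-json lines as Trace events."""
    for raw in lines:
        if not raw.strip():
            continue
        event = json.loads(raw)
        etype = event.get("type")

        if etype == "result":
            # Assumed shape of the final rollup line.
            yield TraceEvent("usage", {
                "cost_usd": event.get("total_cost_usd"),
                "tokens": event.get("usage", {}),
            })
            yield TraceEvent("stop", {"output": event.get("result")})
            continue

        if etype not in ("assistant", "user"):
            continue  # e.g. init/system lines carry nothing for the Trace

        content = event.get("message", {}).get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            btype = block.get("type")
            if btype == "text" and etype == "assistant":
                yield TraceEvent("message", {"text": block.get("text", "")})
            elif btype == "thinking":
                yield TraceEvent("thought", {"text": block.get("thinking", "")})
            elif btype == "tool_use":
                name = block.get("name", "")
                yield TraceEvent("tool_call", {
                    "id": block.get("id"),
                    "name": name,
                    "kind": TOOL_KINDS.get(name, "other"),
                    "input": block.get("input", {}),
                })
            elif btype == "tool_result":
                yield TraceEvent("tool_result", {
                    "id": block.get("tool_use_id"),
                    "content": block.get("content"),
                })
```

Unknown tool names fall through to a generic "other" kind in this sketch, so a new Claude tool would still appear in the Trace rather than being dropped.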

The Trace stays the contract (CONTEXT.md). ACP remains for agents where it is the best or only rich path (e.g. droid). Cases that genuinely need bidirectional interaction keep using a bidirectional adapter; the natural future home for that on Claude is the Claude Agent SDK's canUseTool callback, not ACP.
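
As a rough illustration of the degradation rule in the Decision, the sketch below shows how a runner might reconcile a Case's interaction policy with a harness that declares interaction=False. Only Capabilities(tracing=True, interaction=False) comes from this ADR; resolve_interaction, the "harness-default" return value, and raising an error for bidirectional policies are assumptions made for the example.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    tracing: bool
    interaction: bool


def resolve_interaction(policy: str, caps: Capabilities) -> str:
    """Return the effective policy for a harness (hypothetical helper)."""
    if caps.interaction:
        # A bidirectional adapter can honour any policy as requested.
        return policy
    if policy == "auto-approve":
        # No channel needed: the launch flag already approves everything.
        warnings.warn(
            "harness has no Interaction channel; using harness default (approve)"
        )
        return "harness-default"
    # Assumed behaviour: scripted / llm-based / manual need a real channel.
    raise ValueError(
        f"policy {policy!r} requires a bidirectional adapter"
    )
```

With claude-code-stream's capabilities, an auto-approve case resolves to the harness default with a warning, while a scripted case would be rejected and routed to a bidirectional adapter instead.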

Considered options

  • Keep everything on ACP (ADR 0003). Uniform, but inherits each agent's ACP quality and, for Claude, needs the Node bridge and is less direct than the CLI/SDK. Rejected as the universal mechanism.
  • CLI stream-json (chosen now). First-party, no Node, gives per-tool-call events; the result line carries cost + token usage. If a future CLI release drops that rollup, the harness falls back to output-only. Lowest-effort path to a stable, native Claude Trace.
  • Claude Agent SDK adapter (planned next). Strictly richer: structured SDKResultMessage with cost/usage and a native canUseTool permission callback — the right home for bidirectional Claude cases (scripted/llm-based). Heavier; deferred until an interaction case needs it.

Consequences

  • A no-Node, native Claude Harness that satisfies the suite's tracing: true + trace-grader cases; claude-code (output-only, final-JSON envelope) is kept for cheap, trace-free runs.
  • Model selection is a plain --model flag (e.g. claude-opus-4-8, claude-opus-4-6), so comparing Claude versions needs no per-agent model registry.
  • Bidirectional interaction for Claude is intentionally out of scope here; when a case needs it, add the SDK adapter (its canUseTool maps onto the existing Interaction Policy).
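
To make the model-selection consequence concrete, here is a minimal sketch of assembling the launch command. The flags are the ones named in this ADR; build_argv is a hypothetical helper, and passing the prompt as a trailing positional argument is an assumption about how the harness feeds it to the CLI.

```python
from typing import List, Optional


def build_argv(prompt: str, model: Optional[str] = None) -> List[str]:
    """Assemble the claude-code-stream command line (hypothetical helper)."""
    argv = [
        "claude", "-p",
        "--output-format", "stream-json",
        "--verbose",
        "--dangerously-skip-permissions",
    ]
    if model:
        # Comparing Claude versions only changes this one flag.
        argv += ["--model", model]
    # Assumption: the prompt is passed as the final positional argument.
    argv.append(prompt)
    return argv
```

The resulting list can be handed to subprocess.Popen with stdout=subprocess.PIPE, reading one JSON event per line; comparing models then amounts to calling build_argv once per model name.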