Parallel-safe run store and pluggable sandbox isolation

To support running Cells in parallel and in isolated worktrees, each Cell's result.json is the source of truth (written only by that Cell's own worker), and the run manifest is a derived index merged from those files. This replaces the original design, in which a single manifest.json was rewritten after every Cell; that design races and loses updates under concurrent writers.

Sandboxes are created via a pluggable Isolation Mode (copy, clone, or worktree), inferred from the source type (local folder / remote repo+commit / local git repo) and overridable. Worktree is an opt-in convenience, not mandatory.
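
A minimal sketch of how mode inference with an override might look. The source-type labels, the function name, and in particular the default mapping (folder to copy, remote repo to clone, local git repo to worktree) are assumptions made for illustration; the record only lists the modes and source types, and since worktree is described as opt-in, the real default for a local git repo may differ.

```python
from enum import Enum


class IsolationMode(Enum):
    COPY = "copy"
    CLONE = "clone"
    WORKTREE = "worktree"


# Hypothetical source-type labels and default pairing (an assumption,
# not confirmed by the decision record).
_DEFAULTS = {
    "local_folder": IsolationMode.COPY,
    "remote_repo": IsolationMode.CLONE,
    "local_git_repo": IsolationMode.WORKTREE,
}


def resolve_isolation_mode(source_type, override=None):
    """Return the explicit override if given, else the inferred default."""
    if override is not None:
        return IsolationMode(override)  # raises ValueError on unknown modes
    return _DEFAULTS[source_type]
```

For example, `resolve_isolation_mode("local_git_repo", override="copy")` would force a plain copy even though the source is a git repository.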

Status

accepted

Considered options

  • Single shared manifest as source of truth (original). Simple for sequential runs; unsafe under parallelism (last-writer-wins clobbers concurrent Cell updates).
  • Per-Cell result files + derived manifest (chosen). No two workers write the same file; the manifest is rebuildable at any time; resume reads per-Cell files.
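
The chosen option can be sketched roughly as follows. The directory layout (`<run_dir>/<cell_id>/result.json`), the function names, and the manifest shape are assumptions for illustration. The temp-file-plus-rename write is an added safeguard not stated in the record, so that a reader never sees a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def write_result(cell_dir: Path, result: dict) -> None:
    """Write one Cell's result.json; only that Cell's worker calls this."""
    cell_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=cell_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(result, f)
    os.replace(tmp, cell_dir / "result.json")


def build_manifest(run_dir: Path) -> dict:
    """Derive the manifest by merging every per-Cell result.json."""
    cells = {}
    for path in sorted(run_dir.glob("*/result.json")):
        cells[path.parent.name] = json.loads(path.read_text())
    return {"cells": cells}
```

Because `build_manifest` only reads, it can be rerun at any point (after a crash, on resume, or for reporting) without coordinating with running workers.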

Consequences

  • Resume and reporting read per-Cell result.json; the manifest can always be regenerated.
  • Cells must be fully self-contained: own Sandbox, explicit subprocess cwd + env (never os.chdir or mutating os.environ), per-Cell Trace sink, no shared mutable globals.
  • A global semaphore caps concurrent Cells; the Judge and Responder LLMs share a rate limiter. Manual interaction forces concurrency = 1.
  • Worktree creation is serialized (git locks .git/worktrees/index); execution runs parallel. Worktrees are torn down after the Cell unless flagged keep, with crash-safe cleanup.
  • Port-binding MCP artifacts must use ephemeral ports (or stdio) to avoid collisions across parallel Cells.
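
A few of these consequences can be illustrated with a short sketch: a global semaphore, a subprocess launched with explicit cwd and env, and an OS-assigned ephemeral port. The cap value, the function names, and the `CELL_ID` variable are hypothetical, and the child command is a stand-in for whatever a Cell actually runs.

```python
import os
import socket
import subprocess
import sys
import threading

MAX_CONCURRENT_CELLS = 4  # hypothetical cap
_cell_slots = threading.BoundedSemaphore(MAX_CONCURRENT_CELLS)


def free_port() -> int:
    """Ask the OS for an ephemeral port by binding to port 0.

    The port is released on return, so another process could grab it first;
    having the server itself bind to port 0 avoids that window.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run_cell(sandbox_dir: str, extra_env: dict) -> str:
    """Run one Cell's command with explicit cwd and env, under the global cap."""
    env = {**os.environ, **extra_env}  # a copy; os.environ is never mutated
    child = "import os; print(os.environ['CELL_ID'])"  # stand-in command
    with _cell_slots:
        proc = subprocess.run(
            [sys.executable, "-c", child],
            cwd=sandbox_dir,  # no os.chdir in the parent process
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
    return proc.stdout.strip()
```

Passing `cwd` and `env` per call keeps the parent's process-wide state untouched, which is what lets several Cells run from threads of the same process without interfering with each other.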