Pluggable provisioning + a backend-neutral Executor (toward containers)
A Case's environment block now selects a provisioner by kind, and all provisioning
commands run through a Cell Executor rather than calling subprocess directly. This
generalizes the previously pip-only environment to other ecosystems and establishes the
single seam a container backend will swap in behind — without rewriting the provisioning
recipes, the graders, or the runner.
Status
accepted
Context
The environment block (ADR 0004) provisioned a per-Cell virtualenv and pip-installed into
it. That covers Python, but real projects differ in how dependencies are installed (pip vs
uv, npm, cargo, gradle/maven) and where they live (Python's shared mutable site-packages
vs project-local node_modules/target/). Non-Python cases had to smuggle their installs
through setup.run — untyped, unvalidated, and conflating "install dependencies" with
"introduce the task state."
Separately, every command a Cell runs outside the harness (provisioning, setup.run, the
command/tests/pytest graders) was a direct subprocess.run(argv, cwd, env). To run a
Cell inside a container later, each of those call sites would otherwise need its own
docker exec path. We want one seam, exercised now by the local backend, that a container
backend slots into.
Decision
1. Provisioner kind. EnvironmentSpec.kind selects the strategy:
- pip-venv (default): unchanged from ADR 0004. A stdlib venv per Cell, pip-installs requirements / requirement_files / the repo (install: editable). Every existing Case keeps its exact behavior.
- uv: the same venv model via the uv CLI (faster resolver/installer); same fields.
- command: for ecosystems with project-local deps. Runs the declared commands in the Sandbox (npm ci, cargo fetch, …). No venv; subprocesses inherit the host env, and the installed deps live in the Sandbox (torn down with it). This is the typed replacement for the old setup.run-as-installer pattern.

commands also serves as a post-install hook for the venv kinds (e.g. python -m playwright
install), run under the venv env.
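The kind rules above, together with the validation rule for the command kind, can be sketched roughly as follows. This is a minimal illustration, not the project's actual schema: the field names come from this ADR, but the types, the defaults, the exact set of "pip-only" fields, and the validate function's signature are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VENV_KINDS = {"pip-venv", "uv"}
# Assumption: these are the fields the ADR calls "pip-only".
PIP_ONLY_FIELDS = ("requirements", "requirement_files", "install")


@dataclass
class EnvironmentSpec:
    # Field names follow the ADR; types and defaults are assumed.
    kind: str = "pip-venv"  # an omitted kind means pip-venv
    requirements: List[str] = field(default_factory=list)
    requirement_files: List[str] = field(default_factory=list)
    install: Optional[str] = None  # e.g. "editable"
    commands: List[str] = field(default_factory=list)


def validate(spec: EnvironmentSpec) -> List[str]:
    """Return a list of problems; an empty list means the spec is valid."""
    if spec.kind in VENV_KINDS:
        # For venv kinds, commands is an optional post-install hook.
        return []
    if spec.kind != "command":
        return [f"unknown environment kind: {spec.kind!r}"]
    errors = []
    for name in PIP_ONLY_FIELDS:
        if getattr(spec, name):
            errors.append(f"{name!r} is not allowed with kind 'command'")
    if not spec.commands:
        errors.append("kind 'command' requires non-empty commands")
    return errors
```

With this shape, a spec such as EnvironmentSpec(kind="command", commands=["npm ci"]) passes, while a command spec that also lists requirements is rejected before anything is provisioned.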
2. The Executor seam. executor.py defines Executor.run(argv, *, cwd, env, timeout,
shell) -> ExecResult. LocalExecutor (host subprocess) is the only backend today;
provision_env is written entirely against the interface and takes an optional executor
(defaulting to LocalExecutor). The provisioner contract — the returned env dict threaded
to every later subprocess — is unchanged.
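A rough sketch of what such a seam might look like is given here. Only Executor.run's parameter list, ExecResult, and LocalExecutor are named in this ADR; the fields of ExecResult, the type hints, and the run_commands helper are hypothetical and exist only to show a caller written purely against the interface.

```python
import subprocess
from dataclasses import dataclass
from typing import Mapping, Optional, Protocol, Sequence, Union


@dataclass
class ExecResult:
    # Assumed shape; the ADR names the type but not its fields.
    returncode: int
    stdout: str
    stderr: str


class Executor(Protocol):
    def run(self, argv: Union[str, Sequence[str]], *, cwd: str,
            env: Mapping[str, str], timeout: Optional[float] = None,
            shell: bool = False) -> ExecResult: ...


class LocalExecutor:
    """Runs the command as a subprocess on the host."""

    def run(self, argv, *, cwd, env, timeout=None, shell=False):
        proc = subprocess.run(argv, cwd=cwd, env=dict(env), timeout=timeout,
                              shell=shell, capture_output=True, text=True)
        return ExecResult(proc.returncode, proc.stdout, proc.stderr)


def run_commands(commands, *, cwd, env, executor=None):
    """Hypothetical helper: run declared commands through the seam."""
    executor = executor or LocalExecutor()
    for command in commands:
        result = executor.run(command, cwd=cwd, env=env, shell=True)
        if result.returncode != 0:
            raise RuntimeError(f"{command!r} failed: {result.stderr}")
    return env  # the same env dict is threaded to later subprocesses
```

Because run_commands never touches subprocess itself, a different backend only has to provide a compatible run method.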
Considered options
- Keep using setup.run for non-Python deps. Zero new schema, but untyped, unvalidated, not reproducible per-ecosystem, and conflates provisioning with task setup. Rejected.
- A provisioner per ecosystem (node, cargo, go, maven, …). A zoo of near-identical kinds; their deps are all project-local, so one generic command kind covers them with honest, explicit commands. Rejected in favor of the single command kind.
- Jump straight to a container backend. Strongest isolation and the real answer to OS packages, but heavyweight (it needs a running daemon, and dockerd isn't even up by default in CI), and we want the provisioning model proven on the cheap local backend first. Deferred, deliberately, behind the Executor seam.
Consequences
- Backward compatible: no kind ⇒ pip-venv; all existing cases and the environment threading are untouched. provision_env's signature gains an optional executor.
- command-kind validation forbids the pip-only fields and requires non-empty commands, so misuse fails at validate time.
- Container backend (now implemented). A Case may declare a container: {image, setup, python} block; the runner then builds a ContainerExecutor instead of LocalExecutor. It uses docker run to start a long-lived container from image, with the cell directory (rw) and the case directory (ro) bind-mounted at their same absolute paths. Host and container paths therefore coincide: no cwd translation is needed, and a venv built in the container resolves identically inside and out. Provisioning, setup.run, and the command/tests/pytest graders run through it (docker exec); the provisioner recipes are unchanged. container.setup runs OS-level prep (apt-get install …) once at start; this is the OS-package story. Only changed env vars are forwarded into the container (not the whole host environment), so host secrets don't leak. A pinned image (by digest) plus pinned deps give a reproducible grading environment, which is the door ADR 0004 left open.
- Boundary (lifted in ADR 0014): originally the Harness (the agent under test) ran on the host against the bind-mounted Sandbox, so it couldn't use a container-built venv (it was handed the host env in container mode). ADR 0014 lifts this per Case (container.harness: true): the Harness then runs inside the Cell's sandbox through the Executor. The in-process openai harness needs nothing baked in (only its effects route in); a CLI harness needs its agent CLI in the image. The default still keeps the agent on the host, so the slice described here (containerize the reproducible part, deps + grading, and leave the variable under test on the host) remains the back-compatible default.
- Reproducibility otherwise remains the case author's responsibility (pin versions / use lockfiles / pin the image digest); the Executor seam is what made the container tier a drop-in rather than a rewrite.
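To make the container path more concrete, here is a hedged sketch of how a docker exec invocation might be assembled so that only changed environment variables are forwarded. The function names, the meaning of "changed" (new or different relative to a baseline such as the host environment), and the argument layout are assumptions for illustration; the real ContainerExecutor may differ.

```python
import os
import shlex
from typing import Dict, List, Mapping, Optional, Sequence, Union


def changed_env(env: Mapping[str, str],
                baseline: Mapping[str, str]) -> Dict[str, str]:
    """Variables in env that are new or differ from the baseline."""
    return {k: v for k, v in env.items() if baseline.get(k) != v}


def docker_exec_argv(container: str, argv: Union[str, Sequence[str]], *,
                     cwd: str, env: Mapping[str, str],
                     baseline: Optional[Mapping[str, str]] = None,
                     shell: bool = False) -> List[str]:
    """Hypothetical helper: build the docker exec command line."""
    baseline = os.environ if baseline is None else baseline
    # Paths are bind-mounted at the same absolute location, so cwd is
    # passed through untranslated.
    cmd = ["docker", "exec", "--workdir", cwd]
    # Forward only the variables that differ from the baseline.
    for key, value in sorted(changed_env(env, baseline).items()):
        cmd += ["--env", f"{key}={value}"]
    cmd.append(container)
    if shell:
        script = argv if isinstance(argv, str) else shlex.join(argv)
        cmd += ["sh", "-c", script]
    else:
        cmd += shlex.split(argv) if isinstance(argv, str) else list(argv)
    return cmd
```

In this sketch a host variable that the provisioner never touched (an API token, say) is absent from the resulting command line, while venv-related changes such as PATH and VIRTUAL_ENV are passed with --env.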