
Per-Cell virtualenv for dependency-bearing cases (the broader Sandbox)

A Case may declare an environment, which provisions a throwaway virtualenv per Cell (next to the Sandbox, at <cell_dir>/env) and installs the Case's declared dependencies into it. With install: editable, the Sandbox repo itself is installed as well, using pip install -e . (an editable install). Every subprocess the Cell spawns (the Harness, the setup commands, and the command/tests/pytest Graders) runs with that venv on PATH through an explicit env dict, threaded RunContext.env -> RunResult.env. For a Case that declares no environment, env is None and everything runs under the host interpreter exactly as before.
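A minimal sketch of how such an explicit env dict could be built. The helper name venv_env and its signature are hypothetical, not the project's actual API. Setting VIRTUAL_ENV and dropping PYTHONHOME mirrors what a venv activate script does and is an assumption here. The point is only that the venv's scripts directory is prepended to PATH in a copy, and that None stands for "no environment declared".

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def venv_env(
    venv_dir: Optional[Path],
    base: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Return an env dict with the venv's scripts dir first on PATH.

    Hypothetical helper: returns None when no venv is declared, which
    callers treat as "inherit the host environment".
    """
    if venv_dir is None:
        return None
    # Work on a copy so os.environ itself is never mutated.
    env = dict(os.environ if base is None else base)
    bin_dir = venv_dir / ("Scripts" if os.name == "nt" else "bin")
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
    # Assumption: mimic what a venv "activate" script sets and unsets.
    env["VIRTUAL_ENV"] = str(venv_dir)
    env.pop("PYTHONHOME", None)
    return env
```

Any subprocess started with env=venv_env(cell_dir / "env") would then resolve python, pip and pytest from the venv first, while the parent process is left untouched.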

This extends the Sandbox (an isolated working directory) with an isolated interpreter, so a Case can target a real repository that needs third-party packages (e.g. werkzeug -> markupsafe) or that isn't importable from its root (a src/ layout, resolved by the editable install). The venv is created before the agent runs, so the agent sees the same dependencies the Graders later grade against. It is torn down together with the Sandbox unless --keep-sandboxes is passed.
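The provisioning order described above can be sketched as a list of commands to run before the agent starts. This is an illustration under assumptions: provisioning_commands is a hypothetical name, the pinned version in the usage note is only an example, and the editable install is assumed to run with the Sandbox as its working directory so that "." refers to the repo.

```python
import os
import sys
from pathlib import Path
from typing import List, Sequence


def provisioning_commands(
    cell_dir: Path,
    requirements: Sequence[str],
    editable: bool = False,
) -> List[List[str]]:
    """Build the argv lists that would provision <cell_dir>/env.

    Hypothetical sketch: the commands are only built here, not executed.
    """
    venv_dir = Path(cell_dir) / "env"
    bin_dir = venv_dir / ("Scripts" if os.name == "nt" else "bin")
    venv_python = str(bin_dir / ("python.exe" if os.name == "nt" else "python"))

    # 1. Create the throwaway venv with the host interpreter.
    commands = [[sys.executable, "-m", "venv", str(venv_dir)]]
    # 2. Install declared dependencies with the venv's own pip.
    if requirements:
        commands.append([venv_python, "-m", "pip", "install", *requirements])
    # 3. Optionally install the Sandbox repo itself in editable mode
    #    (assumed to be run with cwd set to the Sandbox).
    if editable:
        commands.append([venv_python, "-m", "pip", "install", "-e", "."])
    return commands
```

For example, provisioning_commands(cell_dir, ["pytest", "markupsafe==2.1.5"], editable=True) yields three commands: create the venv, install the requirements, then install the repo in editable mode.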

Status

accepted

Considered options

  • Install into the host interpreter (setup.run: ["pip install ..."]). Zero new schema, but it mutates a shared site-packages: two parallel Cells installing different versions of the same distribution clobber each other, violating the parallel-safe-Cell invariant (ADR 0002). Rejected.
  • PYTHONPATH / sys.path injection for src-layout, no installs. Makes a src/ package importable without a venv, but can't supply third-party dependencies and leaks host packages. Insufficient.
  • A per-Cell virtualenv with an explicit env (chosen). Full isolation and reproducibility; the editable install handles both src-layout and dependency resolution in one step. Costs venv-creation + pip install time per Cell.
  • OS-level isolation (container per Cell). Strongest isolation, but heavyweight and orthogonal to "which model is best?"; deferred. The vocabulary deliberately calls the Environment a venv, not a container, to leave that door open.

Consequences

  • Graders run subprocesses with env=result.env; when None (no environment) they inherit the host, preserving every existing Case's behavior.
  • The pytest Grader runs inside the venv, so a venv Case must make pytest available, either by listing it in requirements (as the example cases do) or by setting system_site_packages: true so that a pytest installed on the host is visible.
  • Provisioning needs network access for pip install; a provisioning failure fails just that Cell (it is caught per-Cell like any other error). Pin dependency versions in requirements for full reproducibility.
  • os.environ is never mutated; the venv is expressed purely as the returned env dict, honoring ADR 0002's "explicit cwd + env, never global mutation" rule.
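The first and last consequences can be illustrated with a small sketch. It assumes a Grader shells out through subprocess.run; run_grader_command is a hypothetical name, not the project's function. Passing env=None to subprocess.run makes the child inherit the parent's environment, which is what keeps Cases without an environment unchanged, and passing a dict affects only the child.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def run_grader_command(
    argv: Sequence[str],
    cwd: str,
    env: Optional[Mapping[str, str]] = None,
) -> subprocess.CompletedProcess:
    """Run one grader command with an explicit cwd and env.

    env=None means "no environment declared": the child inherits the
    host environment. A dict is applied to the child only, so
    os.environ in this process is never touched.
    """
    return subprocess.run(
        list(argv),
        cwd=os.fspath(cwd),
        env=None if env is None else dict(env),
        capture_output=True,
        text=True,
        check=False,  # graders inspect the return code themselves
    )
```

A pytest Grader written this way might call run_grader_command([sys.executable, "-m", "pytest", "-q"], sandbox_dir, result.env) for a host-only Case, or pass ["pytest", "-q"] so that PATH in result.env selects the venv's pytest. Both invocations are illustrative, not the project's actual command lines.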