Per-Cell virtualenv for dependency-bearing cases (the broader Sandbox)
A Case may declare an environment, which provisions a throwaway virtualenv per
Cell (next to the Sandbox, at <cell_dir>/env) and installs the Case's declared
dependencies into it. With install: editable, the Sandbox repo itself is also
installed, via "pip install -e .".
Every subprocess the Cell spawns (the Harness, the setup commands, and the
command/tests/pytest Graders) runs with that venv on PATH through an explicit env
dict, threaded from RunContext.env to RunResult.env. For Cases that declare no
environment, env is None and they run under the host interpreter exactly as before.
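The explicit env dict described above could be built roughly as follows. This is a minimal sketch, not the project's actual code: the function name venv_env and its parameters are hypothetical, and it assumes the usual venv layout (bin/ on POSIX, Scripts/ on Windows).

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def venv_env(venv_dir: Path, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the venv first on PATH.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Work on a copy so the process-wide os.environ is never mutated.
    env = dict(os.environ if base is None else base)

    # Scripts live in Scripts/ on Windows and bin/ elsewhere.
    bin_dir = venv_dir / ("Scripts" if os.name == "nt" else "bin")

    # Prepend the venv so its python/pip/pytest shadow the host's.
    existing = env.get("PATH")
    env["PATH"] = os.pathsep.join([str(bin_dir)] + ([existing] if existing else []))

    env["VIRTUAL_ENV"] = str(venv_dir)
    # A set PYTHONHOME would redirect the interpreter away from the venv.
    env.pop("PYTHONHOME", None)
    return env
```

Because the function only returns a new dict, a Case without an environment can simply carry None instead and nothing else has to change.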
This extends the Sandbox (an isolated working directory) with an isolated
interpreter, so a Case can target a real repository that needs third-party packages
(e.g. werkzeug -> markupsafe) or that isn't importable from its root (a src/ layout,
resolved by the editable install). The venv is created before the agent runs, so the agent
sees the same dependencies the Graders later grade against. It is torn down with the
Sandbox unless --keep-sandboxes is passed.
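One way to picture the provisioning step is the sketch below, which uses the standard-library venv module and runs pip through the venv's own interpreter. The function names (venv_python, pip_commands, provision) and their parameters are assumptions made for illustration; the actual implementation may be organized differently.

```python
import os
import subprocess
import venv
from pathlib import Path
from typing import List, Sequence


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a venv created at env_dir."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_commands(python: Path, requirements: Sequence[str], editable: bool) -> List[List[str]]:
    """Build the pip invocations: declared requirements first, then the repo."""
    pip = [str(python), "-m", "pip", "install"]
    commands: List[List[str]] = []
    if requirements:
        commands.append(pip + list(requirements))
    if editable:
        # Run with cwd set to the Sandbox repo, so "." is the repo root.
        commands.append(pip + ["-e", "."])
    return commands


def provision(cell_dir: Path, sandbox: Path, requirements: Sequence[str],
              editable: bool = False, system_site_packages: bool = False) -> Path:
    """Create <cell_dir>/env and install into it (hypothetical helper)."""
    env_dir = cell_dir / "env"
    builder = venv.EnvBuilder(with_pip=True, system_site_packages=system_site_packages)
    builder.create(env_dir)
    for cmd in pip_commands(venv_python(env_dir), requirements, editable):
        # check=True raises CalledProcessError, so a failed install fails this Cell only.
        subprocess.run(cmd, cwd=sandbox, check=True)
    return env_dir
```

Keeping the command construction separate from the execution makes the install plan inspectable without touching the network.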
Status
accepted
Considered options
- Install into the host interpreter (setup.run: ["pip install ..."]). Zero new schema, but mutates a shared site-packages: two parallel Cells installing different versions of the same dist clobber each other, violating the parallel-safe-Cell invariant (ADR 0002). Rejected.
- PYTHONPATH/sys.path injection for src-layout, no installs. Makes a src/ package importable without a venv, but can't supply third-party dependencies and leaks host packages. Insufficient.
- A per-Cell virtualenv with an explicit env (chosen). Full isolation and reproducibility; the editable install handles both src-layout and dependency resolution in one step. Costs venv-creation + pip install time per Cell.
- OS-level isolation (container per Cell). Strongest isolation, but heavyweight and orthogonal to "which model is best?"; deferred. The vocabulary deliberately calls the Environment a venv, not a container, to leave that door open.
Consequences
- Graders run subprocesses with env=result.env; when None (no environment) they inherit the host, preserving every existing Case's behavior.
- The pytest Grader runs inside the venv, so a venv Case must make pytest available: list it in requirements (the example cases do) or set system_site_packages: true.
- Provisioning needs network for pip install; a failure fails just that Cell (caught per-Cell like any other error). Pin dependency versions in requirements for full reproducibility.
- os.environ is never mutated; the venv is expressed purely as the returned env dict, honoring ADR 0002's "explicit cwd + env, never global mutation" rule.
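The first consequence relies on a property of subprocess.run: passing env=None makes the child inherit the parent's environment, while passing a dict replaces it entirely. A minimal sketch of a Grader-style call, assuming a hypothetical run_grader helper (not the project's actual Grader API):

```python
import subprocess
import sys
from typing import Dict, Optional, Sequence


def run_grader(cmd: Sequence[str], cwd: str, env: Optional[Dict[str, str]]) -> int:
    """Run a grader command with an explicit cwd and env; return its exit code.

    env=None  -> the child inherits the host environment (Case has no environment).
    env=dict  -> the child sees exactly that dict (the per-Cell venv environment).
    """
    completed = subprocess.run(
        list(cmd),
        cwd=cwd,
        env=env,
        capture_output=True,  # keep grader output out of the parent's stdout/stderr
        text=True,
    )
    return completed.returncode


if __name__ == "__main__":
    # Host-interpreter case: no environment declared, so env is None.
    code = run_grader([sys.executable, "-c", "print('ok')"], cwd=".", env=None)
    print("exit code:", code)
```

Nothing in this path touches os.environ, so parallel Cells cannot observe each other's environments.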