CIS490/orchestrator
Max Gorog 4d29b7236d PIPELINE §5 step 3: target VM build infrastructure + containment posture
§4.2 calls for target VMs we BUILD, not VMs we fetch. §4.13 demands
every target ship the same isolation posture (no upstream egress, no
host-shared FS, unprivileged QEMU, fresh snapshot per episode). This
commit lands the infrastructure for both.

New surface:
  * orchestrator/target_spec.py
      Loads + validates `vm/targets/<name>/spec.toml`. Containment
      fields are not knobs — each has exactly ONE safe value, and a
      spec asserting the unsafe value is rejected at load time. There's
      no `--containment-override`; weakening §4.13 requires amending
      PIPELINE.md and operator sign-off.

  * tools/build_target.py
      Orchestrates build → verify → publish for a single target. Spec
      invalid → exit 78 (sysadmin error). build.sh failure → image not
      published. verify.sh failure → image discarded; that's the §4.2
      acceptance gate. Publishes sha256 + the manifest.toml stanza the
      operator copies in to admit the image (§16 substantive amendment
      with sign-off per §15).

  * vm/targets/<name>/{spec.toml,build.sh,verify.sh}
      Template structure. spec.toml is the contract; build.sh produces
      $OUT_PATH; verify.sh boots the produced image under the §4.13
      containment posture and asserts every promise.

  * vm/targets/shellshock/
      First real working target. CVE-2014-6271 (Apache mod_cgi + bash
      4.2 mis-parsing function-export environment values). Replaces
      the SourceForge Metasploitable2 path that §3 evidence proved
      unverifiable. Bash 4.2 is built from sha256-pinned GNU source
      inside an Alpine 3.21 cloudinit guest; the build script asserts
      the produced bash actually triggers shellshock; the verifier
      re-asserts it under restrict=on with a real CVE-2014-6271 probe.

  * vm/targets/README.md
      How operators add a target. Walks the spec → build → verify →
      manifest amendment loop.

Containment regression tests (tests/test_containment.py) — 20 new
assertions, parameterized over every target with a build/verify trio:

  * verify.sh MUST contain `restrict=on` on its netdev (§4.13)
  * verify.sh MUST contain `snapshot=on` on the boot drive (§4.13)
  * verify.sh + build.sh MUST NOT contain -virtfs / -fsdev / 9pfs
  * verify.sh + build.sh MUST NOT wrap qemu-system in `sudo`
  * Every target must ship the complete spec.toml + build.sh + verify.sh
    trio — no half-built targets (§1 default-to-removal)

Spec validation tests (tests/test_target_spec.py): 13 new tests over
spec parse, name/dir mismatch, missing fields, out-of-range port, and
the §4.13 containment field validators (each unsafe value rejected
with a clear error).

The shellshock target's image is NOT yet published to manifest.toml's
[[targets.images]] — that's the §15 sign-off amendment that lands
after a successful operator-driven build_target.py run on a lab host
with KVM. Building takes ~10 min on x86_64; cannot run on the Pi
under TCG. Operator drives the first build, verifies the sha256, then
amends manifest.toml in a follow-up commit.

261 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 01:31:40 -05:00
..
__init__.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
__main__.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
episode.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
fleet.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
manifest.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
README.md Scaffold project: docs, repo skeleton, transport + deploy design 2026-04-28 23:21:00 -06:00
target_spec.py PIPELINE §5 step 3: target VM build infrastructure + containment posture 2026-05-04 01:31:40 -05:00
ulid.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00

orchestrator/

The state machine that drives a single episode:

snapshot_load → clean → armed → infecting → infected_running → dormant → reverting

Responsibilities:

  • Bring up the host-only bridge and verify isolation before the guest starts.
  • Boot the guest from a named snapshot.
  • Spawn the five telemetry collectors (collectors/) with a shared episode id and shared monotonic clock origin.
  • Drive the Metasploit Framework over RPC to fire the configured exploit module.
  • Upload + execute the configured malware sample once a session is open.
  • Emit phase transitions to labels.jsonl at the moment the action is taken.
  • Revert the snapshot at episode end.
  • Write meta.json with the result summary.

Implementation lives in this directory and is imported as orchestrator.*.