History

max d86502d950 workload audit trail: meta.sample + per-phase events + pre-kill probe The elliott-lab episode showed every phase median'd 20% CPU because the in-guest workload silently never fired — and there was no signal in events.jsonl to detect that from outside, so a trainer would treat the labels as ground truth and learn "all phases look identical". This commit closes the audit gap so the failure is visible in meta: orchestrator/episode.py EpisodeConfig.sample: Sample \| None — the manifest entry that drove this episode's workload selection. Stamped into meta.sample as {name, family, category, profile, kind, sha256} so trainers can join cleanly without re-deriving from events. None means the v1 yes-loop fallback path ran (and the trainer should treat the episode with appropriate skepticism). tools/vm_load_controller.py VMLoadController gains an emit_event callable. Every phase now emits a workload_* event into the runner's events.jsonl: workload_setup login + initial cleanup OK workload_killed clean / dormant. Dormant carries a `pre_kill_probe` dict from inside the guest (`pgrep -c yes`, `pgrep -c sh`, /proc/loadavg) so the trainer can detect the elliott-lab failure mode where the workload never actually ran. workload_armed armed handshake fired workload_infecting dd urandom / payload write fired workload_started infected_running command sent workload_failed any of the above raised inside SerialClient (timeout, EOF, partial login). The runner would have silently swallowed the exception via its on_phase try/except; the audit row makes the failure detectable. Exceptions in shell calls surface as workload_failed events but do NOT propagate, matching the runner's existing on_phase contract. tools/run_real_vm_demo.py Wires the controller's emit_event to the runner's emit_event via a small forward-reference closure (controller is built before runner; runner.emit_event needs to be the sink). Sample also flows into EpisodeConfig.sample so meta.sample matches what the controller actually ran. Tests: 119 (was 106). New cases: tests/test_vm_load_controller.py (11 tests against a FakeSerial) - setup emits workload_setup - infected_running runs the v1 yes-loop AND emits workload_started - dormant probes BEFORE killing and stamps pre_kill_probe - dormant probe records "yes=0" (the elliott-lab fingerprint) - clean / armed / infecting all emit their respective events - serial.run() exception → workload_failed event, no propagation - sample-with-profile dispatches to exploits.workloads command (NOT the v1 yes-loop) - missing emit_event callback is a no-op (back-compat) tests/test_episode.py (2 new) - meta.sample carries name/family/category/profile/kind/sha256 when EpisodeConfig.sample is set - meta.sample stays null in the v1 fallback path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-04-30 02:12:34 -05:00
..
__init__.py	Add v0 orchestrator + first oracle collector (host /proc)	2026-04-28 23:40:25 -06:00
__main__.py	Add v0 orchestrator + first oracle collector (host /proc)	2026-04-28 23:40:25 -06:00
episode.py	workload audit trail: meta.sample + per-phase events + pre-kill probe	2026-04-30 02:12:34 -05:00
fleet.py	fleet: fix per-slot run-dir collision so concurrent VMs actually run	2026-04-30 01:55:56 -05:00
README.md	Scaffold project: docs, repo skeleton, transport + deploy design	2026-04-28 23:21:00 -06:00
ulid.py	Add v0 orchestrator + first oracle collector (host /proc)	2026-04-28 23:40:25 -06:00

README.md

orchestrator/

The state machine that drives a single episode:

snapshot_load → clean → armed → infecting → infected_running → dormant → reverting

Responsibilities:

Bring up the host-only bridge and verify isolation before the guest starts.
Boot the guest from a named snapshot.
Spawn the five telemetry collectors (collectors/) with a shared episode id and shared monotonic clock origin.
Drive the Metasploit Framework over RPC to fire the configured exploit module.
Upload + execute the configured malware sample once a session is open.
Emit phase transitions to labels.jsonl at the moment the action is taken.
Revert the snapshot at episode end.
Write meta.json with the result summary.

Implementation lives in this directory and is imported as orchestrator.*.