CIS490/orchestrator
Max Gorog dac03d2eff perf: emit per-episode lifecycle events; emit row even with empty agg
Validation on k-gamingcom (commit ac7b85f) showed perf enabled in
production but rows_perf=0 on every episode. Without lifecycle events
the failure mode is indistinguishable from "perf wasn't enabled" — §1
silent-downgrade. The events now surface the actual cause:

  - perf_unavailable     — binary missing OR launch failed (with reason)
  - perf_started         — perf is running (pid, events, interval)
  - perf_first_row       — first row written; counters_populated tells
                           whether any event was actually counted
  - perf_finished        — final tally (intervals_seen,
                           intervals_with_values)
  - perf_no_counters     — perf was alive but every interval came back
                           <not counted> (likely paranoid > 2 or PID
                           ownership mismatch)

`_flush()` now writes a row whenever an interval is observed, even
when every event was <not counted>. The all-None row is honest data
("perf observed this interval and counted nothing"), and the rows
become a count of observed intervals rather than a count of
successful measurements — distinct from rows_proc / rows_qmp which
do count successful measurements. Trainers filter on
`cycles is not None` etc. when they need only populated rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:08:42 -05:00
..
__init__.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
__main__.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
episode.py perf: emit per-episode lifecycle events; emit row even with empty agg 2026-05-03 18:08:42 -05:00
fleet.py fix: revert speculative fleet picker change — was producing dishonest labels 2026-05-03 17:58:43 -05:00
README.md Scaffold project: docs, repo skeleton, transport + deploy design 2026-04-28 23:21:00 -06:00
ulid.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00

orchestrator/

The state machine that drives a single episode:

snapshot_load → clean → armed → infecting → infected_running → dormant → reverting

Responsibilities:

  • Bring up the host-only bridge and verify isolation before the guest starts.
  • Boot the guest from a named snapshot.
  • Spawn the five telemetry collectors (collectors/) with a shared episode id and shared monotonic clock origin.
  • Drive the Metasploit Framework over RPC to fire the configured exploit module.
  • Upload + execute the configured malware sample once a session is open.
  • Emit phase transitions to labels.jsonl at the moment the action is taken.
  • Revert the snapshot at episode end.
  • Write meta.json with the result summary.

Implementation lives in this directory and is imported as orchestrator.*.