CIS490/collectors
Max Gorog dac03d2eff perf: emit per-episode lifecycle events; emit row even with empty agg
Validation on k-gamingcom (commit ac7b85f) showed perf enabled in
production but rows_perf=0 on every episode. Without lifecycle events
the failure mode is indistinguishable from "perf wasn't enabled" — §1
silent-downgrade. The events now surface the actual cause:

  - perf_unavailable     — binary missing OR launch failed (with reason)
  - perf_started         — perf is running (pid, events, interval)
  - perf_first_row       — first row written; counters_populated tells
                           whether any event was actually counted
  - perf_finished        — final tally (intervals_seen,
                           intervals_with_values)
  - perf_no_counters     — perf was alive but every interval came back
                           <not counted> (likely perf_event_paranoid > 2 or PID
                           ownership mismatch)
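
A minimal sketch of how these events might be serialized and how the closing event could be chosen from the final tally. The helper names (`make_event`, `closing_events`), the `event` and `t_wall_ns` field names, and the rule that `perf_no_counters` is emitted alongside `perf_finished` are assumptions for illustration, not taken from perf_qemu.py.

```python
import json
import time


def make_event(name: str, **fields) -> str:
    """Serialize one lifecycle event as a single JSONL line.

    The record layout here is hypothetical; only the event names come
    from the commit message.
    """
    record = {"event": name, "t_wall_ns": time.time_ns(), **fields}
    return json.dumps(record, sort_keys=True)


def closing_events(intervals_seen: int, intervals_with_values: int) -> list[str]:
    """Pick the events to emit when perf exits.

    Assumption: perf_finished is always emitted, and perf_no_counters is
    added when perf observed intervals but none carried a counted value.
    """
    names = ["perf_finished"]
    if intervals_seen > 0 and intervals_with_values == 0:
        names.append("perf_no_counters")
    return names


# Example: perf ran for 10 intervals and every one came back <not counted>.
for name in closing_events(10, 0):
    print(make_event(name, intervals_seen=10, intervals_with_values=0))
```

With a tally like this, an episode whose rows carry no counts can be told apart from one where perf never started.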

`_flush()` now writes a row whenever an interval is observed, even
when every event was <not counted>. The all-None row is honest data
("perf observed this interval and counted nothing"), so rows_perf
becomes a count of observed intervals rather than a count of
successful measurements. This differs from rows_proc / rows_qmp, which
do count successful measurements. Trainers that need only populated
rows filter on `cycles is not None` etc.
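
The row semantics above can be sketched as follows. `build_row`, `populated`, and the `EVENTS` tuple are hypothetical names, and the row layout is an assumption; only the all-None row and the `cycles is not None` filter come from the commit message.

```python
from typing import Optional

# Hypothetical event list; the real collector's events may differ.
EVENTS = ("cycles", "instructions")


def build_row(t_mono_ns: int, counts: dict[str, Optional[int]]) -> dict:
    """Build one row per observed interval, even if nothing was counted.

    A missing or None entry stands for perf's <not counted>.
    """
    row: dict = {"t_mono_ns": t_mono_ns}
    for name in EVENTS:
        row[name] = counts.get(name)
    return row


def populated(rows: list[dict]) -> list[dict]:
    """Trainer-side filter: keep only rows where cycles was counted."""
    return [r for r in rows if r["cycles"] is not None]


rows = [
    build_row(1_000, {"cycles": 120, "instructions": 90}),
    build_row(2_000, {}),  # interval observed, every event <not counted>
]
print(len(rows), len(populated(rows)))
```

Here the row count reflects both observed intervals, while the filter leaves only the one with a counted value.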

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:08:42 -05:00
| File | Last commit | Date |
|---|---|---|
| __init__.py | Add v0 orchestrator + first oracle collector (host /proc) | 2026-04-28 23:40:25 -06:00 |
| guest_agent.py | PIPELINE §5 step 1: fix four root-cause defects | 2026-05-03 17:05:25 -05:00 |
| pcap.py | Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts | 2026-04-30 00:02:27 -05:00 |
| perf_qemu.py | perf: emit per-episode lifecycle events; emit row even with empty agg | 2026-05-03 18:08:42 -05:00 |
| proc_qemu.py | Add v0 orchestrator + first oracle collector (host /proc) | 2026-04-28 23:40:25 -06:00 |
| qmp.py | Close out the deployment-readiness gaps | 2026-04-30 00:31:55 -05:00 |
| README.md | Scaffold project: docs, repo skeleton, transport + deploy design | 2026-04-28 23:21:00 -06:00 |

collectors/

One module per telemetry source. All collectors:

  • Receive an episode_id, an output directory, and a shared t_mono_origin_ns.
  • Write JSONL into data/episodes/<episode_id>/telemetry-<name>.jsonl.
  • Stamp every row with the same t_mono_ns / t_wall_ns clock pair.
  • Stamp every row with source and available_in_deployment (true/false).
  • Exit cleanly on SIGTERM from the orchestrator.
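
A minimal sketch of the path and row-stamping parts of this contract. The helper names (`telemetry_path`, `stamp_row`, `append_row`) are hypothetical, and computing `t_mono_ns` as an offset from `t_mono_origin_ns` is an assumption; see docs/data-model.md for the actual row schemas.

```python
import json
import time
from pathlib import Path


def telemetry_path(data_root: Path, episode_id: str, name: str) -> Path:
    """Return <data_root>/episodes/<episode_id>/telemetry-<name>.jsonl."""
    return data_root / "episodes" / episode_id / f"telemetry-{name}.jsonl"


def stamp_row(payload: dict, *, source: str, available_in_deployment: bool,
              t_mono_origin_ns: int) -> dict:
    """Attach the clock pair and provenance fields to one telemetry row."""
    row = dict(payload)
    # Assumption: t_mono_ns is measured relative to the shared origin.
    row["t_mono_ns"] = time.monotonic_ns() - t_mono_origin_ns
    row["t_wall_ns"] = time.time_ns()
    row["source"] = source
    row["available_in_deployment"] = available_in_deployment
    return row


def append_row(path: Path, row: dict) -> None:
    """Append one row as a JSONL line, creating the episode directory."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")


origin = time.monotonic_ns()
example = stamp_row({"utime": 42}, source="proc_qemu",
                    available_in_deployment=False, t_mono_origin_ns=origin)
print(telemetry_path(Path("data"), "ep-0001", "proc_qemu"), example["source"])
```

Passing the same `t_mono_origin_ns` to every collector is what lets rows from different telemetry files be aligned on one timeline.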

| Module | Source | Vantage | Role |
|---|---|---|---|
| proc_qemu.py | host /proc/<qemu_pid>/{stat,io,status,schedstat} | outside guest | oracle |
| qmp.py | QEMU QMP query-stats, query-blockstats, netdev | outside guest | oracle |
| perf_qemu.py | perf stat -p <qemu_pid> | outside guest | oracle |
| pcap.py | tcpdump -i br-malware, bucketed | gateway-side | feature |
| guest_agent.py | virtio-serial reader, parses agent JSONL | inside guest | feature |

The in-guest agent itself (a small Python+psutil program that runs on the guest and writes to /dev/virtio-ports/cis490.guest.agent) lives under vm/guest-agent/ because it is shipped into the guest at image-build time.

See docs/data-model.md for row schemas.