CIS490/tests
Max Gorog d9f913fc97 PIPELINE §5 step 6: event-driven labeller (§4.5)
Phase labels are written ONLY when justifying events arrive. The
schedule clock is now a budget — an upper bound — never a label
source. This is the core honesty fix the §3 evidence demanded:

  Before: every Tier-3 episode wrote `infected_running` from the
          schedule clock regardless of whether session_open ever
          fired. Per §10 every dishonest label is a poisoned
          training example. 67/67 of the §3 probe episodes were
          poisoned this way.

  After:  `infecting` writes ONLY when exploit_fire is observed in
          events.jsonl. `infected_running` writes ONLY when
          session_open is observed. Either timing out or seeing
          session_open_timeout terminates the walker with a `failed`
          label that the §4.6 acceptance gate will reject.

PHASE_JUSTIFYING_EVENTS in orchestrator/episode.py declares which
events justify which phases:

    "clean":            None              # orchestrator-emitted
    "armed":            None              # orchestrator-emitted
    "infecting":        ("exploit_fire",)
    "infected_running": ("session_open",)

TERMINAL_FAILURE_EVENTS = {"session_open_timeout"} short-circuit any
event-driven wait into a `failed` label.

`dormant` is intentionally OFF the canonical schedule. §4.5 calls
for dormant to be event-driven (session_idle / session_active) too,
but the driver doesn't emit those yet. Per §1 default-to-removal we
ship without dormant rather than label it from the clock; when the
driver gains those emits, dormant re-enters the schedule with
proper justification.

EpisodeRunner now owns:
  * `_event_log` — every emit_event appends here
  * `_event_cv`  — condition variable for waiters
  * `_wait_for_event(names, since_t_mono_ns, timeout_s)` — returns
                                  the first matching event in the log
                                  with t_mono >= threshold; threshold
                                  catches events that fired during
                                  the previous on_phase callback.

When an event-driven phase's justifier already arrived (e.g.
exploit_fire emitted by driver._fire() inside on_phase("armed")),
the walker uses the EVENT's t_mono on the label — not the time the
walker noticed. The label means "this is when this thing actually
happened."

manifest.toml: dropped the dormant cycle from the canonical schedule.
Episode is shorter (~30s) but every label is event-justified.

14 new tests in tests/test_event_driven_labeller.py covering: justifier
mapping invariants, _wait_for_event semantics (already-arrived,
future, timeout, since-threshold, first-of-multiple-names), walker
behavior (orchestrator-emitted phases, event-driven phases, missing
event → failed, terminal-failure-event short-circuit, stop event,
event-t_mono on label, phase_transition events with justified_by).

286 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 01:43:16 -05:00
..
__init__.py Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency 2026-04-28 23:34:04 -06:00
test_auto_fetch_samples.py auto_fetch_samples: pick Linux i386 ELF; manifest matches theZoo 2026-05-01 03:28:26 -05:00
test_collectors_emit.py PIPELINE §5 step 5: collector admission emit tests (§4.4) 2026-05-04 01:37:40 -05:00
test_containment.py PIPELINE §5 step 3: target VM build infrastructure + containment posture 2026-05-04 01:31:40 -05:00
test_doctor_shipping.py shipper: systemd watchdog, quarantine cleanup; doctor surfaces ship errors 2026-05-01 12:02:59 -05:00
test_episode.py meta.json: stamp code_version (commit, branch, dirty) per episode 2026-05-01 01:29:01 -05:00
test_event_driven_labeller.py PIPELINE §5 step 6: event-driven labeller (§4.5) 2026-05-04 01:43:16 -05:00
test_exploits.py catalog: remove samba_usermap_script — never landed sessions in prod 2026-05-03 22:48:03 -05:00
test_fleet.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
test_fleet_health.py fleet-health: exit 0 when alerts found (don't mark unit failed) 2026-05-02 13:51:20 -05:00
test_guest_agent.py Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts 2026-04-30 00:02:27 -05:00
test_host_health.py fleet-health: proactive alerts on the Pi + per-host doctor reports 2026-05-02 13:48:31 -05:00
test_manifest.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
test_pcap.py Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts 2026-04-30 00:02:27 -05:00
test_perf_qemu.py Close out the open issues: bridge pcap wiring, perf collector, Tier-4 2026-04-30 00:17:49 -05:00
test_proc_qemu.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
test_prune.py Multi-signal prune classifier: rescue valid episodes /proc misses 2026-04-30 19:10:01 -05:00
test_qmp.py Close out the deployment-readiness gaps 2026-04-30 00:31:55 -05:00
test_quarantine_unstamped.py fix: lab-host install loop after commit-gate cutover 2026-05-01 11:36:21 -05:00
test_receiver.py Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency 2026-04-28 23:34:04 -06:00
test_shipper.py shipper: systemd watchdog, quarantine cleanup; doctor surfaces ship errors 2026-05-01 12:02:59 -05:00
test_target_spec.py PIPELINE §5 step 3: target VM build infrastructure + containment posture 2026-05-04 01:31:40 -05:00
test_tier3_local_verify.py tools/verify_tier3_local.py: Pi-runnable Tier-3 verifier 2026-05-01 03:41:21 -05:00
test_tier4.py Close out the deployment-readiness gaps 2026-04-30 00:31:55 -05:00
test_ulid.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
test_verify_catalog.py PIPELINE §5 step 4: catalog admission verifier (§4.3) 2026-05-04 01:35:32 -05:00
test_version_gate.py robustness: gate falls back to local git, queue sweeps stale tarballs 2026-05-01 11:49:38 -05:00
test_vm_load_controller.py workload audit trail: meta.sample + per-phase events + pre-kill probe 2026-04-30 02:12:34 -05:00