CIS490

bolyai/CIS490

Fork 0

Commit graph

Author	SHA1	Message	Date
Max Gorog	0d51b9b253	PIPELINE §5 step 5: collector admission emit tests (§4.4) Adds the missing emit-tests so every collector in KNOWN_COLLECTORS has end-to-end coverage: * test_proc_emits_rows_against_self_pid Samples /proc/<own pid> for ~0.6s. Asserts ≥3 rows + populated core fields (cpu_user_jiffies, rss_bytes, vsize_bytes). Works anywhere with /proc. * test_pcap_bucketize_emits_rows_from_synthetic_capture Builds a 2-packet Ethernet+IPv4+TCP pcap in-memory, feeds it to pcap.bucketize, asserts ≥1 row written + total packet count across buckets matches input. Covers BOTH the pcap and netflow collectors (netflow IS the bucketized pcap output). * test_every_known_collector_has_emit_coverage Cross-cutting tripwire: for every name in KNOWN_COLLECTORS, either there's a test_collectors_emit.py test or there's an explicit COLLECTOR_TEST_CARVE_OUTS entry. Adding a collector to KNOWN_COLLECTORS without an emit test fails this. Carve-outs today: qmp (covered by tests/test_qmp.py — needs running QEMU for real-binary emit) and guest_agent (covered by tests/test_guest_agent.py — needs a real VM with the agent baked in). The carve-outs are explicit, not implicit. A drift where someone adds a new collector without a real-binary emit test fails CI before the manifest can include it. 272 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 01:37:40 -05:00
Max Gorog	4ab5477226	PIPELINE §5 step 1: fix four root-cause defects Diagnoses + fixes for the silent-collector / never-lands-session failures that the 200-episode quality probe surfaced (§3 evidence). All four address the producer; no compensating layers added. perf collector (rows_perf=0 on 100% of episodes): - perf stat -j writes to stderr by default with -p; we read stdout. Add --log-fd 1 so JSON reaches stdout where the parser sees it. - Event names come back annotated with the privilege scope perf actually measured ("cycles:u" under perf_event_paranoid=2). Strip the suffix so _build_row's plain-name lookups hit. Without this every metric was None even when perf reported real numbers. - tests/test_collectors_emit.py covers the regression with a real busy-loop fixture; emit-test discipline per §4.4. guest-agent collector (rows_guest=0 on 100% of episodes): - Alpine cloud image doesn't ship python3, so the in-guest agent's `#!/usr/bin/env python3` shebang silently fails. Add packages: [python3] to cidata user-data so cloud-init installs it before the OpenRC service starts. - Guest agent now exits nonzero (was: silent stdout fallback) when /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC reports the failure to /var/log/cis490-agent.log instead of the bytes vanishing into the void. Refs §1. - Host-side collector emits guest_agent_connected / guest_agent_first_byte / guest_agent_silent_window into the orchestrator's events.jsonl. Future episodes show the in-guest failure mode per-episode instead of inferring from rows_guest=0. k-gamingcom missing qmp/netflow/pcap (also affected elliott on Tier-3 episodes — was misclassified as host divergence): - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT qmp_socket / guest_agent_socket / bridge_iface — even though launch_target.sh creates the underlying chardevs and BRIDGE supplies the iface. tools/run_real_vm_demo.py wires them correctly; Tier-3 had a copy-paste gap. - tests/test_collectors_emit.py adds a source-grep regression so the wiring stays honest. samba_usermap_script never lands session (0/67 in §3 probe): - Bind handler default WfsDelay (~5s) gives up before bind_perl on Metasploitable2 has finished forking + binding LPORT under SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in exploits/driver.py so framework + driver agree on the wait budget. Add ConnectTimeout=15 so the handler's bind connect has retry budget instead of one-shot. orchestrator/fleet.py: usable_modules + BRIDGE handling were both unconditional, so: - With BRIDGE set, requires_bridge modules were still being dropped — picker only ever returned samba_usermap_script across every slot/episode (the test_fleet_uses_all_modules_when_bridge_set failure on HEAD). - env.pop("BRIDGE") fired even when BRIDGE was the operator's explicit setup, breaking modules that need bridge mode (vsftpd backdoor on hardcoded port 6200, distccd, etc.). Both made conditional on bridge_set so the picker walks the full catalog under bridge mode and SLIRP-only modules still get a clean SLIRP env when BRIDGE is unset. receiver/app.py: half-pregnant v2 schema state in HEAD — calling store.ingest_stream(episode_type=..., benign_profile=...) with kwargs the matching store.py change was in the WIP stash. Removed v2 awareness from app.py so v1 episodes (what the producer ships today) get accepted again. SCHEMA_VERSION default reset to 1 to match. 229 passed, 0 failed. (HEAD had 15 failures, all linked to the half-pregnant v2 state above.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:05:25 -05:00

Author

SHA1

Message

Date

Max Gorog

0d51b9b253

PIPELINE §5 step 5: collector admission emit tests (§4.4)

Adds the missing emit-tests so every collector in KNOWN_COLLECTORS
has end-to-end coverage:

  * test_proc_emits_rows_against_self_pid
      Samples /proc/<own pid> for ~0.6s. Asserts ≥3 rows + populated
      core fields (cpu_user_jiffies, rss_bytes, vsize_bytes). Works
      anywhere with /proc.

  * test_pcap_bucketize_emits_rows_from_synthetic_capture
      Builds a 2-packet Ethernet+IPv4+TCP pcap in-memory, feeds it
      to pcap.bucketize, asserts ≥1 row written + total packet count
      across buckets matches input. Covers BOTH the pcap and netflow
      collectors (netflow IS the bucketized pcap output).

  * test_every_known_collector_has_emit_coverage
      Cross-cutting tripwire: for every name in KNOWN_COLLECTORS,
      either there's a test_collectors_emit.py test or there's an
      explicit COLLECTOR_TEST_CARVE_OUTS entry. Adding a collector
      to KNOWN_COLLECTORS without an emit test fails this. Carve-outs
      today: qmp (covered by tests/test_qmp.py — needs running QEMU
      for real-binary emit) and guest_agent (covered by
      tests/test_guest_agent.py — needs a real VM with the agent
      baked in).

The carve-outs are explicit, not implicit. A drift where someone
adds a new collector without a real-binary emit test fails CI before
the manifest can include it.

272 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-04 01:37:40 -05:00

Max Gorog

4ab5477226

PIPELINE §5 step 1: fix four root-cause defects

Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.

perf collector (rows_perf=0 on 100% of episodes):
  - perf stat -j writes to stderr by default with -p; we read stdout.
    Add --log-fd 1 so JSON reaches stdout where the parser sees it.
  - Event names come back annotated with the privilege scope perf
    actually measured ("cycles:u" under perf_event_paranoid=2). Strip
    the suffix so _build_row's plain-name lookups hit. Without this
    every metric was None even when perf reported real numbers.
  - tests/test_collectors_emit.py covers the regression with a real
    busy-loop fixture; emit-test discipline per §4.4.

guest-agent collector (rows_guest=0 on 100% of episodes):
  - Alpine cloud image doesn't ship python3, so the in-guest agent's
    `#!/usr/bin/env python3` shebang silently fails. Add packages:
    [python3] to cidata user-data so cloud-init installs it before
    the OpenRC service starts.
  - Guest agent now exits nonzero (was: silent stdout fallback) when
    /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
    reports the failure to /var/log/cis490-agent.log instead of the
    bytes vanishing into the void. Refs §1.
  - Host-side collector emits guest_agent_connected /
    guest_agent_first_byte / guest_agent_silent_window into the
    orchestrator's events.jsonl. Future episodes show the in-guest
    failure mode per-episode instead of inferring from rows_guest=0.

k-gamingcom missing qmp/netflow/pcap (also affected elliott on
  Tier-3 episodes — was misclassified as host divergence):
  - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
    qmp_socket / guest_agent_socket / bridge_iface — even though
    launch_target.sh creates the underlying chardevs and BRIDGE
    supplies the iface. tools/run_real_vm_demo.py wires them
    correctly; Tier-3 had a copy-paste gap.
  - tests/test_collectors_emit.py adds a source-grep regression so
    the wiring stays honest.

samba_usermap_script never lands session (0/67 in §3 probe):
  - Bind handler default WfsDelay (~5s) gives up before bind_perl on
    Metasploitable2 has finished forking + binding LPORT under
    SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
    exploits/driver.py so framework + driver agree on the wait
    budget. Add ConnectTimeout=15 so the handler's bind connect has
    retry budget instead of one-shot.

orchestrator/fleet.py: usable_modules + BRIDGE handling were both
  unconditional, so:
  - With BRIDGE set, requires_bridge modules were still being
    dropped — picker only ever returned samba_usermap_script across
    every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
    failure on HEAD).
  - env.pop("BRIDGE") fired even when BRIDGE was the operator's
    explicit setup, breaking modules that need bridge mode (vsftpd
    backdoor on hardcoded port 6200, distccd, etc.).
  Both made conditional on bridge_set so the picker walks the full
  catalog under bridge mode and SLIRP-only modules still get a
  clean SLIRP env when BRIDGE is unset.

receiver/app.py: half-pregnant v2 schema state in HEAD — calling
  store.ingest_stream(episode_type=..., benign_profile=...) with
  kwargs the matching store.py change was in the WIP stash. Removed
  v2 awareness from app.py so v1 episodes (what the producer ships
  today) get accepted again. SCHEMA_VERSION default reset to 1 to
  match.

229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 17:05:25 -05:00

2 commits