Commit graph

2 commits

Author SHA1 Message Date
Max Gorog
4ab5477226 PIPELINE §5 step 1: fix four root-cause defects
Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.

perf collector (rows_perf=0 on 100% of episodes):
  - perf stat -j writes to stderr by default with -p; we read stdout.
    Add --log-fd 1 so JSON reaches stdout where the parser sees it.
  - Event names come back annotated with the privilege scope perf
    actually measured ("cycles:u" under perf_event_paranoid=2). Strip
    the suffix so _build_row's plain-name lookups hit. Without this
    every metric was None even when perf reported real numbers.
  - tests/test_collectors_emit.py covers the regression with a real
    busy-loop fixture; emit-test discipline per §4.4.

guest-agent collector (rows_guest=0 on 100% of episodes):
  - Alpine cloud image doesn't ship python3, so the in-guest agent's
    `#!/usr/bin/env python3` shebang silently fails. Add packages:
    [python3] to cidata user-data so cloud-init installs it before
    the OpenRC service starts.
  - Guest agent now exits nonzero (was: silent stdout fallback) when
    /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
    reports the failure to /var/log/cis490-agent.log instead of the
    bytes vanishing into the void. Refs §1.
  - Host-side collector emits guest_agent_connected /
    guest_agent_first_byte / guest_agent_silent_window into the
    orchestrator's events.jsonl. Future episodes show the in-guest
    failure mode per-episode instead of inferring from rows_guest=0.

k-gamingcom missing qmp/netflow/pcap (also affected elliott on
  Tier-3 episodes — was misclassified as host divergence):
  - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
    qmp_socket / guest_agent_socket / bridge_iface — even though
    launch_target.sh creates the underlying chardevs and BRIDGE
    supplies the iface. tools/run_real_vm_demo.py wires them
    correctly; Tier-3 had a copy-paste gap.
  - tests/test_collectors_emit.py adds a source-grep regression so
    the wiring stays honest.

samba_usermap_script never lands session (0/67 in §3 probe):
  - Bind handler default WfsDelay (~5s) gives up before bind_perl on
    Metasploitable2 has finished forking + binding LPORT under
    SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
    exploits/driver.py so framework + driver agree on the wait
    budget. Add ConnectTimeout=15 so the handler's bind connect has
    retry budget instead of one-shot.

orchestrator/fleet.py: usable_modules + BRIDGE handling were both
  unconditional, so:
  - With BRIDGE set, requires_bridge modules were still being
    dropped — picker only ever returned samba_usermap_script across
    every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
    failure on HEAD).
  - env.pop("BRIDGE") fired even when BRIDGE was the operator's
    explicit setup, breaking modules that need bridge mode (vsftpd
    backdoor on hardcoded port 6200, distccd, etc.).
  Both made conditional on bridge_set so the picker walks the full
  catalog under bridge mode and SLIRP-only modules still get a
  clean SLIRP env when BRIDGE is unset.

receiver/app.py: half-pregnant v2 schema state in HEAD — calling
  store.ingest_stream(episode_type=..., benign_profile=...) with
  kwargs the matching store.py change was in the WIP stash. Removed
  v2 awareness from app.py so v1 episodes (what the producer ships
  today) get accepted again. SCHEMA_VERSION default reset to 1 to
  match.

229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:05:25 -05:00
max
1b6c7b2f4a Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts
This is the chunk that makes "real data" actually flow on multiple
hosts in parallel. End-to-end pipe was up at 613c6fa / 2579683; now
the lab-host side has the diversity + concurrency it needs.

Collectors landed:
  collectors/qmp.py          — source 2 (oracle). Tiny synchronous QMP
                               client + row builder + run loop. Tolerates
                               older qemu without query-stats.
  collectors/guest_agent.py  — source 5 (deployable). Reads the
                               virtio-serial host-side socket, parses
                               agent JSON-lines, re-stamps to the host
                               monotonic clock, persists.
  collectors/pcap.py         — source 4 (deployable). tcpdump capture
                               + pure-Python pcap reader + 100 ms
                               netflow.jsonl bucketizer. Decodes
                               Ethernet/IPv4/TCP/UDP enough for the
                               schema in docs/data-model.md.

In-guest agent:
  vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads
    /proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs,
    thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent.
  tools/build_cidata.py — embeds the agent + an OpenRC service into
    user-data so first boot of the Alpine cidata image auto-starts it.

Launchers:
  vm/launch_demo.sh / launch_target.sh — second virtio-serial port for
    the agent socket; SLOT env support so multiple VMs run without
    socket / port collisions; PORT_BASE on launch_target so multiple
    target VMs hostfwd different host ports.
  vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24,
    no NAT). Idempotent.

Fleet:
  orchestrator/fleet.py — capacity detector (cores / RAM / load
    headroom) + concurrent-slot runner. Per-slot ENV selects the
    sample. FleetCapacity dataclass round-trips into meta.json so
    "this episode ran with 6 concurrent VMs" is auditable post-hoc.
  tools/run_fleet.py — CLI: --capacity report; --waves N runs N
    waves of (max_concurrent) episodes each, every slot with a
    different sample.
  etc/cis490-orchestrator.service — now drives the fleet runner with
    Restart=always so each invocation runs one wave and respawns,
    giving a continuous stream.

Samples:
  samples/manifest.toml — six profiles spanning the five major
    behaviour shapes. Each entry is real OR mimic (sha256 distinguishes).
  samples/manifest.py — strict TOML loader (rejects dups, unknown
    categories) + deterministic select(host_id, slot, episode_index)
    so different hosts on the network walk the catalog in different
    orders without any coordinator.

EpisodeRunner:
  orchestrator/episode.py — optional qmp_socket + guest_agent_socket
    fields on EpisodeConfig; when set, additional collector threads
    run alongside proc_qemu. EpisodeResult now carries rows_qmp +
    rows_guest counters.

Tier-3 setup automation:
  scripts/install-msfrpcd.sh — installs metasploit-framework where
    the package manager has it, generates a strong password into
    /etc/cis490/msfrpc.env, drops a hardened systemd unit bound to
    127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch
    once MSFRPC_PASSWORD is sourced.
  scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256
    from the operator (Rapid7 download is registration-walled), pulls,
    verifies, converts vmdk → qcow2, lands at vm/images/.

Tests: 82 pass (was 51). New suites:
  tests/test_qmp.py       — fake QMP server, capability handshake,
                            blockstats, async-event interleaving,
                            5-failure backoff
  tests/test_guest_agent.py — fake virtio socket, JSON-lines read +
                              re-stamp, malformed-line tolerance
  tests/test_pcap.py      — synthetic pcap with TCP/UDP/ARP frames,
                            bucketize correctness across windows
  tests/test_fleet.py     — capacity math (8-core idle / low-RAM /
                            high-load / Pi5 / 1-core box), manifest
                            selection determinism + diversity

What's queued for the next commit (already discussed in convo):
  - MSFExploitDriver v2: map sample.profile → distinct in-session
    workload so Tier-3 episodes don't all produce the same yes-loop
    envelope. Critical for ML to learn varied malware shapes.
  - Real-sample fetch from MalwareBazaar by sha256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:02:27 -05:00