CIS490

Author	SHA1	Message	Date
Max Gorog	4ab5477226	PIPELINE §5 step 1: fix four root-cause defects Diagnoses + fixes for the silent-collector / never-lands-session failures that the 200-episode quality probe surfaced (§3 evidence). All four address the producer; no compensating layers added. perf collector (rows_perf=0 on 100% of episodes): - perf stat -j writes to stderr by default with -p; we read stdout. Add --log-fd 1 so JSON reaches stdout where the parser sees it. - Event names come back annotated with the privilege scope perf actually measured ("cycles:u" under perf_event_paranoid=2). Strip the suffix so _build_row's plain-name lookups hit. Without this every metric was None even when perf reported real numbers. - tests/test_collectors_emit.py covers the regression with a real busy-loop fixture; emit-test discipline per §4.4. guest-agent collector (rows_guest=0 on 100% of episodes): - Alpine cloud image doesn't ship python3, so the in-guest agent's `#!/usr/bin/env python3` shebang silently fails. Add packages: [python3] to cidata user-data so cloud-init installs it before the OpenRC service starts. - Guest agent now exits nonzero (was: silent stdout fallback) when /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC reports the failure to /var/log/cis490-agent.log instead of the bytes vanishing into the void. Refs §1. - Host-side collector emits guest_agent_connected / guest_agent_first_byte / guest_agent_silent_window into the orchestrator's events.jsonl. Future episodes show the in-guest failure mode per-episode instead of inferring from rows_guest=0. k-gamingcom missing qmp/netflow/pcap (also affected elliott on Tier-3 episodes — was misclassified as host divergence): - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT qmp_socket / guest_agent_socket / bridge_iface — even though launch_target.sh creates the underlying chardevs and BRIDGE supplies the iface. tools/run_real_vm_demo.py wires them correctly; Tier-3 had a copy-paste gap. 
- tests/test_collectors_emit.py adds a source-grep regression so the wiring stays honest. samba_usermap_script never lands session (0/67 in §3 probe): - Bind handler default WfsDelay (~5s) gives up before bind_perl on Metasploitable2 has finished forking + binding LPORT under SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in exploits/driver.py so framework + driver agree on the wait budget. Add ConnectTimeout=15 so the handler's bind connect has retry budget instead of one-shot. orchestrator/fleet.py: usable_modules + BRIDGE handling were both unconditional, so: - With BRIDGE set, requires_bridge modules were still being dropped; picker only ever returned samba_usermap_script across every slot/episode (the test_fleet_uses_all_modules_when_bridge_set failure on HEAD). - env.pop("BRIDGE") fired even when BRIDGE was the operator's explicit setup, breaking modules that need bridge mode (vsftpd backdoor on hardcoded port 6200, distccd, etc.). Both made conditional on bridge_set so the picker walks the full catalog under bridge mode and SLIRP-only modules still get a clean SLIRP env when BRIDGE is unset. receiver/app.py: half-pregnant v2 schema state in HEAD: app.py was calling store.ingest_stream(episode_type=..., benign_profile=...) with kwargs whose matching store.py change was still in the WIP stash. Removed v2 awareness from app.py so v1 episodes (what the producer ships today) get accepted again. SCHEMA_VERSION default reset to 1 to match. 229 passed, 0 failed. (HEAD had 15 failures, all linked to the half-pregnant v2 state above.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:05:25 -05:00
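The perf fix in commit 4ab5477226 (strip the privilege-scope suffix so plain-name lookups hit) can be illustrated with a minimal sketch. The helper names `strip_event_modifier` and `parse_perf_line` are hypothetical, as is the set of modifier letters; the `event` and `counter-value` keys are assumed to match what `perf stat -j` emits and should be checked against the perf version in use.

```python
import json

# Modifier letters perf may append after a colon, e.g. "cycles:u".
# Assumption: this set is wide enough for the events collected here.
_MODIFIER_CHARS = set("ukhGHpPIDSe")


def strip_event_modifier(name: str) -> str:
    """Return the event name without a trailing ":<modifiers>" suffix."""
    base, sep, suffix = name.rpartition(":")
    # Only strip when the suffix is made purely of modifier letters, so
    # tracepoint names such as "sched:sched_switch" are left untouched.
    if sep and base and suffix and set(suffix) <= _MODIFIER_CHARS:
        return base
    return name


def parse_perf_line(line: str):
    """Parse one JSON line; return (plain_event_name, value) or None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    event = record.get("event")
    if not isinstance(event, str):
        return None
    try:
        value = float(record.get("counter-value"))
    except (TypeError, ValueError):
        value = None  # e.g. "<not counted>" or a missing field
    return strip_event_modifier(event), value
```

With this shape, a row builder that looks up `"cycles"` would find the value even when perf reports `"cycles:u"`.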
Elliott Kolden	667f042707	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01) Root causes and fixes documented in TIER3-BRINGUP.md. Summary: 1. BRIDGE env var leaked into Tier-3 subprocess → target VM used tap instead of SLIRP; fix: env.pop("BRIDGE") in fleet _run_slot. 2. usable_modules filter conditioned on BRIDGE presence → bridge-requiring modules selected on SLIRP runs; fix: always filter requires_bridge. 3. cmd/unix/interact creates no session.list entry → session_open_timeout every episode; fix: switch samba_usermap_script to cmd/unix/bind_perl. 4. Per-slot LPORT hostfwd used wrong guest port (host:5444→guest:4444); fix: extra_host_port:extra_host_port mapping so guest binds the per-slot LPORT directly. 5. vsftpd backdoor port 6200 hardcoded → collision across concurrent slots; fix: requires_bridge=true filters it from SLIRP fleet runs. 6. SLIRP false-positive in _wait_for_tcp → exploit fires before Samba boots (~60 s too early); fix: replace TCP probe with serial console _wait_for_serial_login that waits for actual "login:" prompt. 7. Stale QEMU survives orchestrator restart (start_new_session=True) → holds hostfwd ports, new QEMU silently fails; fix: kill by pgid from old pidfile before rmtree. 8. PORT_BASE default used privileged port 21; fix: default to 2021+slot*100. 9. msfrpcd 6.x returns bytes for all string values even with raw=False; fix: MSFRpcClient._str() recursive decoder applied to all responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:26:19 -06:00
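Item 9 of commit 667f042707 describes a recursive decoder applied to every msfrpcd response. A minimal sketch of that idea follows; `decode_bytes` is a hypothetical stand-in for `MSFRpcClient._str()`, and the UTF-8 encoding with replacement on errors is an assumption.

```python
def decode_bytes(value, encoding="utf-8"):
    """Recursively turn bytes into str inside dicts, lists, and tuples."""
    if isinstance(value, bytes):
        # Assumption: undecodable bytes are replaced rather than raising.
        return value.decode(encoding, errors="replace")
    if isinstance(value, dict):
        # Keys can be bytes too, so decode both sides of each pair.
        return {
            decode_bytes(k, encoding): decode_bytes(v, encoding)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    # Numbers, booleans, None, and str pass through unchanged.
    return value
```

Applying the decoder once at the client boundary keeps callers free of per-field `bytes` checks.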
max	507eac617b	Solvable Tier-3 holes: callback payloads, busybox workloads, bridge by default Closes the next batch of issues from the post-mortem. The previous "each run uses a different vulnerability" commit shipped 5 modules but 3 of them couldn't actually fire under SLIRP+restrict=on: their reverse-shell payloads needed a callback channel the launcher didn't provide, AND their LHOST options were set to {{ target_ip }} (the target's IP, not the attacker's — copy-paste from RHOSTS). Same time, the workloads.py shell commands used bash-only /dev/tcp redirects that silently no-op'd in the busybox shell sessions Metasploitable2 returns. Net effect: episodes that selected those modules would have produced session_open_timeout + dead workloads. Module configs (the three callback ones): exploits/modules/distccd_command_exec.toml exploits/modules/php_cgi_arg_injection.toml exploits/modules/unreal_ircd_3281_backdoor.toml - Switch payload from cmd/unix/reverse* to cmd/unix/bind_perl so the target listens on a known port; msfrpcd connects to it via the host's hostfwd (no callback path required). - Drop the bogus LHOST = "{{ target_ip }}" — bind shells don't use LHOST. - Add [runtime] table: requires_bridge = true extra_target_ports = [<bind_lport>] Both fields are honored by the loader (ModuleConfig.requires_bridge) and the launcher (TARGET_PORTS gets the extra port hostfwd'd when BRIDGE mode is active). orchestrator/fleet.py When BRIDGE is unset in env, _run_slot filters the module catalog down to modules where requires_bridge=False before calling select_module. Two same-socket-shell modules (vsftpd_234_backdoor + samba_usermap_script) survive — fleet still has variety; just doesn't pick modules whose payloads can't land. With BRIDGE set, the full catalog rotates as before, AND BRIDGE is propagated to the per-slot subprocess env so launch_target.sh enters tap+bridge mode. 
exploits/workloads.py Replaced bash-only constructs in three profiles: scan-and-dial: /dev/tcp/HOST/PORT redirects → nc -z -w 1; bursty-c2: same fix; shell-resident: exec 3<>/dev/tcp/... → piping into nc -w. All three now run cleanly in busybox / dash / Metasploitable2's default shell. The remaining three profiles (cpu-saturate, io-walk, low-and-slow) were already busybox-portable. scripts/install-lab-host.sh - lab-host.env now defaults BRIDGE=br-malware (was commented out). Operator opt-out is to comment the line out again. - New step 6b: provisions br-malware via vm/setup_bridge.sh AND pre-creates a per-slot tap pool (cis490tap0..7 for Tier-2 demo, cis490target0..7 for Tier-3 target) all attached to br-malware and brought up. Launchers reference these by SLOT; no sudo needed at episode time. - On bridge-setup failure, the script auto-comments BRIDGE in the env file with an "auto-disabled: bridge setup failed" note so the fleet falls back to same-socket modules + Tier-2 cleanly. tools/cis490_doctor.py Three new checks for the lab-host role: bridge: br-malware exists / up; tier3: msfrpcd listening on 127.0.0.1:55553; tier3: module catalog parses (counts same-socket vs requires_bridge). All three are warn-level: they don't fail an otherwise-healthy Tier-2-only setup; they tell the operator what's missing for full Tier-3 + source 4 coverage. Tests: 132 (was 129). New cases: test_fleet.py +3 - fleet skips requires_bridge modules when BRIDGE unset (asserted across 20 episodes; never picks a callback module) - fleet uses the full catalog when BRIDGE is set - BRIDGE env propagates to per-slot subprocess What's still untested live: the bind_perl payloads against a real Metasploitable2 in the bridge-enabled launcher path. That's a deployment validation, not a code change. The unit tests confirm the dispatch / filter logic; the live test is the next operator action. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:32:52 -05:00
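The catalog filter described in commit 507eac617b (drop `requires_bridge` modules only when BRIDGE is unset) might look roughly like the sketch below. The `ModuleConfig` dataclass here is a stripped-down, hypothetical stand-in for the real loader type, and `usable_modules` taking an `env` mapping is an assumption about the signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    """Hypothetical minimal module record; the real loader carries more."""
    name: str
    requires_bridge: bool = False


def usable_modules(catalog, env):
    """Return the modules that can land in the current network mode."""
    bridge_set = bool(env.get("BRIDGE"))
    if bridge_set:
        # Bridge mode: every module is eligible, so the full catalog rotates.
        return list(catalog)
    # SLIRP mode: keep only modules whose payloads need no bridge.
    return [m for m in catalog if not m.requires_bridge]


# Illustrative catalog mirroring the split described in this commit.
CATALOG = [
    ModuleConfig("vsftpd_234_backdoor"),
    ModuleConfig("samba_usermap_script"),
    ModuleConfig("distccd_command_exec", requires_bridge=True),
    ModuleConfig("php_cgi_arg_injection", requires_bridge=True),
    ModuleConfig("unreal_ircd_3281_backdoor", requires_bridge=True),
]
```

Keeping the decision in one pure function makes both branches easy to assert on without launching a VM.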
max	a193d17ead	fleet: rotate exploit modules per (host, slot, ep); Tier 3 by default Closes the "every run hits the same vulnerability" gap. Before this commit, the fleet shipped Tier-2 episodes (no exploit at all) with only the post-infection sample varying. Tier-3 had a single canned module — vsftpd_234_backdoor — so even when exploit fire was exercised, the entry vector never changed. Trainer would see one shape of `armed → infecting` and learn nothing about how varied real exploits look on the wire / in /proc. What landed: exploits/modules/ + samba_usermap_script.toml CVE-2007-2447, SMB:139 + distccd_command_exec.toml CVE-2004-2687, distcc:3632 + php_cgi_arg_injection.toml CVE-2012-1823, http:80 + unreal_ircd_3281_backdoor.toml CVE-2010-2075, ircd:6667 (vsftpd_234_backdoor.toml unchanged) All five are canonical Metasploitable2 vectors with stable Metasploit modules. Each TOML carries the RPORT the launcher needs to wire its hostfwd at, plus a payload tuned to a clean shell session (cmd/unix/interact for in-band shells, cmd/unix/reverse* with deterministic LPORTs for reverse shells). exploits/modules.py + select_module(catalog, host_id, slot, episode_index) — same SHA-256-keyed deterministic selection shape SampleManifest uses for samples. Two hosts at the same slot/episode hash to different modules; one host walks the full catalog within ~len(catalog) episodes. + module_target_port() — pulls RPORT off the module config so the fleet can plumb the launcher's hostfwd at the right service. orchestrator/fleet.py - _run_slot now decides Tier 3 vs Tier 2 from msfrpcd reachability + module-catalog populated. Default is Tier 3 when both are true; Tier 2 fallback when not (logged + recorded in SlotResult.tier so trainers can filter no-exploit episodes). - Per-slot module via select_module() — each concurrent slot in a wave gets a different vector AND a different sample. 
- PORT_BASE per slot (target_port + slot * 1000) so concurrent Tier-3 targets don't collide on the host-side hostfwd port. - _msfrpcd_available() probe gates the dispatch. - Fleet-side log line records (slot, ep, tier, sample, module, run_dir) so the operator can see at a glance what each wave is exercising. - SlotResult grows tier + module_name fields; FleetConfig grows modules + force_tier2 + msfrpcd_{host,port} fields. orchestrator/episode.py + EpisodeConfig.exploit_meta — plain dict the runner stamps into meta.exploit so every Tier-3 episode records {framework, module path, module type, payload, RPORT, RHOSTS template}. Trainers join on meta.exploit.module_name to stratify by entry vector; meta.sample.name to stratify by post-infection family. tools/run_tier3_demo.py + Builds exploit_meta from the loaded ModuleConfig and passes it to EpisodeConfig. Sample is now also passed (was missing). tools/run_fleet.py + --modules-dir (default exploits/modules/) — load module catalog on startup; pass to FleetConfig. + --force-tier2 — escape hatch for dev / smoke tests. + JSON output now includes per-slot {tier, module} so the operator can see at a glance what each slot ran without grepping logs. Tests: 129 (was 119). New cases: test_exploits.py +6 - catalog has at least the five canonical Metasploitable2 vectors - select_module is deterministic per (host, slot, ep) - select_module diversifies across hosts - select_module walks the full catalog over many episodes - module_target_port pulls RPORT for each shipped TOML test_fleet.py +4 - _run_slot dispatches to run_tier3_demo.py when msfrpcd up - falls back to run_real_vm_demo.py when msfrpcd unreachable - falls back when module catalog empty - --force-tier2 overrides msfrpcd availability - PORT_BASE is unique per concurrent slot (no hostfwd collision) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:22:49 -05:00
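Commit a193d17ead describes `select_module(catalog, host_id, slot, episode_index)` as SHA-256-keyed and deterministic. One way to get the stated properties is sketched below. Hashing only `host_id` and adding `slot` and `episode_index` as offsets is an assumption chosen so that one host walks the whole catalog and concurrent slots differ; the repository's actual key layout may well be different.

```python
import hashlib


def select_module(catalog, host_id, slot, episode_index):
    """Pick a catalog entry deterministically from (host, slot, episode)."""
    if not catalog:
        raise ValueError("module catalog is empty")
    # A stable per-host starting offset derived from SHA-256 of the host id.
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    host_offset = int.from_bytes(digest[:8], "big")
    # Adding slot and episode index rotates through the catalog, so
    # neighbouring slots and consecutive episodes land on different entries.
    return catalog[(host_offset + slot + episode_index) % len(catalog)]


# Names taken from the module list in this commit.
MODULES = [
    "vsftpd_234_backdoor",
    "samba_usermap_script",
    "distccd_command_exec",
    "php_cgi_arg_injection",
    "unreal_ircd_3281_backdoor",
]
```

Under this scheme a host covers all `len(catalog)` modules in exactly that many episodes. Two hosts start at different offsets only when their hashes differ modulo the catalog size, which is likely but not guaranteed.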
max	8753340ea3	fleet: fix per-slot run-dir collision so concurrent VMs actually run Root cause of "fleet says max_concurrent=3 but only one episode ships per wave" symptom on elliott-lab: 1. orchestrator/fleet.py::_run_slot set env["RUN_DIR"]=/tmp/cis490-vm-fleet-{slot} per slot. 2. tools/run_real_vm_demo.py defaulted --run-dir to /tmp/cis490-vm (NO slot suffix), then UNCONDITIONALLY overwrote the env's RUN_DIR with that flag's value before exec'ing the launcher. 3. So every slot's launcher saw RUN_DIR=/tmp/cis490-vm. All slots collided on the same socket dir. 4. run_real_vm_demo.py also rmtree(run_dir) on entry: slot 1's rmtree literally deleted slot 0's pidfile + sockets mid-boot. 5. Net effect: one VM survives per wave on a multi-core host that should be running ~cores-1 in parallel. Throughput collapses to 1/N. Fix: tools/run_real_vm_demo.py + tools/run_tier3_demo.py: --run-dir default cascade: 1) explicit CLI flag 2) RUN_DIR env (set by fleet runner) 3) /tmp/cis490-vm-<SLOT> (SLOT from env, default 0) Same change in both runners so Tier-2 + Tier-3 fleet waves parallelize cleanly. orchestrator/fleet.py::_run_slot: Pass --run-dir explicitly to the subprocess so the per-slot path is audit-visible in the fleet log instead of buried in env. Also flip the subprocess interpreter to repo_root/.venv/bin/python when present (was /usr/bin/env python3; worked by luck because the orchestrator path doesn't import msgpack/httpx, but a Tier-3 fleet wave would have died at import-time on a host without those in system Python). etc/cis490-orchestrator.service: Removed the duplicate [Service] hardening block at the bottom of the file that was silently overriding the AmbientCapabilities grant (NoNewPrivileges=true at the bottom flipped the NoNewPrivileges=false at the top, dropping CAP_NET_RAW + CAP_SYS_ADMIN + CAP_PERFMON before per-episode subprocesses inherit them). Sources 3 + 4 would have failed silently inside the sandbox. 
Added /tmp to ReadWritePaths so per-slot RUN_DIRs are writable. 106/106 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 01:55:56 -05:00
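The `--run-dir` default cascade from commit 8753340ea3 (explicit CLI flag, then RUN_DIR env, then a per-SLOT path) can be sketched as a small pure function. The name `resolve_run_dir` and passing the environment as a mapping are assumptions made for illustration; the runners may structure this differently.

```python
import os


def resolve_run_dir(cli_run_dir=None, env=None):
    """Resolve the run directory using the three-step cascade."""
    env = os.environ if env is None else env
    # 1) An explicit --run-dir flag always wins.
    if cli_run_dir:
        return cli_run_dir
    # 2) Otherwise honour RUN_DIR, which the fleet runner sets per slot.
    if env.get("RUN_DIR"):
        return env["RUN_DIR"]
    # 3) Fall back to a slot-suffixed default so concurrent VMs never share
    #    a socket directory; SLOT defaults to 0 when unset.
    return f"/tmp/cis490-vm-{env.get('SLOT', '0')}"
```

The important property is that no branch returns the old shared `/tmp/cis490-vm` path, so one slot's cleanup cannot delete another slot's pidfile or sockets.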
max	1b6c7b2f4a	Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts This is the chunk that makes "real data" actually flow on multiple hosts in parallel. End-to-end pipe was up at `613c6fa` / 2579683; now the lab-host side has the diversity + concurrency it needs. Collectors landed: collectors/qmp.py — source 2 (oracle). Tiny synchronous QMP client + row builder + run loop. Tolerates older qemu without query-stats. collectors/guest_agent.py — source 5 (deployable). Reads the virtio-serial host-side socket, parses agent JSON-lines, re-stamps to the host monotonic clock, persists. collectors/pcap.py — source 4 (deployable). tcpdump capture + pure-Python pcap reader + 100 ms netflow.jsonl bucketizer. Decodes Ethernet/IPv4/TCP/UDP enough for the schema in docs/data-model.md. In-guest agent: vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads /proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs, thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent. tools/build_cidata.py — embeds the agent + an OpenRC service into user-data so first boot of the Alpine cidata image auto-starts it. Launchers: vm/launch_demo.sh / launch_target.sh — second virtio-serial port for the agent socket; SLOT env support so multiple VMs run without socket / port collisions; PORT_BASE on launch_target so multiple target VMs hostfwd different host ports. vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24, no NAT). Idempotent. Fleet: orchestrator/fleet.py — capacity detector (cores / RAM / load headroom) + concurrent-slot runner. Per-slot ENV selects the sample. FleetCapacity dataclass round-trips into meta.json so "this episode ran with 6 concurrent VMs" is auditable post-hoc. tools/run_fleet.py — CLI: --capacity report; --waves N runs N waves of (max_concurrent) episodes each, every slot with a different sample. 
etc/cis490-orchestrator.service — now drives the fleet runner with Restart=always so each invocation runs one wave and respawns, giving a continuous stream. Samples: samples/manifest.toml — six profiles spanning the five major behaviour shapes. Each entry is real OR mimic (sha256 distinguishes). samples/manifest.py — strict TOML loader (rejects dups, unknown categories) + deterministic select(host_id, slot, episode_index) so different hosts on the network walk the catalog in different orders without any coordinator. EpisodeRunner: orchestrator/episode.py — optional qmp_socket + guest_agent_socket fields on EpisodeConfig; when set, additional collector threads run alongside proc_qemu. EpisodeResult now carries rows_qmp + rows_guest counters. Tier-3 setup automation: scripts/install-msfrpcd.sh — installs metasploit-framework where the package manager has it, generates a strong password into /etc/cis490/msfrpc.env, drops a hardened systemd unit bound to 127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch once MSFRPC_PASSWORD is sourced. scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256 from the operator (Rapid7 download is registration-walled), pulls, verifies, converts vmdk → qcow2, lands at vm/images/. Tests: 82 pass (was 51). New suites: tests/test_qmp.py — fake QMP server, capability handshake, blockstats, async-event interleaving, 5-failure backoff tests/test_guest_agent.py — fake virtio socket, JSON-lines read + re-stamp, malformed-line tolerance tests/test_pcap.py — synthetic pcap with TCP/UDP/ARP frames, bucketize correctness across windows tests/test_fleet.py — capacity math (8-core idle / low-RAM / high-load / Pi5 / 1-core box), manifest selection determinism + diversity What's queued for the next commit (already discussed in convo): - MSFExploitDriver v2: map sample.profile → distinct in-session workload so Tier-3 episodes don't all produce the same yes-loop envelope. Critical for ML to learn varied malware shapes. 
- Real-sample fetch from MalwareBazaar by sha256. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:02:27 -05:00
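Commit 1b6c7b2f4a mentions a 100 ms netflow bucketizer in collectors/pcap.py. The sketch below shows the windowing idea only. The input shape (microsecond timestamp, byte length) and the output field names `t_start_us`, `packets`, and `bytes` are hypothetical and do not reproduce the schema in docs/data-model.md.

```python
from collections import defaultdict

WINDOW_US = 100_000  # 100 ms expressed in microseconds


def bucketize(packets, window_us=WINDOW_US):
    """Aggregate (timestamp_us, length_bytes) pairs into fixed windows."""
    counts = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for timestamp_us, length in packets:
        # Integer division avoids floating-point edge cases at boundaries.
        index = timestamp_us // window_us
        counts[index]["packets"] += 1
        counts[index]["bytes"] += length
    # Emit one row per non-empty window, ordered by window start time.
    return [
        {
            "t_start_us": index * window_us,
            "packets": counts[index]["packets"],
            "bytes": counts[index]["bytes"],
        }
        for index in sorted(counts)
    ]
```

Working in integer microseconds matches how pcap records carry seconds and microseconds separately, and empty windows are simply absent from the output in this sketch.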

6 commits