
Max Gorog 4ab5477226 PIPELINE §5 step 1: fix four root-cause defects		2026-05-03 17:05:25 -05:00

Diagnoses + fixes for the silent-collector / never-lands-session failures that the 200-episode quality probe surfaced (§3 evidence). All four address the producer; no compensating layers added.

perf collector (rows_perf=0 on 100% of episodes):
- perf stat -j writes to stderr by default with -p; we read stdout. Add --log-fd 1 so JSON reaches stdout where the parser sees it.
- Event names come back annotated with the privilege scope perf actually measured ("cycles:u" under perf_event_paranoid=2). Strip the suffix so _build_row's plain-name lookups hit. Without this every metric was None even when perf reported real numbers.
- tests/test_collectors_emit.py covers the regression with a real busy-loop fixture; emit-test discipline per §4.4.

guest-agent collector (rows_guest=0 on 100% of episodes):
- Alpine cloud image doesn't ship python3, so the in-guest agent's `#!/usr/bin/env python3` shebang silently fails. Add packages: [python3] to cidata user-data so cloud-init installs it before the OpenRC service starts.
- Guest agent now exits nonzero (was: silent stdout fallback) when /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC reports the failure to /var/log/cis490-agent.log instead of the bytes vanishing into the void. Refs §1.
- Host-side collector emits guest_agent_connected / guest_agent_first_byte / guest_agent_silent_window into the orchestrator's events.jsonl. Future episodes show the in-guest failure mode per-episode instead of inferring from rows_guest=0.

k-gamingcom missing qmp/netflow/pcap (also affected elliott on Tier-3 episodes; was misclassified as host divergence):
- tools/run_tier3_demo.py was building EpisodeConfig WITHOUT qmp_socket / guest_agent_socket / bridge_iface, even though launch_target.sh creates the underlying chardevs and BRIDGE supplies the iface. tools/run_real_vm_demo.py wires them correctly; Tier-3 had a copy-paste gap.
- tests/test_collectors_emit.py adds a source-grep regression so the wiring stays honest.

samba_usermap_script never lands session (0/67 in §3 probe):
- Bind handler default WfsDelay (~5s) gives up before bind_perl on Metasploitable2 has finished forking + binding LPORT under SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in exploits/driver.py so framework + driver agree on the wait budget. Add ConnectTimeout=15 so the handler's bind connect has retry budget instead of one-shot.

orchestrator/fleet.py: usable_modules + BRIDGE handling were both unconditional, so:
- With BRIDGE set, requires_bridge modules were still being dropped; picker only ever returned samba_usermap_script across every slot/episode (the test_fleet_uses_all_modules_when_bridge_set failure on HEAD).
- env.pop("BRIDGE") fired even when BRIDGE was the operator's explicit setup, breaking modules that need bridge mode (vsftpd backdoor on hardcoded port 6200, distccd, etc.).
Both made conditional on bridge_set so the picker walks the full catalog under bridge mode and SLIRP-only modules still get a clean SLIRP env when BRIDGE is unset.

receiver/app.py: half-pregnant v2 schema state in HEAD: app.py was calling store.ingest_stream(episode_type=..., benign_profile=...) with kwargs whose matching store.py change was still in the WIP stash. Removed v2 awareness from app.py so v1 episodes (what the producer ships today) get accepted again. SCHEMA_VERSION default reset to 1 to match.

229 passed, 0 failed. (HEAD had 15 failures, all linked to the half-pregnant v2 state above.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
modules	PIPELINE §5 step 1: fix four root-cause defects	2026-05-03 17:05:25 -05:00
__init__.py	Tier 3: msfrpc-driven exploit driver + first module config	2026-04-29 23:11:52 -05:00
driver.py	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)	2026-05-02 12:26:19 -06:00
modules.py	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)	2026-05-02 12:26:19 -06:00
msfrpc.py	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)	2026-05-02 12:26:19 -06:00
README.md	Tier 3: msfrpc-driven exploit driver + first module config	2026-04-29 23:11:52 -05:00
workloads.py	Fix workload-silent false-positive on Alpine busybox guests (closes #15)	2026-04-30 17:28:48 -05:00

README.md

exploits/

The Tier-3 exploit driver fires a Metasploit module against a vulnerable target VM, watches for the resulting session, and stamps the session-open transition into the episode's events.jsonl, so the labeler can mark the armed → infecting transition from what was actually observed.
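The "watches for the resulting session" step can be pictured as a snapshot-and-poll loop. The following is a minimal sketch, not the code in driver.py: wait_for_new_session, its parameters, and the default timeout and poll interval are all hypothetical, and list_session_ids stands in for whatever wrapper the driver uses around msfrpc's session.list.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_new_session(
    list_session_ids: Callable[[], Iterable[int]],  # hypothetical wrapper around session.list
    baseline: Iterable[int],                        # ids that existed before the exploit fired
    timeout_s: float = 30.0,                        # assumed wait budget
    poll_interval_s: float = 0.5,                   # assumed poll interval
) -> Optional[int]:
    """Return the first session id that is not in `baseline`, or None on timeout."""
    known = set(baseline)
    deadline = time.monotonic() + timeout_s
    while True:
        new_ids = sorted(set(list_session_ids()) - known)
        if new_ids:
            return new_ids[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval_s)
```

Taking the baseline before firing matters: a session left over from an earlier episode would otherwise be mistaken for the one this exploit opened.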

Layout

exploits/
  msfrpc.py           tiny msgpack-over-HTTPS client for msfrpcd
  driver.py           MSFExploitDriver — plugged in as EpisodeRunner.on_phase
  modules.py          ModuleConfig + TOML loader
  modules/
    vsftpd_234_backdoor.toml   first canned module (Metasploitable2)
    ...

Module configs

Each modules/*.toml describes one Metasploit module — its path, the options to set, and the payload to use. The driver reads these files to drive module.execute over msfrpc.

description = "..."
[module]
type = "exploit"                      # exploit | auxiliary | post
path = "unix/ftp/vsftpd_234_backdoor"

[module.options]
RHOSTS = "{{ target_ip }}"            # placeholder substituted at runtime
RPORT = 21

[payload]
path = "cmd/unix/interact"
[payload.options]                     # optional
# LHOST = "{{ target_ip }}"

[session]
type = "shell"

The only placeholder supported today is {{ target_ip }}. Add more in exploits/modules.py::ModuleConfig.render_options when needed.

Running

# 1. Start msfrpcd locally:
msfrpcd -P <password> -U msf -a 127.0.0.1 -p 55553

# 2. Drop a vulnerable target image at vm/images/<name>.qcow2 (e.g.
#    Metasploitable2 — see docs/sources.md for sha256).

# 3. Drive an episode:
MSFRPC_PASSWORD=<password> uv run python tools/run_tier3_demo.py \
    --module vsftpd_234_backdoor \
    --target-port 21 \
    --data-root data

The episode's events.jsonl will contain:

driver_setup        — module + target snapshotted before fire
exploit_fire        — module.execute issued
session_open        — new session id observed in session.list
session_landing_probe — first command response (id) recorded
sample_executed     — workload kicked off inside the session
session_dormant     — workload killed
session_killed      — session.stop at episode end

These pair with the standard phase labels in labels.jsonl so a downstream loader can reconcile "what the orchestrator scheduled" against "what actually happened on the wire".
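A downstream loader could start the reconciliation by checking that the driver events appear in the order listed above. The sketch below assumes each line of events.jsonl is a JSON object whose event name lives under an "event" key; that key name and the helper first_missing_event are assumptions for illustration, not the project's schema or API.

```python
import json
from typing import Iterable, Optional

# Order taken from the event list above.
EXPECTED_ORDER = [
    "driver_setup",
    "exploit_fire",
    "session_open",
    "session_landing_probe",
    "sample_executed",
    "session_dormant",
    "session_killed",
]


def first_missing_event(lines: Iterable[str], key: str = "event") -> Optional[str]:
    """Return the first expected event not seen in order, or None if all were seen."""
    remaining = iter(EXPECTED_ORDER)
    wanted = next(remaining, None)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if wanted is not None and record.get(key) == wanted:
            wanted = next(remaining, None)
    return wanted
```

Unrelated events interleaved in the file are ignored; only the relative order of the driver events is checked. A return value of "session_open", for example, means the exploit fired but no session was ever observed.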

Adding a module

1. Drop a TOML at exploits/modules/<name>.toml per the schema above.
2. Pick a payload that works without a callback channel until the br-malware bridge is in place (see vm/launch_target.sh: SLIRP + restrict=on blocks reverse-tcp by design). cmd/unix/interact and other "session on the same socket" payloads are safe.
3. Run a quick check: uv run python tools/run_tier3_demo.py --module <name>.
4. No driver code changes are needed; tools/run_tier3_demo.py picks up the new module by name via --module <name>.
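Before running the quick check, a new config can be sanity-checked against the schema shown earlier. This is a hedged sketch: validate_module_config is hypothetical, the set of required keys is inferred from the example TOML rather than from modules.py, and flagging any payload path that contains "reverse" is a rough heuristic for callback payloads, not a rule the project defines.

```python
from typing import Any

MODULE_TYPES = {"exploit", "auxiliary", "post"}  # from the schema comment above


def validate_module_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: list[str] = []
    module = config.get("module", {})
    if module.get("type") not in MODULE_TYPES:
        problems.append("module.type must be one of: " + ", ".join(sorted(MODULE_TYPES)))
    if not module.get("path"):
        problems.append("module.path is required")
    payload_path = config.get("payload", {}).get("path", "")
    # Heuristic only: reverse payloads need a callback channel, which
    # SLIRP + restrict=on blocks until the bridge is available.
    if "reverse" in payload_path:
        problems.append("payload looks like a reverse (callback) payload: " + payload_path)
    return problems
```

For example, a config using cmd/unix/interact with a valid module table yields an empty list, while one using a reverse payload yields a single warning that names the payload path.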

We do not author exploits or modify upstream Metasploit code. The driver is a pure adapter from the project's phase machine to msfrpc.