CIS490

Author	SHA1	Message	Date
max	02b9d0a645	Tier 3 + Tier 4 deploy runbook in AGENTS.md Repo has all the code paths for Tier 3 (real exploit fire via msfrpcd) and Tier 4 (real malware execution via chunked upload), but neither lab host has run a single Tier-3 episode because msfrpcd and the Metasploitable2 image aren't deployed there. 3009 episodes in flight to date are all Tier 2 (mimic workloads in clean Alpine), which is useful pipeline-validation data but cannot answer the actual research question. This commit makes the deploy push-button: - AGENTS.md: new "Tier 3 + Tier 4 deploy" section listing the three prereqs (install-msfrpcd.sh, fetch-metasploitable2.sh, setup_bridge.sh), the foreground verify command (run_tier3_demo.py), and the Tier-4 promotion path (MB API key → fetch_sample.py → manifest edit → orchestrator restart). - samples/manifest.toml: clearer per-entry comment showing the 4-step sha256 → real-binary promotion path. Replaces the earlier "TBD" placeholder which suggested a single edit unlocks Tier 4 when in fact you need to fetch the binary too. The fleet runner already auto-detects msfrpcd (orchestrator/fleet.py _msfrpcd_available()); once the lab-host operator-AI lands the prereqs, episodes flip to Tier 3 with no orchestrator config change. Tier 4 follows automatically the next time the deterministic selector picks a sample whose sha256 file exists in samples/store/.	2026-04-30 22:57:23 -05:00
max	321ea63803	Multi-signal prune classifier: rescue valid episodes /proc misses A laptop-class lab host (elliott-thinkpad) running 14 parallel fleet slots can't deliver host /proc CPU% signal for the bursty profiles — the per-VM share gets buried under contention. But the workloads ARE running: qmp blockstats record 90+ MB written during infected_running for io-walk episodes, netflow shows real packet bursts for scan-and-dial, and the in-guest agent (when alive) shows load_1m deltas the host can't see. The classifier now cross-checks four sources before flagging an episode: - /proc CPU% medians (host-side qemu) - netflow byte totals (bridge_pcap) - qmp blockstats per-phase DELTA (cumulative counters; deltas matter, not raw values) - guest-agent load_1m An episode flags only if every available source agrees no inter-phase signal. Missing sources are "unknown", not "flat". Time-base bug also fixed: phase mapping now uses t_wall_ns (which all sources stamp from CLOCK_REALTIME) rather than t_mono_ns — netflow uses qemu boot-monotonic, /proc uses orchestrator-relative, they don't share a number line. Result on the live receiver: - 1067 active episodes, 100% kept under the new logic - 143 episodes rescued from a previous false-positive archive - Only the 9 genuinely-broken pre-Sample-propagation elliott-lab episodes remain archived (no-sample + no-workload-events) Two new tests (test_flat_proc_rescued_by_netflow, test_flat_everywhere_still_flags) pin the boundary so a future regression surfaces immediately. AGENTS.md gains a "classifier is multi-source" section explaining the cross-check and the t_wall_ns invariant.	2026-04-30 19:10:01 -05:00
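A minimal sketch of the cross-check described in 321ea63803, assuming each source reduces to one of three verdicts per episode. The names `Verdict`, `phase_deltas`, and `should_flag` are hypothetical and are not taken from tools/prune_episodes.py; treating "no sources available" as "do not flag" is also an assumption.

```python
from enum import Enum


class Verdict(Enum):
    SIGNAL = "signal"    # source shows inter-phase variation
    FLAT = "flat"        # source is present but shows no variation
    UNKNOWN = "unknown"  # source is missing for this episode


def phase_deltas(cumulative):
    """Turn cumulative counters sampled at phase boundaries into per-phase deltas."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def should_flag(verdicts):
    """Flag only when every available source agrees there is no signal."""
    available = [v for v in verdicts.values() if v is not Verdict.UNKNOWN]
    if not available:
        return False  # assumption: nothing to agree on, so keep the episode
    return all(v is Verdict.FLAT for v in available)
```

With this shape, a flat /proc reading next to a netflow burst keeps the episode, which is the rescue case the commit describes.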
Elliott Kolden	3d4936a227	Merge remote-tracking branch 'origin/main' into Dev_REL1_043026	2026-04-30 16:34:01 -06:00
max	2707709299	Fix workload-silent false-positive on Alpine busybox guests (closes #15) On-device agent (k-gamingcom) ran the diagnostic probe sequence and proved the workload IS running on Alpine — yes saturating the vCPU, loadavg=1.05, three yes PIDs visible — but two busybox incompatibilities made every episode look silent: 1. _probe() used `pgrep -c yes`. The -c flag is procps-ng/util-linux, not busybox. busybox pgrep exits 1 with a usage banner; the `|| echo 0` fallback then reported yes=0 every time. Switched to `pgrep yes | wc -l` which both pgrep variants support. 2. _wrap_loop appended `disown` after the nohup-backgrounded script. busybox sh / ash have no disown builtin, so each infected_running phase printed `sh: disown: not found` into run()'s captured output. The script kept running (nohup gives SIGHUP immunity, which is what disown was for), but the spurious error is now gone. Cross-validation in the classifier: - prune_episodes.py: workload-silent now requires the probe AND host-side /proc CPU envelope (flat-cpu) to AGREE. A probe-only zero is treated as the busybox false-positive and dropped. This means the 244 already-on-disk episodes from elliott-thinkpad and k-gamingcom are correctly classified without re-collecting. Test coverage: - test_workload_silent_flag updated to require both signals - test_workload_silent_suppressed_when_host_cpu_real new regression for the busybox false-positive AGENTS.md gains a "Don't trust the in-guest probe alone" section with the busybox-vs-procps gotcha + a list of busybox-incompatible patterns to avoid in any new in-guest diagnostic.	2026-04-30 17:28:48 -05:00
elliott	4e8d2bdb04	etc/lab-host.toml.example: pin Caddy root, not wg-pki client CA (closes #14) ca_bundle is what the shipper uses to verify collector.wg's TLS cert. That cert is signed by the Caddy Local Authority, bundled in the repo as etc/caddy-root.crt. Pointing it at wg-ca.pem (the wg-pki CIS490 Lab-Host Client CA, which is the receiver's trust anchor for our client cert) caused CERTIFICATE_VERIFY_FAILED on every ship. Original fix authored by the on-device agent on k-gamingcom in Dev_REL2_043026@786b8da; cherry-picked here onto main. Co-Authored-By: k-gamingcom on-device agent Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 17:26:36 -05:00
Elliott Kolden	b42d073669	Merge remote-tracking branch 'origin/main' into Dev_REL1_043026	2026-04-30 15:48:23 -06:00
max	8d2d0d2e99	prune+receiver: preserve index ownership and add a backfill helper (closes #13) Root cause of #13 (PUT 500s on first ship, retries return already-present): my earlier prune-tool session ran as root and rewrote the live index via os.replace(), which drops the original ownership/mode. The new file was root:root and the cis490 service user couldn't append to it. Every fresh PUT 500'd on _append_index after the tarball had already landed via os.replace, so retries always saw "already-present" and never recovered the missing index row. Two fixes: - tools/prune_episodes.py: snapshot the index's stat before the rename and restore uid/gid/mode after. Best-effort chown so non-root prune runs (where chown would EPERM) still succeed; non-root callers matched the original owner anyway. - tools/index_backfill.py: new tool. Walks episodes/<host>/*.tar.zst, computes sha256+size, and appends rows for episodes missing from the index. Preserves "backfilled: true" so trainers can distinguish reconstructed rows. Always opens the index in append mode (never replaces), so it cannot reproduce the ownership bug it's recovering from. Regression test: tests/test_prune.py::test_archive_preserves_index_mode. Operator note for the live receiver: ran the chown fix manually (chown cis490:cis490 /var/lib/cis490/index.jsonl) and ran the backfill once to recover 140 elliott-thinkpad rows that 500'd before the chown landed.	2026-04-30 16:36:05 -05:00
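The ownership fix in 8d2d0d2e99 can be sketched roughly as follows, assuming a POSIX host. The function name `rewrite_preserving_owner` and the temp-file naming are hypothetical; only the snapshot-stat, `os.replace`, and best-effort chown steps come from the commit message.

```python
import os
import tempfile


def rewrite_preserving_owner(path, new_text):
    """Atomically replace `path`, then restore its original uid/gid/mode."""
    before = os.stat(path)  # snapshot taken before the rename
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".index-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(new_text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    os.chmod(path, before.st_mode & 0o7777)
    try:
        os.chown(path, before.st_uid, before.st_gid)
    except PermissionError:
        pass  # best-effort: a non-root caller cannot chown to another user
```

Writing the temp file in the same directory as the target keeps `os.replace` on one filesystem, which is what makes the rename atomic.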
max	f6d7d07837	Make mTLS bring-up unmistakable for on-device agents Sysadmin observed lab-host agents still trying to "secure the connection" — minting certs, generating CSRs, or otherwise reinventing a cert-delivery flow that's already automated through bootstrap.wg. Three reinforcements so an agent reading any of the three surfaces (AGENTS.md, install script output, journalctl) gets the same message: - AGENTS.md gains a top-of-file "do not mint your own certs" callout + a dedicated "Securing the connection (mTLS)" section with the one fix (re-run install-lab-host.sh after setting host_id) and an explicit "what NOT to do" list (no openssl, no copy from another host, no verify_tls=false). - install-lab-host.sh's FIRST-INSTALL NEXT STEPS now spells out that the cert auto-fetch is silently skipped while host_id is REPLACE_ME, and that the operator MUST re-run the script after editing host_id. Step 2 is now "RE-RUN THIS SCRIPT" with a DO NOT openssl warning. - The shipper's "waiting on mTLS material" warning now embeds the exact remediation command + a pointer to AGENTS.md, so an agent reading journalctl without ever opening the repo still gets it. Tests: 12/12 in test_shipper still pass; warning string change is not asserted on (only the dataclass error field).	2026-04-30 16:23:44 -05:00
max	c80a36d3ae	AGENTS.md: prescriptive guidance for smaller models on lab hosts Smaller (non-4.7) Claude models act as on-device agents on CIS490 lab hosts and have hit the install gotchas that became issues #10–#12. Their reports describe symptoms well but miss inferred context — so this expands the runbook with explicit "do this, not that" notes: - run tools from /opt/cis490 not a clone (CWD-on-sys.path trap) - shipper "waiting on mTLS material" is expected and self-heals; do not try to fix it manually - table of the three install bugs already closed in main, so a fresh agent can recognize the symptom and pull instead of re-filing - "fix one red row at a time" rather than batching attempts Closes nothing new; this is the followup to #10/#11/#12 promised during their resolution.	2026-04-30 16:19:09 -05:00
Elliott Kolden	7c35bf7d49	Merge commit '86a088c' into Dev_REL1_043026	2026-04-30 15:16:41 -06:00
max	86a088c204	shipper: defer SSL context build until cert/CA paths exist (closes #11) First-boot bring-up enables cis490-shipper before the Pi has issued the mTLS leaf, so ssl.create_default_context(cafile=...) raised FileNotFoundError out of __init__ and systemd crash-looped the unit every RestartSec=5. Now the transport pre-flights the configured ca_bundle / client_cert / client_key paths, raises a recoverable _CertNotReadyError, and ping/ship_tarball retry the build on each request — daemon self-heals once the cert lands without a restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:13:59 -05:00
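A rough sketch of the deferred build described in 86a088c204, assuming the transport holds three file paths. The class `LazyTLS` and the public name `CertNotReadyError` are hypothetical stand-ins (the commit names a private `_CertNotReadyError`); the real shipper transport may be structured differently.

```python
import os
import ssl


class CertNotReadyError(Exception):
    """Recoverable: mTLS material has not landed on disk yet."""


class LazyTLS:
    def __init__(self, ca_bundle, client_cert, client_key):
        # Only remember the paths here; nothing is opened in __init__.
        self._paths = (ca_bundle, client_cert, client_key)
        self._context = None

    def context(self):
        """Build the SSL context on first use, once all three files exist."""
        if self._context is None:
            missing = [p for p in self._paths if not os.path.isfile(p)]
            if missing:
                raise CertNotReadyError("waiting on mTLS material: " + ", ".join(missing))
            ca_bundle, client_cert, client_key = self._paths
            ctx = ssl.create_default_context(cafile=ca_bundle)
            ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
            self._context = ctx
        return self._context
```

A caller would catch `CertNotReadyError` per request and retry later, so the daemon keeps running until the cert appears.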
Elliott Kolden	7683b64929	Merge origin/main into Dev_REL1_043026; accept main's service files Cherry-picks all upstream additions (fleet runner, full collector suite, shipper module, exploit driver, samples, scripts/, cis490_doctor, etc.) and resolves the two service-file conflicts by accepting main's production versions over the stubs we wrote on Day 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 15:05:51 -06:00
elliott	95ac56a382	fix: three install-time bugs found during first lab-host bring-up on k-gamingcom 1. pyproject.toml — move pycdlib to main deps (was dev-only; cidata build fails on first install because the venv doesn't include dev extras). 2. scripts/install-lab-host.sh — create vm/images/ dir before symlinking alpine-baseline.qcow2 and cidata.iso into INSTALL_ROOT. Without the mkdir the ln -sf silently fails (|| true), leaving the launchers unable to find the images and causing every episode to fail within 15 s. 3. tools/cis490_doctor.py — two fixes: a. Insert repo_root into sys.path at doctor startup so the inline `from exploits.modules import ...` succeeds when running from /opt/cis490 (package = false means nothing is installed into site-packages). b. Pass cwd=/opt/cis490 to the shipper --ping subprocess so python -m shipper resolves the module correctly regardless of the caller's CWD. Tested on k-gamingcom: install script now builds cidata.iso on first run, 7-slot fleet wave completes with rc=0, doctor shows 13 ok / 4 warn / 2 fail (remaining failures are mTLS certs + collector.wg DNS — both need Pi-side action, not code changes). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 15:05:00 -06:00
Elliott Kolden	86fdd03de4	Dev_REL1_043026: lab-host bring-up, fixes, and issue report Full bring-up of this host from a clean clone: installed uv/perf/tcpdump, downloaded Alpine 3.21 cloud image, built cidata ISO, took baseline-v1 snapshot. Validated single-episode demo (853 rows, 8 phases) and 2-episode campaign loop (campaign_done.marker written). Cherry-picked campaign runner from Dev_REL1_042926. Fixed .gitignore to cover campaign output files. Issue report at reports/Dev_REL1_043026.md covers ISS-001 through ISS-007, with ISS-005 (missing install-lab-host.sh) remaining open. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 14:59:47 -06:00
Elliott Kolden	842918556b	Add automated campaign runner, shipper, and systemd units Implements the unattended episode loop described in docs/deploy.md but not yet built. run_campaign.py boots a fresh VM per episode, drives the full phase schedule via the existing EpisodeRunner/VMLoadController stack, writes campaign.json atomically after each episode, and signals completion with campaign_done.marker. shipper.py watches data/episodes/ for done.marker files, tar+zstd-compresses each, and PUTs them to the receiver with exponential backoff on failure. Both support SIGTERM gracefully, finishing the current episode/scan before exiting. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 14:53:40 -06:00
max	a61fa05980	cis490-prune: retroactively filter low-quality episodes from the dataset Without a prune step, every fix we land before elliott-lab pulls leaves a residue of pre-fix episodes in /var/lib/cis490/episodes/. Trainers either filter at training time (processing the bad data anyway) or — worse — train on it. This tool walks the receiver's index, classifies each episode against five quality signals, and either prints a dry-run summary, archives flagged episodes to /var/lib/cis490/episodes-archive/, or deletes them outright (with the index rewritten atomically). Quality signals (each independent; a bad episode can hit several): no-sample meta.sample is null. Pre-Sample-propagation code ran the v1 yes-loop fallback regardless of fleet selection, so the post-infection family isn't recorded. no-workload-events events.jsonl has zero workload_* rows. Pre-audit-trail code (before VMLoadController emits) — we can't tell whether the workload actually fired. workload-failed events.jsonl contains workload_failed. SerialClient raised mid-phase; labels and telemetry don't match what the orchestrator was supposed to be doing. workload-silent workload_killed event during dormant has pre_kill_probe.yes == "0". The schedule walked but the in-guest workload never started — the elliott-lab fingerprint. flat-cpu /proc CPU% medians spread <5pp across phases. A model can't learn to distinguish phases from this; pure noise to the trainer. CLI: cis490-prune # dry-run summary cis490-prune --reason no-sample # restrict to one signal (repeatable) cis490-prune --host elliott-lab # scope to one lab host cis490-prune --archive # mv flagged → episodes-archive/ cis490-prune --delete # rm flagged + drop index rows cis490-prune --json # machine-readable Index rewrite is atomic: tempfile + os.replace, so a crash mid-write leaves the live index intact. Tests: 143 (was 132). 
New cases (tests/test_prune.py): - one healthy synthetic episode produces zero reasons - five tests covering each individual reason flag - dry-run leaves disk + index untouched - --archive moves tarballs and rewrites index - --delete removes tarballs and rewrites index - --host filter scopes correctly (no-match → exit 0) - multi-reason episodes report all matching reasons Live state when this commit lands: 9 elliott-lab episodes from the pre-fix code path, all flagged. Operator can clear them with one command before elliott-lab re-ships under main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:41:10 -05:00
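The flat-cpu signal from a61fa05980 ("medians spread <5pp across phases") can be illustrated with a short sketch. The function name `is_flat_cpu`, the input shape (phase name mapped to CPU% samples), and the choice not to flag episodes with fewer than two phases are all assumptions for illustration.

```python
from statistics import median


def is_flat_cpu(cpu_by_phase, threshold_pp=5.0):
    """True when per-phase CPU% medians differ by less than `threshold_pp` points."""
    medians = [median(samples) for samples in cpu_by_phase.values() if samples]
    if len(medians) < 2:
        return False  # assumption: one phase cannot show an inter-phase spread
    return max(medians) - min(medians) < threshold_pp
```

An episode whose phases all sit near 20% CPU would return True here, matching the elliott-lab pattern the later commits describe.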
max	642f7a94d6	runners: take savevm baseline-v1 after boot so revert_at_* actually works EpisodeConfig.revert_at_start / revert_at_end have been issuing loadvm "baseline-v1" via QMP since the snapshot/revert wiring landed, but no part of the system was running savevm — so loadvm targeted a snapshot that didn't exist and silently emitted snapshot_revert_failed every time. The reverted-baseline mode was, in effect, dead code. Both runners now take a savevm immediately after the guest is up and reachable, before any workload runs: run_real_vm_demo.py — after SerialClient.login() succeeds (Tier 2) run_tier3_demo.py — after _wait_for_tcp on the vulnerable port (Tier 3, before the exploit fires) Both call qmp.QMPClient.savevm("baseline-v1"). Best-effort: if savevm fails (older qemu, non-qcow2 disk, KVM nesting issue), we log a warning and run the episode anyway — just without revert support. The snapshot_name in EpisodeConfig is unified to "baseline-v1" across both runners (Tier 3 was previously stamping "qcow2-snapshot-on" into meta, which didn't match what loadvm would target). Why both runners take savevm individually instead of a unified path: the two runners boot different launchers (launch_demo.sh for the Alpine cidata image, launch_target.sh for the vulnerable target). Each is responsible for its own QMP socket lifecycle. A shared savevm helper module would just be a one-line wrapper around the existing qmp.QMPClient.savevm; not worth the indirection. Existing test coverage: tests/test_qmp.py exercises QMPClient.savevm/loadvm against a fake server (HMP wrapper, error path). The runner-side call is exercised in production but not in unit tests — would need a fake launcher subprocess, which is outside this commit's scope. 132/132 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:37:05 -05:00
max	507eac617b	Solvable Tier-3 holes: callback payloads, busybox workloads, bridge by default Closes the next batch of issues from the post-mortem. The previous "each run uses a different vulnerability" commit shipped 5 modules but 3 of them couldn't actually fire under SLIRP+restrict=on: their reverse-shell payloads needed a callback channel the launcher didn't provide, AND their LHOST options were set to {{ target_ip }} (the target's IP, not the attacker's — copy-paste from RHOSTS). Same time, the workloads.py shell commands used bash-only /dev/tcp redirects that silently no-op'd in the busybox shell sessions Metasploitable2 returns. Net effect: episodes that selected those modules would have produced session_open_timeout + dead workloads. Module configs (the three callback ones): exploits/modules/distccd_command_exec.toml exploits/modules/php_cgi_arg_injection.toml exploits/modules/unreal_ircd_3281_backdoor.toml - Switch payload from cmd/unix/reverse* to cmd/unix/bind_perl so the target listens on a known port; msfrpcd connects to it via the host's hostfwd (no callback path required). - Drop the bogus LHOST = "{{ target_ip }}" — bind shells don't use LHOST. - Add [runtime] table: requires_bridge = true extra_target_ports = [<bind_lport>] Both fields are honored by the loader (ModuleConfig.requires_bridge) and the launcher (TARGET_PORTS gets the extra port hostfwd'd when BRIDGE mode is active). orchestrator/fleet.py When BRIDGE is unset in env, _run_slot filters the module catalog down to modules where requires_bridge=False before calling select_module. Two same-socket-shell modules (vsftpd_234_backdoor + samba_usermap_script) survive — fleet still has variety; just doesn't pick modules whose payloads can't land. With BRIDGE set, the full catalog rotates as before, AND BRIDGE is propagated to the per-slot subprocess env so launch_target.sh enters tap+bridge mode. 
exploits/workloads.py Replaced bash-only constructs in three profiles: scan-and-dial /dev/tcp/HOST/PORT redirects → nc -z -w 1 bursty-c2 same fix shell-resident exec 3<>/dev/tcp/... → piping into nc -w All three now run cleanly in busybox / dash / Metasploitable2's default shell. The remaining three profiles (cpu-saturate, io-walk, low-and-slow) were already busybox-portable. scripts/install-lab-host.sh - lab-host.env now defaults BRIDGE=br-malware (was commented out). Operator opt-out is to comment the line back out. - New step 6b: provisions br-malware via vm/setup_bridge.sh AND pre-creates a per-slot tap pool (cis490tap0..7 for Tier-2 demo, cis490target0..7 for Tier-3 target) all attached to br-malware and brought up. Launchers reference these by SLOT — no sudo needed at episode time. - On bridge-setup failure, the script auto-comments BRIDGE in the env file with an "auto-disabled: bridge setup failed" note so the fleet falls back to same-socket modules + Tier-2 cleanly. tools/cis490_doctor.py Three new checks for the lab-host role: bridge: br-malware exists / up tier3: msfrpcd listening on 127.0.0.1:55553 tier3: module catalog parses (counts same-socket vs requires_bridge) All three are warn-level — they don't fail an otherwise-healthy Tier-2-only setup; they tell the operator what's missing for full Tier-3 + source 4 coverage. Tests: 132 (was 129). New cases: test_fleet.py +3 - fleet skips requires_bridge modules when BRIDGE unset (asserted across 20 episodes; never picks a callback module) - fleet uses the full catalog when BRIDGE is set - BRIDGE env propagates to per-slot subprocess What's still untested live: the bind_perl payloads against a real Metasploitable2 in the bridge-enabled launcher path. That's a deployment validation, not a code change. The unit tests confirm the dispatch / filter logic; the live test is the next operator action. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:32:52 -05:00
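The BRIDGE-dependent catalog filter from 507eac617b might look roughly like this. Modules are modeled as plain dicts with a `requires_bridge` key purely for illustration; the commit says the real loader exposes `ModuleConfig.requires_bridge`, and the helper name `eligible_modules` is hypothetical.

```python
def eligible_modules(catalog, env):
    """Return the modules the fleet may pick, given the process environment."""
    if env.get("BRIDGE"):
        return list(catalog)  # bridge present: the full catalog rotates
    # No bridge: keep only modules whose shell rides the exploit's own socket.
    return [m for m in catalog if not m.get("requires_bridge", False)]
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the filter easy to exercise in unit tests.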
max	a193d17ead	fleet: rotate exploit modules per (host, slot, ep); Tier 3 by default Closes the "every run hits the same vulnerability" gap. Before this commit, the fleet shipped Tier-2 episodes (no exploit at all) with only the post-infection sample varying. Tier-3 had a single canned module — vsftpd_234_backdoor — so even when exploit fire was exercised, the entry vector never changed. Trainer would see one shape of `armed → infecting` and learn nothing about how varied real exploits look on the wire / in /proc. What landed: exploits/modules/ + samba_usermap_script.toml CVE-2007-2447, SMB:139 + distccd_command_exec.toml CVE-2004-2687, distcc:3632 + php_cgi_arg_injection.toml CVE-2012-1823, http:80 + unreal_ircd_3281_backdoor.toml CVE-2010-2075, ircd:6667 (vsftpd_234_backdoor.toml unchanged) All five are canonical Metasploitable2 vectors with stable Metasploit modules. Each TOML carries the RPORT the launcher needs to wire its hostfwd at, plus a payload tuned to a clean shell session (cmd/unix/interact for in-band shells, cmd/unix/reverse* with deterministic LPORTs for reverse shells). exploits/modules.py + select_module(catalog, host_id, slot, episode_index) — same SHA-256-keyed deterministic selection shape SampleManifest uses for samples. Two hosts at the same slot/episode hash to different modules; one host walks the full catalog within ~len(catalog) episodes. + module_target_port() — pulls RPORT off the module config so the fleet can plumb the launcher's hostfwd at the right service. orchestrator/fleet.py - _run_slot now decides Tier 3 vs Tier 2 from msfrpcd reachability + module-catalog populated. Default is Tier 3 when both are true; Tier 2 fallback when not (logged + recorded in SlotResult.tier so trainers can filter no-exploit episodes). - Per-slot module via select_module() — each concurrent slot in a wave gets a different vector AND a different sample. 
- PORT_BASE per slot (target_port + slot * 1000) so concurrent Tier-3 targets don't collide on the host-side hostfwd port. - _msfrpcd_available() probe gates the dispatch. - Fleet-side log line records (slot, ep, tier, sample, module, run_dir) so the operator can see at a glance what each wave is exercising. - SlotResult grows tier + module_name fields; FleetConfig grows modules + force_tier2 + msfrpcd_{host,port} fields. orchestrator/episode.py + EpisodeConfig.exploit_meta — plain dict the runner stamps into meta.exploit so every Tier-3 episode records {framework, module path, module type, payload, RPORT, RHOSTS template}. Trainers join on meta.exploit.module_name to stratify by entry vector; meta.sample.name to stratify by post-infection family. tools/run_tier3_demo.py + Builds exploit_meta from the loaded ModuleConfig and passes it to EpisodeConfig. Sample is now also passed (was missing). tools/run_fleet.py + --modules-dir (default exploits/modules/) — load module catalog on startup; pass to FleetConfig. + --force-tier2 — escape hatch for dev / smoke tests. + JSON output now includes per-slot {tier, module} so the operator can see at a glance what each slot ran without grepping logs. Tests: 129 (was 119). New cases: test_exploits.py +6 - catalog has at least the five canonical Metasploitable2 vectors - select_module is deterministic per (host, slot, ep) - select_module diversifies across hosts - select_module walks the full catalog over many episodes - module_target_port pulls RPORT for each shipped TOML test_fleet.py +4 - _run_slot dispatches to run_tier3_demo.py when msfrpcd up - falls back to run_real_vm_demo.py when msfrpcd unreachable - falls back when module catalog empty - --force-tier2 overrides msfrpcd availability - PORT_BASE is unique per concurrent slot (no hostfwd collision) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:22:49 -05:00
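A minimal sketch of the SHA-256-keyed selection and per-slot port offset described in a193d17ead. The key layout (`host:slot:episode`), the use of the first 8 digest bytes, and sorting the catalog are assumptions; only the function name `select_module`, its argument list, and the `target_port + slot * 1000` formula come from the commit message. Here the catalog is a list of module names, and `port_base` is a hypothetical helper name.

```python
import hashlib


def select_module(catalog, host_id, slot, episode_index):
    """Pick one catalog entry deterministically from (host, slot, episode)."""
    if not catalog:
        raise ValueError("empty module catalog")
    key = f"{host_id}:{slot}:{episode_index}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(catalog)
    # Sort so the pick does not depend on directory listing order.
    return sorted(catalog)[index]


def port_base(target_port, slot):
    """Host-side hostfwd port for a slot, so concurrent targets do not collide."""
    return target_port + slot * 1000
```

A plain hash-modulo pick like this one spreads episodes across the catalog statistically; it does not by itself guarantee every module appears within a fixed number of episodes.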
max	d86502d950	workload audit trail: meta.sample + per-phase events + pre-kill probe The elliott-lab episode showed every phase median'd 20% CPU because the in-guest workload silently never fired — and there was no signal in events.jsonl to detect that from outside, so a trainer would treat the labels as ground truth and learn "all phases look identical". This commit closes the audit gap so the failure is visible in meta: orchestrator/episode.py EpisodeConfig.sample: Sample | None — the manifest entry that drove this episode's workload selection. Stamped into meta.sample as {name, family, category, profile, kind, sha256} so trainers can join cleanly without re-deriving from events. None means the v1 yes-loop fallback path ran (and the trainer should treat the episode with appropriate skepticism). tools/vm_load_controller.py VMLoadController gains an emit_event callable. Every phase now emits a workload_* event into the runner's events.jsonl: workload_setup login + initial cleanup OK workload_killed clean / dormant. Dormant carries a `pre_kill_probe` dict from inside the guest (`pgrep -c yes`, `pgrep -c sh`, /proc/loadavg) so the trainer can detect the elliott-lab failure mode where the workload never actually ran. workload_armed armed handshake fired workload_infecting dd urandom / payload write fired workload_started infected_running command sent workload_failed any of the above raised inside SerialClient (timeout, EOF, partial login). The runner would have silently swallowed the exception via its on_phase try/except; the audit row makes the failure detectable. Exceptions in shell calls surface as workload_failed events but do NOT propagate, matching the runner's existing on_phase contract. tools/run_real_vm_demo.py Wires the controller's emit_event to the runner's emit_event via a small forward-reference closure (controller is built before runner; runner.emit_event needs to be the sink). 
Sample also flows into EpisodeConfig.sample so meta.sample matches what the controller actually ran. Tests: 119 (was 106). New cases: tests/test_vm_load_controller.py (11 tests against a FakeSerial) - setup emits workload_setup - infected_running runs the v1 yes-loop AND emits workload_started - dormant probes BEFORE killing and stamps pre_kill_probe - dormant probe records "yes=0" (the elliott-lab fingerprint) - clean / armed / infecting all emit their respective events - serial.run() exception → workload_failed event, no propagation - sample-with-profile dispatches to exploits.workloads command (NOT the v1 yes-loop) - missing emit_event callback is a no-op (back-compat) tests/test_episode.py (2 new) - meta.sample carries name/family/category/profile/kind/sha256 when EpisodeConfig.sample is set - meta.sample stays null in the v1 fallback path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:12:34 -05:00
max	8753340ea3	fleet: fix per-slot run-dir collision so concurrent VMs actually run Root cause of "fleet says max_concurrent=3 but only one episode ships per wave" symptom on elliott-lab: 1. orchestrator/fleet.py::_run_slot set env["RUN_DIR"]=/tmp/cis490-vm-fleet-{slot} per slot. 2. tools/run_real_vm_demo.py defaulted --run-dir to /tmp/cis490-vm (NO slot suffix), then UNCONDITIONALLY overwrote the env's RUN_DIR with that flag's value before exec'ing the launcher. 3. So every slot's launcher saw RUN_DIR=/tmp/cis490-vm. All slots collided on the same socket dir. 4. run_real_vm_demo.py also rmtree(run_dir) on entry — slot 1's rmtree literally deleted slot 0's pidfile + sockets mid-boot. 5. Net effect: one VM survives per wave on a multi-core host that should be running ~cores-1 in parallel. Throughput collapses to 1/N. Fix: tools/run_real_vm_demo.py + tools/run_tier3_demo.py: --run-dir default cascade — 1) explicit CLI flag 2) RUN_DIR env (set by fleet runner) 3) /tmp/cis490-vm-<SLOT> (SLOT from env, default 0) Same change in both runners so Tier-2 + Tier-3 fleet waves parallelize cleanly. orchestrator/fleet.py::_run_slot: Pass --run-dir explicitly to the subprocess so the per-slot path is audit-visible in the fleet log instead of buried in env. Also flip the subprocess interpreter to repo_root/.venv/bin/python when present (was /usr/bin/env python3 — worked by luck because the orchestrator path doesn't import msgpack/httpx, but a Tier-3 fleet wave would have died at import-time on a host without those in system Python). etc/cis490-orchestrator.service: Removed the duplicate [Service] hardening block at the bottom of the file that was silently overriding the AmbientCapabilities grant (NoNewPrivileges=true at the bottom flipped the NoNewPrivileges=false at the top, dropping CAP_NET_RAW + CAP_SYS_ADMIN + CAP_PERFMON before per-episode subprocesses inherit them). Sources 3 + 4 would have failed silently inside the sandbox. 
Added /tmp to ReadWritePaths so per-slot RUN_DIRs are writable. 106/106 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 01:55:56 -05:00
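The three-step --run-dir cascade from 8753340ea3 can be sketched as a small resolver. The function name `resolve_run_dir` and its signature are hypothetical; the precedence order and the `/tmp/cis490-vm-<SLOT>` default follow the commit message.

```python
import os


def resolve_run_dir(cli_run_dir=None, env=None):
    """Resolve the per-slot run dir: CLI flag, then RUN_DIR, then a SLOT default."""
    env = os.environ if env is None else env
    if cli_run_dir:
        return cli_run_dir                 # 1) explicit CLI flag
    if env.get("RUN_DIR"):
        return env["RUN_DIR"]              # 2) RUN_DIR set by the fleet runner
    slot = env.get("SLOT", "0")            # 3) SLOT from env, default 0
    return f"/tmp/cis490-vm-{slot}"
```

Because the fallback always carries a slot suffix, two runners started without any flag or RUN_DIR no longer share a socket directory unless they also share a SLOT.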
max	a93a3ff221	bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9, refs #3) Adds a pull-based cert distribution path so install-lab-host.sh can fetch its own leaf cert without operator intervention. Removes the ssh-from-Pi requirement that blocked elliott-lab. How the chicken-and-egg gets solved: a freshly wg-enrolled lab host already has WG access (gate kept by iptmonads at L4) and trusts the Caddy local CA (bundled in this repo at etc/caddy-root.crt). It makes a single TLS call to https://bootstrap.wg/v1/cert/<host_id> — no mTLS — gets back a tar of {ca.crt, leaf.pem, leaf.key}, extracts to /etc/cis490/certs/, and the shipper unblocks. Trust boundary is "reached :443 over WG"; no operator action needed. bootstrap/ app.py Starlette: GET /v1/cert/{host_id}, GET /v1/health. Validates host_id charset, rate-limits per source IP, logs every mint with the X-Real-IP Caddy injects. __main__.py uvicorn launcher; runs as root because the wg-pki CA private key is root-only. etc/cis490-bootstrap.service systemd unit on 127.0.0.1:8446 with ProtectSystem=strict + narrow ReadWritePaths=/var/lib/wg-pki. ProtectHome=no because systemd's read-only mode hides /home contents (the issuer script the wrapper exec's lives there). scripts/issue-cis490-client-cert-wrapper.sh Adapter the bootstrap service shells out to. Resolves the actual wg-pki issuer script across the three plausible install layouts (/opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki) so a single copy of the unit file works on any operator's box. Forces --out-dir to /var/lib/wg-pki/issued so writes stay inside the service's narrow ReadWritePaths. scripts/install-lab-host.sh After scaffolding lab-host.toml, if /etc/cis490/certs/lab-host.pem is absent, curls bootstrap.wg with --cacert etc/caddy-root.crt (no chicken-and-egg), extracts, chowns/chmods. Skips silently if bootstrap.wg is unreachable so manual hand-carry remains possible. 
scripts/install-receiver.sh Drops cis490-bootstrap.service alongside cis490-receiver and prints both as "enable --now" candidates. cis490-bootstrap is the thing that makes lab hosts self-provisioning. etc/caddy-root.crt Bundled copy of wg-pki's published Caddy local CA root, so the bootstrap fetch can verify TLS without depending on a wg-pki clone that may or may not be on the lab host yet. Verified live on the Pi: $ curl --cacert etc/caddy-root.crt https://bootstrap.wg/v1/cert/elliott-lab -o /tmp/x.tar HTTP 200 size=10240 $ tar tf /tmp/x.tar ca.crt elliott-lab.key elliott-lab.pem $ openssl verify -CAfile … elliott-lab.pem /tmp/.../elliott-lab.pem: OK $ openssl x509 -subject … -noout subject=CN=elliott-lab Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 01:30:29 -05:00
max	6f8b744c33	cis490-doctor + AGENTS.md operator runbook + louder install script Adds the missing diagnostic + onboarding tools so an agent (AI or human) handed a fresh lab host can get to "shipping data" without re-deriving every step from logs. tools/cis490_doctor.py — one-shot health check that walks the full stack from the bottom up. Each row is green/yellow/red with an exact fix command for the red rows. Checks: - repo: branch, tree-clean, distance from origin/main - install: /opt/cis490, .venv python, /etc/cis490/{lab-host,receiver}.toml, /etc/cis490/lab-host.env - mTLS: /etc/cis490/certs/{wg-ca,lab-host}.{pem,key}, openssl chain verify - systemd: cis490-{shipper,orchestrator,receiver} active state - net: receiver.url DNS, TCP reach, mTLS handshake to collector.wg - vm prereqs: /dev/kvm, qemu-system-x86_64, zstd, alpine-baseline.qcow2, cidata.iso - tier3 prereqs: msfrpcd, metasploitable2.qcow2 (warn-level) - end-to-end: cis490-shipper --ping Modes: --role {lab-host,receiver}, --json (machine-readable), --no-tier3 (skip optional checks). Exits non-zero on any red row. ANSI color (auto-disabled on non-tty / NO_COLOR). AGENTS.md gains a "How a lab host gets to shipping data" canonical flow at the top: cert delivery via wg-pki/deploy-cis490-cert.sh → install-lab-host.sh → cis490-doctor → systemctl enable. Plus an "on-demand episode" recipe + a "smallest E2E test" snippet for agents that need to verify the pipe without waiting on the timer. The strict "cloning the repo by itself does nothing" callout makes the failure mode mu and elliott-lab hit explicit. scripts/install-lab-host.sh prints a 5-step banner on first install that points at cis490_doctor.py + the deploy-cis490-cert.sh flow, plus an always-printed footer warning that "cloning + running launchers manually is NOT enough." Same message the AGENTS.md section reinforces. 
Refs spectral/CIS490#8 (the "Tier-2 is shipping in the meantime" claim that turned out to be untrue because no cis490-shipper service was running on elliott-lab — exactly the case this diagnostic tool targets). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 01:11:57 -05:00
max	7311802822	orchestrator: emit snapshot_load before _write_meta to keep t_mono ~0 On slower disks (Pi5 SD cards, mu's hardware) the json.dump → write → os.replace path inside _write_meta takes more than 1 ms, so when the snapshot_load event fired afterwards its t_mono_ns drifted past the "<1 ms after origin" assertion in test_driver_events_persist_to_events_jsonl. Fix: emit snapshot_load immediately after setting _t_mono_origin_ns, before any file I/O. Matches the semantic intent (snapshot_load marks episode clock = 0) and removes the disk-speed dependency from the event timeline. Diagnosis + suggested patch from spectral/CIS490#7 (filed by mu). Closes spectral/CIS490#7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:49:50 -05:00
max	637fb064df	README: Tier 4 is shipped, source 3 is shipped — drop the stale 🚧 marks Closing the loop on the previous wave's commits. Tier 4 (real-malware fetch + chunked upload + guest-side sha-verify + exec) and source 3 (perf stat collector) are both implemented and tested as of a88ac83; the README still tagged them as TBD / planned. Fix. - Tier 4 status: 🚧 → ✅ code; ⏳ awaiting operator's MalwareBazaar API key + at least one sha256 entry in manifest.toml. Same shape as the Tier-3 line. - New "Tier 4 — real malware sample" section walks through the fetch → chunked upload → guest-side sha-verify → exec flow with links to the relevant code. - Source 3 (perf stat): "🚧 planned" → "✅ opt-in via enable_perf". - Snapshot/revert (revert_at_start / revert_at_end via QMP loadvm) added to the Orchestrator + drivers list. - Test-count header updated 86 → 106. - Stale issue links to closed #4 / #5 / #6 dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:37:00 -05:00
max	a88ac83db0	Close out the deployment-readiness gaps Wraps the gaps surfaced in the "what is not implemented" audit so the fleet really is shippable end-to-end. Verified live on the Pi: - cis490-shipper --ping → HTTP 200 through Caddy + mTLS via the new wg-pki client CA leaf - real episode dir → tar+zstd → PUT → HTTP 201 stored - re-ship same bytes → 200 (idempotent) - re-ship different bytes under same id → 409 (conflict) Changes: orchestrator/episode.py - EpisodeConfig.revert_at_start / revert_at_end (Tier 0+ snapshot/ revert per docs/architecture.md). When set + qmp_socket present, EpisodeRunner issues loadvm <snapshot_name> and emits snapshot_revert / snapshot_revert_failed events on the same monotonic clock as everything else. collectors/qmp.py - savevm() / loadvm() helpers using human-monitor-command, plus a test against the fake QMP server. exploits/workloads.py - chunked_real_binary_upload() returns a ChunkedUpload plan: 8 KiB base64 chunks (~6 KiB binary each) so msfrpc never sees a buffer- busting payload. Includes a finalize step that sha256-verifies on the guest before exec. - real_binary_workload() now wraps the chunked plan for backwards compat with single-shot callers. exploits/driver.py - Tier-4 dispatch walks the chunked plan in MSFExploitDriver: each chunk is a separate session_shell_write; finalize verifies; exec only runs on sha-ok. New events: real_binary_upload_begin, real_binary_verify, real_binary_aborted. etc/cis490-orchestrator.service - Reads /etc/cis490/lab-host.env (FLEET_HOST_ID + optional BRIDGE). - Grants AmbientCapabilities CAP_NET_RAW (tcpdump for source 4) + CAP_SYS_ADMIN + CAP_PERFMON (perf for source 3) so collectors work under hardening. scripts/install-lab-host.sh - Writes /etc/cis490/lab-host.env on first install with FLEET_HOST_ID defaulting to `hostname -s`. 
- Best-effort: fetches the Alpine baseline qcow2 (sha512-pinned) and builds cidata.iso with the in-guest agent embedded; symlinks both into /opt/cis490/vm/images/ so launchers find them. scripts/fetch-alpine-baseline.sh - Idempotent fetcher for the Alpine 3.21 cloud-init nocloud qcow2 matching the sha512 in docs/sources.md. tools/plot_envelope.py - Rebuilt to render whatever telemetry the episode dir contains: proc → QMP block ops → perf IPC/miss-rate → bridge pkts/SYNs → guest agent load/mem. Missing sources are silently skipped. tools/index_reader.py - cis490-index CLI: filter receiver's index.jsonl by host / sample / time range, sort, count-by group. Closest thing to a query interface until we stand up Postgres/Timescale. samples/README.md - Rewritten to match the new manifest schema, the kind=real vs mimic split, the per-(host, slot, ep) selection mechanic, and the chunked-upload safety story. Tests: 106 pass (was 102). New cases: - test_qmp.py — savevm + loadvm (HMP wrapper + error path) - test_tier4.py — chunked plan splitting, sha-pinned finalize, end-to-end driver walks all chunks + verify + exec via the fake msfrpc client Closes the "what is not implemented" punch list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:31:55 -05:00
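The chunk arithmetic in a88ac83db0 (8 KiB of base64 text carrying roughly 6 KiB of binary per chunk, followed by a sha256 check before exec) can be sketched as below. This is a minimal illustration, not the code in exploits/workloads.py: `plan_chunks` and `reassemble` are hypothetical names, and encoding each 6144-byte slice independently is an assumption about how the split could be done.

```python
import base64
import hashlib

CHUNK_CHARS = 8192  # 8 KiB of base64 text per chunk, as stated in the commit


def plan_chunks(binary: bytes, chunk_chars: int = CHUNK_CHARS):
    """Split a binary into base64 chunks plus the sha256 to verify afterwards.

    Hypothetical helper: each chunk encodes its own slice of raw bytes, so
    every chunk is valid base64 on its own and can be decoded independently.
    """
    if chunk_chars <= 0 or chunk_chars % 4:
        raise ValueError("chunk_chars must be a positive multiple of 4")
    raw_per_chunk = chunk_chars // 4 * 3  # 8192 chars -> 6144 raw bytes
    chunks = [
        base64.b64encode(binary[i:i + raw_per_chunk]).decode("ascii")
        for i in range(0, len(binary), raw_per_chunk)
    ]
    return chunks, hashlib.sha256(binary).hexdigest()


def reassemble(chunks, expected_sha256: str) -> bytes:
    """Decode the chunks in order and refuse to return bytes on a sha mismatch."""
    data = b"".join(base64.b64decode(chunk) for chunk in chunks)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("sha256 mismatch: do not exec")
    return data
```

In this sketch the receiving side only hands back bytes when the digest matches, which mirrors the "exec only runs on sha-ok" rule described in the commit.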
max	bdcd2ecbef	Close out the open issues: bridge pcap wiring, perf collector, Tier-4 Wraps the three remaining 🚧 items from the README so every collector the threat-model promises is actually live, and the Tier-4 path (real-malware fetch + upload + exec) works end-to-end as soon as a sha256 lands in samples/store/. Closes spectral/CIS490#4, #5, #6. == #6 — Bridge pcap wiring == EpisodeConfig grows three optional fields: bridge_iface: str \| None # e.g. "br-malware" bridge_ip: str = "10.200.0.1" pcap_snaplen: int = 256 When bridge_iface is set, EpisodeRunner spawns tcpdump for the duration of the schedule (network.pcap), stops it cleanly on episode end, and runs collectors.pcap.bucketize() to produce netflow.jsonl per the 100-ms schema in docs/data-model.md. EpisodeResult + meta.result gain rows_netflow + pcap_bytes counters. vm/launch_demo.sh + launch_target.sh now switch between SLIRP usermode and tap+bridge based on $BRIDGE — operator pre-creates the tap as a bridge member, no sudo from the launcher. run_real_vm_demo.py picks BRIDGE up from env so the fleet runner can opt entire waves into pcap mode by exporting BRIDGE before invocation. == #5 — Source 3 perf collector == collectors/perf_qemu.py shells out to ``perf stat -p <pid> -I 100 -j`` and parses the per-event JSON stream. Aggregates one row per interval across the canonical event set (cycles/instructions/cache-{refs,misses}/ branches/branch-misses/page-faults/context-switches), computes IPC + cache-miss rate. Tolerates missing events (``<not counted>`` / ``<not supported>``) without dropping the row, and skips cleanly when ``perf`` isn't on PATH or the process can't be attached. EpisodeConfig.enable_perf=True opts into the collector — off by default because perf needs CAP_SYS_ADMIN or perf_event_paranoid <= 1. When enabled, runs as a parallel thread alongside the other collectors; EpisodeResult.rows_perf records the count. 
== #4 — Tier 4 (real-malware fetch + upload + exec) == tools/fetch_sample.py: pulls a sample by sha256 from MalwareBazaar (API key from env or samples/.bazaar.token), unzips with the standard "infected" password, verifies the resulting binary's sha256, lands at samples/store/<sha256>. Idempotent — already-staged correct binaries return immediately. samples/manifest.py: Sample.binary_path(store_root) resolves to the staged binary path, or None for mimics / not-yet-fetched real samples. exploits/workloads.py: real_binary_workload(bytes, sample) builds a Workload that base64-uploads the binary into the shell session via a heredoc, decodes + chmods + execs it in the background, captures the PID for clean stop on dormant. Per-profile pid/bin paths so concurrent samples in the same guest don't collide. exploits/driver.py: dispatch order is now: 1) sample.kind == "real" + binary staged at sample_store_root → real_binary_workload (Tier 4) 2) profile mimic from workloads.workload_for() (Tier 3 v2) 3) None → driver v1 fallback yes-loop DriverConfig.sample_store_root is the new field; run_tier3_demo.py wires it to repo_root/samples/store. driver_setup event records sample_sha256 so trainers can join Tier-4 episodes against the manifest by hash. samples/store/.gitkeep added (binaries themselves are gitignored). Tests: 102 pass (was 86). New suites: tests/test_perf_qemu.py — parser + builder + perf-missing fallback tests/test_tier4.py — real_binary_workload base64 round-trip, stop-cmd kills pidfile, per-profile path isolation, driver dispatch chooses real vs mimic correctly, fetcher input validation and cached-fast-path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:17:49 -05:00
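The derived perf columns in bdcd2ecbef (IPC and cache-miss rate, with missing events tolerated rather than dropping the row) could look roughly like the following. It is a sketch under assumptions: `derive_rates` is a hypothetical name, the event keys follow perf's usual spelling (`cache-references`, `cache-misses`), and an uncounted or unsupported event is assumed to have been parsed to `None` upstream.

```python
from typing import Mapping, Optional


def derive_rates(counts: Mapping[str, Optional[float]]) -> dict:
    """Compute IPC and cache-miss rate for one perf interval.

    A ratio is None when either operand is missing or the denominator is
    not positive, so the row survives with a gap instead of being dropped.
    """

    def ratio(num_key: str, den_key: str) -> Optional[float]:
        num = counts.get(num_key)
        den = counts.get(den_key)
        if num is None or den is None or den <= 0:
            return None
        return num / den

    return {
        "ipc": ratio("instructions", "cycles"),
        "cache_miss_rate": ratio("cache-misses", "cache-references"),
    }
```

Returning `None` instead of `0.0` keeps "not measured" distinguishable from "measured as zero" for whoever consumes the rows later.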
max	c89dbe29e7	README + AGENTS.md: reflect fleet, driver v2, all 4 collectors README: - Intro now describes the multi-host fleet + cross-host sample diversity as the primary workflow. - Tier 2 section: profile-driven workload table replaces the old "yes / dd" description. - New Tier 3 section: covers driver v2 dispatch + setup automation scripts. - Tier maturity table refreshed (1, 2 ✅; 3 ✅ code / ⏳ image; 4 🚧). - Telemetry-sources table moved into the per-tier story so the oracle-vs-feature split is visible from the top of the doc. - Status section restructured by section (Pipeline, Telemetry, Orchestrator + drivers, Fleet) instead of a flat list. Cross-links to the new Forgejo issues for the remaining gaps: #4 — Tier 4 MalwareBazaar fetcher #5 — source 3 (perf stat) #6 — bridge pcap per-episode wiring - Quick-start sections rewritten: 1) "fleet mode (the primary workflow)" with --capacity + --waves 2) "single episode, no fleet" covering both Tier 2 + Tier 3 3) "multi-host fleet — how cross-host diversity works" explains the deterministic per-(host, slot, ep) selection mechanism - Repo-layout table updated to include shipper/, scripts/, AGENTS.md, and the workloads/fleet additions. - Deploying section: replaces the "TODO scaffolds" wording with the actual sudo install-receiver / install-lab-host / wg-pki bring-up flow that's running on the Pi today. AGENTS.md: adds a "don't put off the hard parts" convention as the first item under Other conventions, with explicit guidance on when "deferred-with-reason" is legitimate (genuine operator artifact missing) and the requirement to file an issue + automate the bring-up so it Just Works once the artifact lands. 86/86 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:11:35 -05:00
max	b80986d99c	Driver v2: sample-profile-driven workloads (Tier-2 + Tier-3) The v1 driver ran ``yes > /dev/null`` for every sample, which produced the same envelope shape regardless of which malware family the orchestrator claimed to be running. That's a poor training signal: the model sees identical /proc + QMP traces tagged "cryptominer" / "ransomware" / "RAT" with no distinguishing features. v2 fixes this. What landed: exploits/workloads.py — six ``Workload`` profiles, each producing a distinct in-session shell command pair (start_cmd / stop_cmd) that backgrounds a profile-shaped loop: cpu-saturate — sustained 1-vCPU saturation (XMRig shape) scan-and-dial — periodic SYN-style probes across 10.200.0.0/24 + dial-home to gateway (Mirai shape) io-walk — fs traversal + 4 KiB urandom writes, periodic re-read (ransomware shape) bursty-c2 — long idle, periodic 3-packet TCP egress burst (Dridex C2 beacon shape) low-and-slow — minimal CPU + periodic awk-driven memory churn (Kovter / fileless shape) shell-resident — single long-lived TCP socket pinned to gateway with periodic 6-byte command ticks (RAT shape) Each profile uses a /tmp/.cis490-workload-<profile>.{pid,sh} pair so the stop_cmd can cleanly kill the loop and its descendants. exploits/driver.py — MSFExploitDriver now accepts an optional ``Sample``. With one supplied, ``infected_running`` dispatches to the matching workload via exploits.workloads.workload_for(); the ``sample_executed`` event records profile + sample name + sample kind so the trainer can join cleanly. Without a sample, the v1 yes-loop path remains unchanged (backwards compat). tools/vm_load_controller.py — the same dispatch on the Tier-2 path (no exploit, real Alpine guest driven over the serial console). A fleet wave now produces six visually distinct envelopes per wave whether the underlying mode is Tier 2 or Tier 3. 
tools/run_real_vm_demo.py — accepts ``--sample <name>`` (or SAMPLE_NAME env from the fleet runner) + auto-wires QMP + agent sockets into the EpisodeConfig so all three new collectors (sources 2, 4, 5) run alongside source 1 by default. tools/run_tier3_demo.py — same ``--sample`` plumbing for the exploit-driven path. Tests: 86 pass (was 82). New v2 cases: - profile dispatch routes infected_running to the workload's start_cmd (NOT the v1 yes-loop) when a Sample is set - all six profiles produce distinct start_cmds (the property the ML model needs) - unknown profile string falls back to cpu-saturate with a warning - v1 path (no Sample) still uses yes-loop (backwards compat) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:06:15 -05:00
max	1b6c7b2f4a	Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts This is the chunk that makes "real data" actually flow on multiple hosts in parallel. End-to-end pipe was up at `613c6fa` / 2579683; now the lab-host side has the diversity + concurrency it needs. Collectors landed: collectors/qmp.py — source 2 (oracle). Tiny synchronous QMP client + row builder + run loop. Tolerates older qemu without query-stats. collectors/guest_agent.py — source 5 (deployable). Reads the virtio-serial host-side socket, parses agent JSON-lines, re-stamps to the host monotonic clock, persists. collectors/pcap.py — source 4 (deployable). tcpdump capture + pure-Python pcap reader + 100 ms netflow.jsonl bucketizer. Decodes Ethernet/IPv4/TCP/UDP enough for the schema in docs/data-model.md. In-guest agent: vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads /proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs, thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent. tools/build_cidata.py — embeds the agent + an OpenRC service into user-data so first boot of the Alpine cidata image auto-starts it. Launchers: vm/launch_demo.sh / launch_target.sh — second virtio-serial port for the agent socket; SLOT env support so multiple VMs run without socket / port collisions; PORT_BASE on launch_target so multiple target VMs hostfwd different host ports. vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24, no NAT). Idempotent. Fleet: orchestrator/fleet.py — capacity detector (cores / RAM / load headroom) + concurrent-slot runner. Per-slot ENV selects the sample. FleetCapacity dataclass round-trips into meta.json so "this episode ran with 6 concurrent VMs" is auditable post-hoc. tools/run_fleet.py — CLI: --capacity report; --waves N runs N waves of (max_concurrent) episodes each, every slot with a different sample. 
etc/cis490-orchestrator.service — now drives the fleet runner with Restart=always so each invocation runs one wave and respawns, giving a continuous stream. Samples: samples/manifest.toml — six profiles spanning the five major behaviour shapes. Each entry is real OR mimic (sha256 distinguishes). samples/manifest.py — strict TOML loader (rejects dups, unknown categories) + deterministic select(host_id, slot, episode_index) so different hosts on the network walk the catalog in different orders without any coordinator. EpisodeRunner: orchestrator/episode.py — optional qmp_socket + guest_agent_socket fields on EpisodeConfig; when set, additional collector threads run alongside proc_qemu. EpisodeResult now carries rows_qmp + rows_guest counters. Tier-3 setup automation: scripts/install-msfrpcd.sh — installs metasploit-framework where the package manager has it, generates a strong password into /etc/cis490/msfrpc.env, drops a hardened systemd unit bound to 127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch once MSFRPC_PASSWORD is sourced. scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256 from the operator (Rapid7 download is registration-walled), pulls, verifies, converts vmdk → qcow2, lands at vm/images/. Tests: 82 pass (was 51). New suites: tests/test_qmp.py — fake QMP server, capability handshake, blockstats, async-event interleaving, 5-failure backoff tests/test_guest_agent.py — fake virtio socket, JSON-lines read + re-stamp, malformed-line tolerance tests/test_pcap.py — synthetic pcap with TCP/UDP/ARP frames, bucketize correctness across windows tests/test_fleet.py — capacity math (8-core idle / low-RAM / high-load / Pi5 / 1-core box), manifest selection determinism + diversity What's queued for the next commit (already discussed in convo): - MSFExploitDriver v2: map sample.profile → distinct in-session workload so Tier-3 episodes don't all produce the same yes-loop envelope. Critical for ML to learn varied malware shapes. 
- Real-sample fetch from MalwareBazaar by sha256. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:02:27 -05:00
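One way a coordinator-free, deterministic `select(host_id, slot, episode_index)` like the one described in 1b6c7b2f4a might work is sketched below. The real samples/manifest.py may differ: the hash-based per-host ordering and the extra `catalog` argument are assumptions made for illustration, and the profile names are taken from the driver v2 commit.

```python
import hashlib

# Profile names as listed in the driver v2 commit; used here as a stand-in catalog.
PROFILES = [
    "cpu-saturate",
    "scan-and-dial",
    "io-walk",
    "bursty-c2",
    "low-and-slow",
    "shell-resident",
]


def select(catalog, host_id: str, slot: int, episode_index: int) -> str:
    """Pick a sample name deterministically from (host_id, slot, episode_index).

    Assumed scheme: every host derives its own fixed ordering of the catalog
    by hashing host_id together with each name, then walks that ordering.
    Slots within one wave land on distinct entries while slots <= len(catalog).
    """
    if not catalog:
        raise ValueError("catalog is empty")
    order = sorted(
        catalog,
        key=lambda name: hashlib.sha256(f"{host_id}:{name}".encode()).digest(),
    )
    return order[(episode_index + slot) % len(order)]
```

Because the ordering depends only on `host_id` and the catalog contents, two hosts never need to talk to each other to agree on who runs what, and a rerun of the same (host, slot, episode) triple reproduces the same choice.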
max	2579683efb	receiver: default to 127.0.0.1:8444 (avoid wg-enroll-listener on 8443) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 23:45:23 -05:00
max	7c9f9582ca	Lab-host shipper + receiver /v1/ping + install scripts Implements the deployment loop end-to-end on the CIS490 side: shipper/ config.py ShipperConfig (host_id, paths, receiver endpoint, mTLS) transport.py httpx-based PUT + ping with mTLS + bearer support queue.py scan data/episodes/, tar+zstd via system zstd, ship, retire to data/shipped/. Idempotent across crashes per the state machine in docs/transport.md. __main__.py CLI: --ping (smoke test), --once (one pass), or daemon receiver/app.py: new POST /v1/ping that requires the same auth as PUT /v1/episodes but writes nothing. Used by `cis490-shipper --ping` during lab-host bring-up to verify the WG/Caddy/mTLS path before shipping any real bytes. etc/ cis490-shipper.service systemd unit for the lab-host shipper cis490-orchestrator.service systemd unit for the lab-host queue (kept disabled by default until queue mode lands) lab-host.toml.example config template scripts/ install-lab-host.sh idempotent installer; verifies prereqs, creates cis490 service user, syncs repo to /opt/cis490, builds venv, drops systemd units and config template install-receiver.sh same, for the receiver role on the central WG node (Pi5 in our setup) tests/test_shipper.py 11 end-to-end tests against a real Uvicorn server hosting the receiver app. Exercises ping, tar+ship, idempotent re-ship, 409 conflict, transient (receiver down), tarball round-trip via system zstd. AGENTS.md guidance for AI agents working on this and sibling repos. Headline: when you hit an issue you can't fully fix in scope, file a Forgejo issue rather than leaving a TODO. 51/51 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 23:41:32 -05:00
max	613c6fa223	Tier 3: msfrpc-driven exploit driver + first module config Adds the Tier-3 exploit driver — an MSFExploitDriver that plugs into EpisodeRunner.on_phase, fires a Metasploit module against a target VM via msfrpcd, watches for the resulting session, and stamps each transition (exploit_fire, session_open, session_landing_probe, sample_executed, session_dormant, session_killed) into the episode's events.jsonl on the orchestrator's monotonic clock. What landed: - exploits/msfrpc.py — minimal msgpack-over-HTTPS client (auth, module.execute, job/session lifecycle) so we don't depend on a third-party MSF wrapper. - exploits/driver.py — phase-to-msfrpc adapter; idempotent fire, session-open polling with timeout, workload start/stop, teardown. - exploits/modules.py + exploits/modules/vsftpd_234_backdoor.toml — TOML module configs with {{ target_ip }} placeholders, replacing the imperative .rc-script approach the README previously hinted at. - vm/launch_target.sh — SLIRP+restrict=on launcher for the intentionally-vulnerable target VM (host can reach guest via hostfwd, guest cannot reach host or internet). - tools/run_tier3_demo.py — end-to-end runner mirroring run_real_vm_demo. - tests/test_exploits.py — 12 new tests against a fake MSFRpcClient, including an integration test that drives a real EpisodeRunner. Plumbing changes: - EpisodeRunner._emit_event → public emit_event, so external drivers share the runner's monotonic clock and events.jsonl. - mkdir for episode_dir moved to __init__ so emit_event is callable before run() (driver_setup fires pre-schedule). Status: driver + tests pass (40/40); end-to-end against a live msfrpcd + Metasploitable2 image is the next bring-up step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 23:11:52 -05:00
Maximus Gorog	7216ec09bd	Tier 2: real Alpine VM, real workload, real envelope End-to-end now drives a real KVM guest through the full XMRig-shaped phase schedule with the workload running INSIDE the guest. Telemetry is host-side /proc/<qemu_pid>; the load is busybox `yes` (sustained CPU saturation) and `dd if=/dev/urandom` (disk burst on infecting), driven over the serial console at every phase transition. The plotted envelope shows clean idle → armed → infecting (disk spike) → infected_running (100% CPU plateau) → dormant → re-entry → final clean. Components: vm/launch_demo.sh now boots Alpine 3.21 nocloud-cloudinit (Cirros 0.6.x's cirros-init blocks on the EC2 metadata service for ~17 min before falling through to NoCloud — abandoned). Mounts a cidata ISO as a second drive. tools/build_cidata.py pure-Python NoCloud ISO builder (pycdlib). Sets root password and ssh_pwauth via runcmd so we don't depend on a specific cloud-init version's plain_text_passwd handling. tools/vm_serial.py serial-console client (stdlib socket). Idempotent login (detects already-in-shell state), sentinel-bracketed run() that distinguishes shell output from the TTY echo of input by requiring a leading \r\n boundary on the marker. tools/vm_load_controller.py in-guest load controller. set_phase() dispatches the per-phase shell command over the serial connection. tools/run_real_vm_demo.py ties it all together: boot VM, wait for cloud-init runcmd, log in, run the EpisodeRunner with on_phase=controller, shut down VM. Deps: paramiko, pycdlib added. docs/sources.md updated with Alpine cloud image (sha512 pinned), and the new Python deps. README leads with the tier-2 plot now (real VM, real workload). The previous synthetic plot is moved below with explicit "host-side mimic, not a VM" labelling. Tier-2 status flipped to ✅ in the tier table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:38:53 -06:00
Maximus Gorog	32ae161ef2	README: embed demo plots, mark synthetic vs real clearly, add collapsibles The README now leads with a 'What an episode looks like' section that shows both: * docs/images/synthetic-envelope.png — pipeline-validation plot. Real telemetry of a real process whose load is shaped by tools/load_mimic.py (Python). Explicitly labelled NOT REAL MALWARE in the caption — the earlier wording was unclear. * docs/images/real-vm-idle.png — real Cirros 0.6.3 booted under KVM, same orchestrator + /proc collector pointed at the qemu-system pid. Idle baseline; no exploit, no payload yet. A 'What's still missing for the real-malware envelope' table makes the tier path explicit (real VM idle → real workload in-guest → real exploit fire → real sample). Repository nav, deploy steps, design rationale, and threat model are moved into <details>...</details> blocks so first-time visitors see the demo plots and the status list without scrolling past wall-of-text. Stale Pi-as-deployment-target wording in the design-rationale section is fixed alongside. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:11:54 -06:00
Maximus Gorog	cc37fc6c4d	Interactive envelope plot via WebAgg (browser-based) plot_envelope.py grows a --show flag. With it, matplotlib's WebAgg backend spins up a localhost server with a real interactive figure (zoom, pan, hover, axes lock) — equivalent to a matlab plot window without needing tkinter or Qt locally. tools/show_envelope.sh is a NixOS-aware wrapper: it locates libstdc++.so.6 in /nix/store (numpy's prebuilt wheel needs it on LD_LIBRARY_PATH) and then exec's the python script with --show. Default port 8988, override via --port. Bound to 0.0.0.0 so the figure is reachable over WG too. tornado is added to dev deps because WebAgg requires it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:06:22 -06:00
Maximus Gorog	69c09f4404	Phase 2: real-VM episode (Cirros under KVM) + works-cited doc vm/launch_demo.sh boots a Cirros qcow2 under KVM with QMP and a monitor socket exposed; snapshot=on routes guest writes to a temporary overlay so the on-disk image is never mutated (clean factory reset every boot). End-to-end verified: vm/launch_demo.sh → orchestrator with --target-pid <qemu pid> → 201 telemetry rows over 20s against the real qemu-system process. The plotted envelope shows the expected idle-VM shape: periodic ~10% CPU spikes from KVM/timer interrupts, flat 230 MiB RSS, and a single late-boot disk write. Distinct from the synthetic load_mimic envelope, confirming the collector reads real KVM behavior. docs/sources.md is the works-cited doc — every tool, library, sample source, paper, and standard the project leans on, grouped by category. README's nav table now points at it. README's status section also lists what's done vs. in progress so reviewers can see scope at a glance. Note: vm/images/ stays gitignored. The Cirros 0.6.3 image is documented with its sha256 (7d6355852aeb...) in docs/sources.md so any team member can reproduce the bytes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:00:25 -06:00
Maximus Gorog	970698af83	Synthetic envelope demo: phase-driven load mimic + plotter End-to-end pipeline now produces a labeled envelope from a single command. Drives the orchestrator through an 8-phase XMRig-shaped schedule and renders a 3-panel envelope (CPU%, RSS, IO write rate) with phase bands sourced from labels.jsonl. Real telemetry, simulated load — validates the collection + labeling shape before a real VM is involved. Components: - tools/load_mimic.py phase-driven load generator. Reads phase commands on stdin; CPU/IO behavior matches the named phase (clean=idle, armed=light burst, infecting=disk burst+CPU, infected_running= CPU saturation+stratum-shaped writes, dormant=quieter than clean). - tools/run_envelope_demo.py spawns load_mimic, drives EpisodeRunner with a default 85s schedule that includes the classic infected_running → dormant → re-entry pattern. - tools/plot_envelope.py reads telemetry + labels from an episode dir, writes envelope.png with colored phase bands. orchestrator: EpisodeRunner now takes an optional phase_schedule and an on_phase callback. Walks the schedule emitting one label per transition. Backwards-compatible — existing single-phase tests still green. Doc fix (user pushback): README + architecture + threat-model no longer imply the Pi5 is the deployment target. Pi5's actual role here is the WireGuard-side collector for episode tarballs. Deployment target is generic ("constrained Linux device"). The "gateway observer" concept remains a deployment pattern, decoupled from the Pi5's collector role. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:53:20 -06:00
Maximus Gorog	064387b7a0	Add v0 orchestrator + first oracle collector (host /proc) End-to-end: ``python -m orchestrator --target-pid <pid> --duration N`` now writes a complete episode directory matching docs/data-model.md, with phase labels, events, and a 10 Hz host /proc telemetry stream. No VM yet — pid is arbitrary so we can validate the loop against e.g. ``sleep 5`` while the lab side comes up. collectors/proc_qemu.py — parses /proc/<pid>/{stat,io,status} (handles parens in comm), single-shot collect_once(), and a stop-event-driven run_loop() that ticks at a fixed cadence and exits when the pid disappears. Tagged ``available_in_deployment: false`` per the threat-model doc. orchestrator/episode.py — EpisodeRunner: creates data/episodes/<ulid>/, atomic meta.json, events.jsonl + labels.jsonl writers, drives the collector in a thread for duration_s, writes done.marker last so the shipper never sees a half-finished episode. orchestrator/ulid.py — tiny 26-char Crockford-base32 ULID generator. Time-sortable, no third-party dep. orchestrator/__main__.py — CLI entry point. Tests (15 new, 28 total green): - proc_qemu: real-ish stat with parens-in-comm, missing /proc/<pid>/io, missing pid, run_loop cadence, run_loop terminates when pid disappears. - episode: full directory shape against os.getpid(), id override, done.marker written after meta.json finalize. - ulid: length+alphabet, 2000-burst uniqueness, time-sortability. Smoke-tested against ``sleep 10``: 16 rows over 1.5s at 100ms cadence, monotonic clock, RSS stable at ~3.5 MiB as expected for an idle sleep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:40:25 -06:00
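For readers unfamiliar with the ULID layout mentioned in 064387b7a0 (26 Crockford base32 characters, time-sortable, no third-party dependency), a minimal generator might look like this. It is a sketch of the standard ULID layout (48-bit millisecond timestamp followed by 80 random bits), not a copy of orchestrator/ulid.py; `new_ulid` and its optional parameters are hypothetical and exist only to make the example testable.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet: no I, L, O, U. It is in ascending ASCII order,
# so lexicographic string order matches numeric order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(now_ms: Optional[int] = None, randomness: Optional[bytes] = None) -> str:
    """Return a 26-character ULID: 48-bit ms timestamp + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = os.urandom(10)
    if not 0 <= now_ms < (1 << 48) or len(randomness) != 10:
        raise ValueError("timestamp must fit 48 bits and randomness must be 10 bytes")
    value = (now_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits; the top 2 bits are always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

Placing the timestamp in the most significant bits is what makes directory listings of data/episodes/<ulid>/ sort chronologically.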
Maximus Gorog	83e111961d	Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency Implements docs/transport.md as a small Starlette app. The receiver streams episode tarballs to disk, verifies sha256 against an X-Content-SHA256 header, atomically renames into the store on success, and appends one row to a flat index.jsonl. No DB. Idempotent re-PUTs return 200; conflicting bodies return 409. Optional bearer-token auth (mTLS terminates at Caddy in prod). receiver/ store.py EpisodeStore: sha-verifying streaming ingest, atomic rename, append-only index. No HTTP. app.py make_app(): Starlette routes + bearer guard. config.py ReceiverConfig.load(): TOML parser. __main__.py uvicorn entrypoint, reads --config TOML. tests/test_receiver.py: 13 tests via httpx.ASGITransport. Covers: 201 new, 200 idempotent replay, 409 conflict, 400 sha mismatch + cleanup, 400 missing/short header, 400 bad id, 400 bad suffix, 413 too large, 401 bearer enforcement, schema-version pass-through. etc/cis490-receiver.service: systemd unit with hardening flags. etc/receiver.toml.example: config template matching docs/deploy.md. End-to-end smoke-tested with curl: 201 → 200 → 409 path verified, file on disk, single index row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:34:04 -06:00
Maximus Gorog	fa1574a0a6	Scaffold project: docs, repo skeleton, transport + deploy design Lays down the design surface for the CIS490 behavioral-malware-detection dataset and model. No code yet — schema and topology are decided first so collection can start without rework. Docs: - README: project goal, navigation - architecture: lab topology, KVM choice, episode state machine, deployment-mirror reasoning - threat-model: train/serve parity rule, oracle-vs-deployable feature split, two-model evaluation strategy - data-model: per-episode JSONL layout, row schemas, phase enum - transport: WG-native shipper/receiver design, idempotent uploads - deploy: one-command install for lab-host and receiver roles - lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/, training/ (each with a short README explaining purpose). Extended .gitignore to exclude qcow2 images, pcaps, sample binaries, secrets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:21:00 -06:00
elliott	7a0fefc02e	Initial commit	2026-04-27 17:28:48 -06:00
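
Commit 064387b7a0 describes orchestrator/ulid.py as a tiny 26-char Crockford-base32 ULID generator that is time-sortable and has no third-party dependency. The sketch below shows one way such a generator could look, assuming the standard ULID layout (a 48-bit millisecond timestamp followed by 80 random bits). The names `new_ulid`, `CROCKFORD`, and the `ts_ms` parameter are illustrative assumptions and are not taken from the repository.

```python
import os
import time

# Crockford base32 alphabet: digits, then uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(ts_ms=None):
    """Return a 26-character ULID string (hypothetical helper, not the repo's code).

    ts_ms: optional millisecond timestamp; defaults to the current wall clock.
    """
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    if not 0 <= ts_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")

    # 80 bits of randomness from the OS.
    rand = int.from_bytes(os.urandom(10), "big")

    # Timestamp in the high bits so that string order follows time order.
    value = (ts_ms << 80) | rand

    # Emit 26 base32 digits, least significant first, then reverse.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(new_ulid())
```

Because the Crockford alphabet is in ASCII order, IDs generated in different milliseconds sort lexicographically by creation time. Within a single millisecond this sketch makes no ordering guarantee, since the low 80 bits are purely random.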

192 commits