CIS490

Author	SHA1	Message	Date
max	2707709299	Fix workload-silent false-positive on Alpine busybox guests (closes #15) On-device agent (k-gamingcom) ran the diagnostic probe sequence and proved the workload IS running on Alpine (`yes` saturating the vCPU, loadavg=1.05, three `yes` PIDs visible), but two busybox incompatibilities made every episode look silent: 1. _probe() used `pgrep -c yes`. The -c flag is procps-ng, not busybox. busybox pgrep exits 1 with a usage banner; the `|| echo 0` fallback then reported yes=0 every time. Switched to `pgrep yes | wc -l`, which both pgrep variants support. 2. _wrap_loop appended `disown` after the nohup-backgrounded script. busybox sh / ash have no disown builtin, so each infected_running phase printed `sh: disown: not found` into run()'s captured output. The script kept running (nohup gives SIGHUP immunity, which is what disown was for), but the spurious error is now gone. Cross-validation in the classifier: - prune_episodes.py: workload-silent now requires the probe AND the host-side /proc CPU envelope (flat-cpu) to AGREE. A probe-only zero is treated as the busybox false-positive and dropped. This means the 244 already-on-disk episodes from elliott-thinkpad and k-gamingcom are correctly classified without re-collecting. Test coverage: - test_workload_silent_flag updated to require both signals - test_workload_silent_suppressed_when_host_cpu_real: new regression test for the busybox false-positive AGENTS.md gains a "Don't trust the in-guest probe alone" section with the busybox-vs-procps gotcha + a list of busybox-incompatible patterns to avoid in any new in-guest diagnostic.	2026-04-30 17:28:48 -05:00
max	d86502d950	workload audit trail: meta.sample + per-phase events + pre-kill probe The elliott-lab episode showed a median of 20% CPU in every phase because the in-guest workload silently never fired, and there was no signal in events.jsonl to detect that from outside, so a trainer would treat the labels as ground truth and learn "all phases look identical". This commit closes the audit gap so the failure is visible in meta: orchestrator/episode.py EpisodeConfig.sample: Sample | None, the manifest entry that drove this episode's workload selection. Stamped into meta.sample as {name, family, category, profile, kind, sha256} so trainers can join cleanly without re-deriving from events. None means the v1 yes-loop fallback path ran (and the trainer should treat the episode with appropriate skepticism). tools/vm_load_controller.py VMLoadController gains an emit_event callable. Every phase now emits a workload_* event into the runner's events.jsonl: workload_setup: login + initial cleanup OK; workload_killed: clean / dormant. Dormant carries a `pre_kill_probe` dict from inside the guest (`pgrep -c yes`, `pgrep -c sh`, /proc/loadavg) so the trainer can detect the elliott-lab failure mode where the workload never actually ran; workload_armed: armed handshake fired; workload_infecting: dd urandom / payload write fired; workload_started: infected_running command sent; workload_failed: any of the above raised inside SerialClient (timeout, EOF, partial login). The runner would have silently swallowed the exception via its on_phase try/except; the audit row makes the failure detectable. Exceptions in shell calls surface as workload_failed events but do NOT propagate, matching the runner's existing on_phase contract. tools/run_real_vm_demo.py Wires the controller's emit_event to the runner's emit_event via a small forward-reference closure (controller is built before runner; runner.emit_event needs to be the sink).
Sample also flows into EpisodeConfig.sample so meta.sample matches what the controller actually ran. Tests: 119 (was 106). New cases: tests/test_vm_load_controller.py (11 tests against a FakeSerial) - setup emits workload_setup - infected_running runs the v1 yes-loop AND emits workload_started - dormant probes BEFORE killing and stamps pre_kill_probe - dormant probe records "yes=0" (the elliott-lab fingerprint) - clean / armed / infecting all emit their respective events - serial.run() exception → workload_failed event, no propagation - sample-with-profile dispatches to exploits.workloads command (NOT the v1 yes-loop) - missing emit_event callback is a no-op (back-compat) tests/test_episode.py (2 new) - meta.sample carries name/family/category/profile/kind/sha256 when EpisodeConfig.sample is set - meta.sample stays null in the v1 fallback path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:12:34 -05:00
max	b80986d99c	Driver v2: sample-profile-driven workloads (Tier-2 + Tier-3) The v1 driver ran ``yes > /dev/null`` for every sample, which produced the same envelope shape regardless of which malware family the orchestrator claimed to be running. That's a poor training signal: the model sees identical /proc + QMP traces tagged "cryptominer" / "ransomware" / "RAT" with no distinguishing features. v2 fixes this. What landed: exploits/workloads.py — six ``Workload`` profiles, each producing a distinct in-session shell command pair (start_cmd / stop_cmd) that backgrounds a profile-shaped loop: cpu-saturate — sustained 1-vCPU saturation (XMRig shape) scan-and-dial — periodic SYN-style probes across 10.200.0.0/24 + dial-home to gateway (Mirai shape) io-walk — fs traversal + 4 KiB urandom writes, periodic re-read (ransomware shape) bursty-c2 — long idle, periodic 3-packet TCP egress burst (Dridex C2 beacon shape) low-and-slow — minimal CPU + periodic awk-driven memory churn (Kovter / fileless shape) shell-resident — single long-lived TCP socket pinned to gateway with periodic 6-byte command ticks (RAT shape) Each profile uses a /tmp/.cis490-workload-<profile>.{pid,sh} pair so the stop_cmd can cleanly kill the loop and its descendants. exploits/driver.py — MSFExploitDriver now accepts an optional ``Sample``. With one supplied, ``infected_running`` dispatches to the matching workload via exploits.workloads.workload_for(); the ``sample_executed`` event records profile + sample name + sample kind so the trainer can join cleanly. Without a sample, the v1 yes-loop path remains unchanged (backwards compat). tools/vm_load_controller.py — the same dispatch on the Tier-2 path (no exploit, real Alpine guest driven over the serial console). A fleet wave now produces six visually distinct envelopes per wave whether the underlying mode is Tier 2 or Tier 3. 
tools/run_real_vm_demo.py — accepts ``--sample <name>`` (or SAMPLE_NAME env from the fleet runner) + auto-wires QMP + agent sockets into the EpisodeConfig so all three new collectors (sources 2, 4, 5) run alongside source 1 by default. tools/run_tier3_demo.py — same ``--sample`` plumbing for the exploit-driven path. Tests: 86 pass (was 82). New v2 cases: - profile dispatch routes infected_running to the workload's start_cmd (NOT the v1 yes-loop) when a Sample is set - all six profiles produce distinct start_cmds (the property the ML model needs) - unknown profile string falls back to cpu-saturate with a warning - v1 path (no Sample) still uses yes-loop (backwards compat) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:06:15 -05:00
Maximus Gorog	7216ec09bd	Tier 2: real Alpine VM, real workload, real envelope End-to-end now drives a real KVM guest through the full XMRig-shaped phase schedule with the workload running INSIDE the guest. Telemetry is host-side /proc/<qemu_pid>; the load is busybox `yes` (sustained CPU saturation) and `dd if=/dev/urandom` (disk burst on infecting), driven over the serial console at every phase transition. The plotted envelope shows clean idle → armed → infecting (disk spike) → infected_running (100% CPU plateau) → dormant → re-entry → final clean. Components: vm/launch_demo.sh now boots Alpine 3.21 nocloud-cloudinit (Cirros 0.6.x's cirros-init blocks on the EC2 metadata service for ~17 min before falling through to NoCloud — abandoned). Mounts a cidata ISO as a second drive. tools/build_cidata.py pure-Python NoCloud ISO builder (pycdlib). Sets root password and ssh_pwauth via runcmd so we don't depend on a specific cloud-init version's plain_text_passwd handling. tools/vm_serial.py serial-console client (stdlib socket). Idempotent login (detects already-in-shell state), sentinel-bracketed run() that distinguishes shell output from the TTY echo of input by requiring a leading \r\n boundary on the marker. tools/vm_load_controller.py in-guest load controller. set_phase() dispatches the per-phase shell command over the serial connection. tools/run_real_vm_demo.py ties it all together: boot VM, wait for cloud-init runcmd, log in, run the EpisodeRunner with on_phase=controller, shut down VM. Deps: paramiko, pycdlib added. docs/sources.md updated with Alpine cloud image (sha512 pinned), and the new Python deps. README leads with the tier-2 plot now (real VM, real workload). The previous synthetic plot is moved below with explicit "host-side mimic, not a VM" labelling. Tier-2 status flipped to ✅ in the tier table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:38:53 -06:00
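
The 7216ec09bd message describes a sentinel-bracketed run() that tells real shell output apart from the TTY echo of the typed command by requiring a leading \r\n boundary on the marker. A minimal sketch of that idea follows. It is not the code in tools/vm_serial.py: the names `build_command` and `extract_output` and the `_BEGIN` / `_END` marker suffixes are assumptions made for illustration.

```python
from typing import Optional


def build_command(cmd: str, token: str) -> str:
    """Wrap cmd so the guest shell prints a marker before and after it.

    `token` is a hypothetical per-call unique string (e.g. a short random hex).
    """
    return f"echo {token}_BEGIN; {cmd}; echo {token}_END"


def extract_output(raw: str, token: str) -> Optional[str]:
    """Return the command output found between the two markers, or None.

    The TTY echoes the typed line, so the marker text also appears inside the
    echoed command ("echo <token>_BEGIN; ..."). Only the marker printed by the
    shell starts a fresh line, so we match it with its leading "\r\n".
    """
    begin = f"\r\n{token}_BEGIN"
    end = f"\r\n{token}_END"

    b = raw.find(begin)
    if b == -1:
        return None  # begin marker not printed yet (or only echoed)
    body_start = b + len(begin)

    e = raw.find(end, body_start)
    if e == -1:
        return None  # command still running, keep reading
    return raw[body_start:e].lstrip("\r\n")


if __name__ == "__main__":
    # Simulated serial capture: echoed input first, then the real output.
    raw = "echo X_BEGIN; uname; echo X_END\r\nX_BEGIN\r\nLinux\r\nX_END\r\n# "
    print(repr(extract_output(raw, "X")))
```

A caveat on this sketch: a terminal that hard-wraps a long echoed line could place \r\n directly before the echoed marker, so a real client would likely also keep the typed line short or disable echo.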

4 commits
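
The cross-validation rule from 2707709299 (workload-silent requires the in-guest probe AND the host-side flat-cpu signal to agree) can be sketched roughly as below. This is an illustration, not the contents of prune_episodes.py: the function names, the spread-based definition of "flat", and the 5-point threshold are all assumptions.

```python
from typing import Sequence


def is_flat_cpu(host_cpu_pct: Sequence[float], max_spread: float = 5.0) -> bool:
    """Assumed definition of flat-cpu: host-side samples barely move.

    max_spread is a hypothetical threshold in percentage points.
    """
    if not host_cpu_pct:
        return False  # no host telemetry, so no corroboration
    return (max(host_cpu_pct) - min(host_cpu_pct)) <= max_spread


def is_workload_silent(probe_yes_count: int,
                       host_cpu_pct: Sequence[float],
                       max_spread: float = 5.0) -> bool:
    """Flag an episode only when both signals agree the workload never ran.

    A probe-only zero (probe says yes=0 but the host CPU envelope shows real
    load) is treated as the busybox false-positive and is not flagged.
    """
    probe_silent = probe_yes_count == 0
    return probe_silent and is_flat_cpu(host_cpu_pct, max_spread)


if __name__ == "__main__":
    # Probe reports zero and the host envelope is flat: genuinely silent.
    print(is_workload_silent(0, [20.0, 21.0, 19.5]))
    # Probe reports zero but the host shows a CPU plateau: false-positive.
    print(is_workload_silent(0, [5.0, 98.0, 99.0]))
```

Requiring agreement means already-collected episodes can be reclassified from their stored host-side telemetry alone, which matches the commit's note that no re-collection was needed.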