Commit graph

192 commits

Author SHA1 Message Date
max
02b9d0a645 Tier 3 + Tier 4 deploy runbook in AGENTS.md
Repo has all the code paths for Tier 3 (real exploit fire via msfrpcd)
and Tier 4 (real malware execution via chunked upload), but neither
lab host has run a single Tier-3 episode because msfrpcd and the
Metasploitable2 image aren't deployed there. 3009 episodes in flight
to date are all Tier 2 (mimic workloads in clean Alpine), which is
useful pipeline-validation data but cannot answer the actual research
question.

This commit makes the deploy push-button:

- AGENTS.md: new "Tier 3 + Tier 4 deploy" section listing the three
  prereqs (install-msfrpcd.sh, fetch-metasploitable2.sh, setup_bridge.sh),
  the foreground verify command (run_tier3_demo.py), and the Tier-4
  promotion path (MB API key → fetch_sample.py → manifest edit →
  orchestrator restart).

- samples/manifest.toml: clearer per-entry comment showing the
  4-step sha256 → real-binary promotion path. Replaces the earlier
  "TBD" placeholder which suggested a single edit unlocks Tier 4
  when in fact you need to fetch the binary too.

The fleet runner already auto-detects msfrpcd (orchestrator/fleet.py
_msfrpcd_available()); once the lab-host operator-AI lands the
prereqs, episodes flip to Tier 3 with no orchestrator config change.
Tier 4 follows automatically the next time the deterministic
selector picks a sample whose sha256 file exists in samples/store/.
2026-04-30 22:57:23 -05:00
max
321ea63803 Multi-signal prune classifier: rescue valid episodes /proc misses
A laptop-class lab host (elliott-thinkpad) running 14 parallel fleet
slots can't deliver host /proc CPU% signal for the bursty profiles —
the per-VM share gets buried under contention. But the workloads ARE
running: qmp blockstats record 90+ MB written during infected_running
for io-walk episodes, netflow shows real packet bursts for
scan-and-dial, and the in-guest agent (when alive) shows load_1m
deltas the host can't see.

The classifier now cross-checks four sources before flagging an
episode:
  - /proc CPU% medians (host-side qemu)
  - netflow byte totals (bridge_pcap)
  - qmp blockstats per-phase DELTA (cumulative counters; deltas
    matter, not raw values)
  - guest-agent load_1m

An episode flags only if every available source agrees no
inter-phase signal. Missing sources are "unknown", not "flat".

Time-base bug also fixed: phase mapping now uses t_wall_ns (which
all sources stamp from CLOCK_REALTIME) rather than t_mono_ns —
netflow uses qemu boot-monotonic, /proc uses orchestrator-relative,
they don't share a number line.

Result on the live receiver:
  - 1067 active episodes, 100% kept under the new logic
  - 143 episodes rescued from a previous false-positive archive
  - Only the 9 genuinely-broken pre-Sample-propagation elliott-lab
    episodes remain archived (no-sample + no-workload-events)

Two new tests (test_flat_proc_rescued_by_netflow,
test_flat_everywhere_still_flags) pin the boundary so a future
regression surfaces immediately.

AGENTS.md gains a "classifier is multi-source" section explaining
the cross-check and the t_wall_ns invariant.
2026-04-30 19:10:01 -05:00
3d4936a227 Merge remote-tracking branch 'origin/main' into Dev_REL1_043026 2026-04-30 16:34:01 -06:00
max
2707709299 Fix workload-silent false-positive on Alpine busybox guests (closes #15)
On-device agent (k-gamingcom) ran the diagnostic probe sequence and
proved the workload IS running on Alpine — yes saturating the vCPU,
loadavg=1.05, three yes PIDs visible — but two busybox incompatibilities
made every episode look silent:

1. _probe() used `pgrep -c yes`. The -c flag is procps-ng/util-linux,
   not busybox. busybox pgrep exits 1 with a usage banner; the
   `|| echo 0` fallback then reported yes=0 every time. Switched to
   `pgrep yes | wc -l` which both pgrep variants support.

2. _wrap_loop appended `disown` after the nohup-backgrounded script.
   busybox sh / ash have no disown builtin, so each infected_running
   phase printed `sh: disown: not found` into run()'s captured output.
   The script kept running (nohup gives SIGHUP immunity, which is
   what disown was for), but the spurious error is now gone.

Cross-validation in the classifier:
- prune_episodes.py: workload-silent now requires the probe AND
  host-side /proc CPU envelope (flat-cpu) to AGREE. A probe-only zero
  is treated as the busybox false-positive and dropped. This means
  the 244 already-on-disk episodes from elliott-thinkpad and
  k-gamingcom are correctly classified without re-collecting.

Test coverage:
- test_workload_silent_flag updated to require both signals
- test_workload_silent_suppressed_when_host_cpu_real new regression
  for the busybox false-positive

AGENTS.md gains a "Don't trust the in-guest probe alone" section with
the busybox-vs-procps gotcha + a list of busybox-incompatible patterns
to avoid in any new in-guest diagnostic.
2026-04-30 17:28:48 -05:00
4e8d2bdb04 etc/lab-host.toml.example: pin Caddy root, not wg-pki client CA (closes #14)
ca_bundle is what the shipper uses to verify collector.wg's TLS cert.
That cert is signed by the Caddy Local Authority, bundled in the repo
as etc/caddy-root.crt. Pointing it at wg-ca.pem (the wg-pki CIS490
Lab-Host Client CA, which is the *receiver's* trust anchor for our
client cert) caused CERTIFICATE_VERIFY_FAILED on every ship.

Original fix authored by the on-device agent on k-gamingcom in
Dev_REL2_043026@786b8da; cherry-picked here onto main.

Co-Authored-By: k-gamingcom on-device agent
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:26:36 -05:00
b42d073669 Merge remote-tracking branch 'origin/main' into Dev_REL1_043026 2026-04-30 15:48:23 -06:00
max
8d2d0d2e99 prune+receiver: preserve index ownership and add a backfill helper (closes #13)
Root cause of #13 (PUT 500s on first ship, retries return already-present):
my earlier prune-tool session ran as root and rewrote the live index via
os.replace(), which drops the original ownership/mode. The new file was
root:root and the cis490 service user couldn't append to it. Every fresh
PUT 500'd on _append_index after the tarball had already landed via
os.replace, so retries always saw "already-present" and never recovered
the missing index row.

Two fixes:

- tools/prune_episodes.py: snapshot the index's stat before the rename
  and restore uid/gid/mode after. Best-effort chown so non-root prune
  runs (where chown would EPERM) still succeed; non-root callers
  matched the original owner anyway.

- tools/index_backfill.py: new tool. Walks episodes/<host>/*.tar.zst,
  computes sha256+size, and appends rows for episodes missing from
  the index. Preserves "backfilled: true" so trainers can distinguish
  reconstructed rows. Always opens the index in append mode (never
  replaces), so it cannot reproduce the ownership bug it's recovering
  from.

Regression test: tests/test_prune.py::test_archive_preserves_index_mode.

Operator note for the live receiver: ran the chown fix manually
(chown cis490:cis490 /var/lib/cis490/index.jsonl) and ran the
backfill once to recover 140 elliott-thinkpad rows that 500'd before
the chown landed.
2026-04-30 16:36:05 -05:00
max
f6d7d07837 Make mTLS bring-up unmistakable for on-device agents
Sysadmin observed lab-host agents still trying to "secure the
connection" — minting certs, generating CSRs, or otherwise reinventing
a cert-delivery flow that's already automated through bootstrap.wg.
Three reinforcements so an agent reading any of the three surfaces
(AGENTS.md, install script output, journalctl) gets the same message:

- AGENTS.md gains a top-of-file "do not mint your own certs" callout
  + a dedicated "Securing the connection (mTLS)" section with the
  one fix (re-run install-lab-host.sh after setting host_id) and an
  explicit "what NOT to do" list (no openssl, no copy from another
  host, no verify_tls=false).

- install-lab-host.sh's FIRST-INSTALL NEXT STEPS now spells out that
  the cert auto-fetch is silently skipped while host_id is REPLACE_ME,
  and that the operator MUST re-run the script after editing host_id.
  Step 2 is now "RE-RUN THIS SCRIPT" with a DO NOT openssl warning.

- The shipper's "waiting on mTLS material" warning now embeds the
  exact remediation command + a pointer to AGENTS.md, so an agent
  reading journalctl without ever opening the repo still gets it.

Tests: 12/12 in test_shipper still pass; warning string change is
not asserted on (only the dataclass error field).
2026-04-30 16:23:44 -05:00
max
c80a36d3ae AGENTS.md: prescriptive guidance for smaller models on lab hosts
Smaller (non-4.7) Claude models act as on-device agents on CIS490 lab
hosts and have hit the install gotchas that became issues #10–#12.
Their reports describe symptoms well but miss inferred context — so
this expands the runbook with explicit "do this, not that" notes:

- run tools from /opt/cis490 not a clone (CWD-on-sys.path trap)
- shipper "waiting on mTLS material" is expected and self-heals; do
  not try to fix it manually
- table of the three install bugs already closed in main, so a fresh
  agent can recognize the symptom and pull instead of re-filing
- "fix one red row at a time" rather than batching attempts

Closes nothing new; this is the followup to #10/#11/#12 promised
during their resolution.
2026-04-30 16:19:09 -05:00
7c35bf7d49 Merge commit '86a088c' into Dev_REL1_043026 2026-04-30 15:16:41 -06:00
max
86a088c204 shipper: defer SSL context build until cert/CA paths exist (closes #11)
First-boot bring-up enables cis490-shipper before the Pi has issued the
mTLS leaf, so ssl.create_default_context(cafile=...) raised
FileNotFoundError out of __init__ and systemd crash-looped the unit
every RestartSec=5. Now the transport pre-flights the configured
ca_bundle / client_cert / client_key paths, raises a recoverable
_CertNotReadyError, and ping/ship_tarball retry the build on each
request — daemon self-heals once the cert lands without a restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:13:59 -05:00
7683b64929 Merge origin/main into Dev_REL1_043026; accept main's service files
Cherry-picks all upstream additions (fleet runner, full collector suite,
shipper module, exploit driver, samples, scripts/, cis490_doctor, etc.)
and resolves the two service-file conflicts by accepting main's production
versions over the stubs we wrote on Day 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 15:05:51 -06:00
95ac56a382 fix: three install-time bugs found during first lab-host bring-up on k-gamingcom
1. pyproject.toml — move pycdlib to main deps (was dev-only; cidata build
   fails on first install because the venv doesn't include dev extras).

2. scripts/install-lab-host.sh — create vm/images/ dir before symlinking
   alpine-baseline.qcow2 and cidata.iso into INSTALL_ROOT. Without the
   mkdir the ln -sf silently fails (|| true), leaving the launchers unable
   to find the images and causing every episode to fail within 15 s.

3. tools/cis490_doctor.py — two fixes:
   a. Insert repo_root into sys.path at doctor startup so the inline
      `from exploits.modules import ...` succeeds when running from /opt/cis490
      (package = false means nothing is installed into site-packages).
   b. Pass cwd=/opt/cis490 to the shipper --ping subprocess so python -m
      shipper resolves the module correctly regardless of the caller's CWD.

Tested on k-gamingcom: install script now builds cidata.iso on first run,
7-slot fleet wave completes with rc=0, doctor shows 13 ok / 4 warn / 2 fail
(remaining failures are mTLS certs + collector.wg DNS — both need Pi-side
action, not code changes).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 15:05:00 -06:00
86fdd03de4 Dev_REL1_043026: lab-host bring-up, fixes, and issue report
Full bring-up of this host from a clean clone: installed uv/perf/tcpdump,
downloaded Alpine 3.21 cloud image, built cidata ISO, took baseline-v1
snapshot. Validated single-episode demo (853 rows, 8 phases) and 2-episode
campaign loop (campaign_done.marker written). Cherry-picked campaign runner
from Dev_REL1_042926. Fixed .gitignore to cover campaign output files.
Issue report at reports/Dev_REL1_043026.md covers ISS-001 through ISS-007,
with ISS-005 (missing install-lab-host.sh) remaining open.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 14:59:47 -06:00
842918556b Add automated campaign runner, shipper, and systemd units
Implements the unattended episode loop described in docs/deploy.md but not
yet built. run_campaign.py boots a fresh VM per episode, drives the full
phase schedule via the existing EpisodeRunner/VMLoadController stack, writes
campaign.json atomically after each episode, and signals completion with
campaign_done.marker. shipper.py watches data/episodes/ for done.marker
files, tar+zstd-compresses each, and PUTs them to the receiver with
exponential backoff on failure. Both support SIGTERM gracefully, finishing
the current episode/scan before exiting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 14:53:40 -06:00
max
a61fa05980 cis490-prune: retroactively filter low-quality episodes from the dataset
Without a prune step, every fix we land before elliott-lab pulls
leaves a residue of pre-fix episodes in /var/lib/cis490/episodes/.
Trainers either filter at training time (processing the bad data
anyway) or — worse — train on it. This tool walks the receiver's
index, classifies each episode against five quality signals, and
either prints a dry-run summary, archives flagged episodes to
/var/lib/cis490/episodes-archive/, or deletes them outright (with
the index rewritten atomically).

Quality signals (each independent; a bad episode can hit several):

  no-sample           meta.sample is null. Pre-Sample-propagation code
                      ran the v1 yes-loop fallback regardless of fleet
                      selection, so the post-infection family isn't
                      recorded.

  no-workload-events  events.jsonl has zero workload_* rows. Pre-audit-
                      trail code (before VMLoadController emits) — we
                      can't tell whether the workload actually fired.

  workload-failed     events.jsonl contains workload_failed. SerialClient
                      raised mid-phase; labels and telemetry don't match
                      what the orchestrator was supposed to be doing.

  workload-silent     workload_killed event during dormant has
                      pre_kill_probe.yes == "0". The schedule walked
                      but the in-guest workload never started — the
                      elliott-lab fingerprint.

  flat-cpu            /proc CPU% medians spread <5pp across phases.
                      A model can't learn to distinguish phases from
                      this; pure noise to the trainer.

CLI:
  cis490-prune                      # dry-run summary
  cis490-prune --reason no-sample   # restrict to one signal (repeatable)
  cis490-prune --host elliott-lab   # scope to one lab host
  cis490-prune --archive            # mv flagged → episodes-archive/
  cis490-prune --delete             # rm flagged + drop index rows
  cis490-prune --json               # machine-readable

Index rewrite is atomic: tempfile + os.replace, so a crash mid-write
leaves the live index intact.

Tests: 143 (was 132). New cases (tests/test_prune.py):
  - one healthy synthetic episode produces zero reasons
  - five tests covering each individual reason flag
  - dry-run leaves disk + index untouched
  - --archive moves tarballs and rewrites index
  - --delete removes tarballs and rewrites index
  - --host filter scopes correctly (no-match → exit 0)
  - multi-reason episodes report all matching reasons

Live state when this commit lands: 9 elliott-lab episodes from the
pre-fix code path, all flagged. Operator can clear them with one
command before elliott-lab re-ships under main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:41:10 -05:00
max
642f7a94d6 runners: take savevm baseline-v1 after boot so revert_at_* actually works
EpisodeConfig.revert_at_start / revert_at_end have been issuing
loadvm "baseline-v1" via QMP since the snapshot/revert wiring landed,
but no part of the system was running savevm — so loadvm targeted a
snapshot that didn't exist and silently emitted snapshot_revert_failed
every time. The reverted-baseline mode was, in effect, dead code.

Both runners now take a savevm immediately after the guest is up
and reachable, before any workload runs:

  run_real_vm_demo.py — after SerialClient.login() succeeds (Tier 2)
  run_tier3_demo.py   — after _wait_for_tcp on the vulnerable port
                        (Tier 3, before the exploit fires)

Both call qmp.QMPClient.savevm("baseline-v1"). Best-effort: if savevm
fails (older qemu, non-qcow2 disk, KVM nesting issue), we log a
warning and run the episode anyway — just without revert support.

The snapshot_name in EpisodeConfig is unified to "baseline-v1" across
both runners (Tier 3 was previously stamping "qcow2-snapshot-on" into
meta, which didn't match what loadvm would target).

Why both runners take savevm individually instead of a unified path:
the two runners boot different launchers (launch_demo.sh for the
Alpine cidata image, launch_target.sh for the vulnerable target).
Each is responsible for its own QMP socket lifecycle. A shared
savevm helper module would just be a one-line wrapper around the
existing qmp.QMPClient.savevm; not worth the indirection.

Existing test coverage: tests/test_qmp.py exercises
QMPClient.savevm/loadvm against a fake server (HMP wrapper, error
path). The runner-side call is exercised in production but not in
unit tests — would need a fake launcher subprocess, which is outside
this commit's scope.

132/132 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:37:05 -05:00
max
507eac617b Solvable Tier-3 holes: callback payloads, busybox workloads, bridge by default
Closes the next batch of issues from the post-mortem. The previous
"each run uses a different vulnerability" commit shipped 5 modules
but 3 of them couldn't actually fire under SLIRP+restrict=on:
their reverse-shell payloads needed a callback channel the launcher
didn't provide, AND their LHOST options were set to {{ target_ip }}
(the target's IP, not the attacker's — copy-paste from RHOSTS).
Same time, the workloads.py shell commands used bash-only /dev/tcp
redirects that silently no-op'd in the busybox shell sessions
Metasploitable2 returns. Net effect: episodes that selected those
modules would have produced session_open_timeout + dead workloads.

Module configs (the three callback ones):
  exploits/modules/distccd_command_exec.toml
  exploits/modules/php_cgi_arg_injection.toml
  exploits/modules/unreal_ircd_3281_backdoor.toml
    - Switch payload from cmd/unix/reverse* to cmd/unix/bind_perl
      so the target listens on a known port; msfrpcd connects to it
      via the host's hostfwd (no callback path required).
    - Drop the bogus LHOST = "{{ target_ip }}" — bind shells don't
      use LHOST.
    - Add [runtime] table:
        requires_bridge = true
        extra_target_ports = [<bind_lport>]
      Both fields are honored by the loader (ModuleConfig.requires_bridge)
      and the launcher (TARGET_PORTS gets the extra port hostfwd'd
      when BRIDGE mode is active).

orchestrator/fleet.py
  When BRIDGE is unset in env, _run_slot filters the module catalog
  down to modules where requires_bridge=False before calling
  select_module. Two same-socket-shell modules (vsftpd_234_backdoor +
  samba_usermap_script) survive — fleet still has variety; just
  doesn't pick modules whose payloads can't land. With BRIDGE set,
  the full catalog rotates as before, AND BRIDGE is propagated to
  the per-slot subprocess env so launch_target.sh enters tap+bridge
  mode.

exploits/workloads.py
  Replaced bash-only constructs in three profiles:
    scan-and-dial  /dev/tcp/HOST/PORT redirects → nc -z -w 1
    bursty-c2      same fix
    shell-resident exec 3<>/dev/tcp/...  → piping into nc -w
  All three now run cleanly in busybox / dash / Metasploitable2's
  default shell. The remaining three profiles (cpu-saturate, io-walk,
  low-and-slow) were already busybox-portable.

scripts/install-lab-host.sh
  - lab-host.env now defaults BRIDGE=br-malware (was commented out).
    Operator opt-out is to comment the line back in.
  - New step 6b: provisions br-malware via vm/setup_bridge.sh AND
    pre-creates a per-slot tap pool (cis490tap0..7 for Tier-2 demo,
    cis490target0..7 for Tier-3 target) all attached to br-malware
    and brought up. Launchers reference these by SLOT — no sudo
    needed at episode time.
  - On bridge-setup failure, the script auto-comments BRIDGE in the
    env file with a "auto-disabled: bridge setup failed" note so
    the fleet falls back to same-socket modules + Tier-2 cleanly.

tools/cis490_doctor.py
  Two new checks for the lab-host role:
    bridge: br-malware exists / up
    tier3: msfrpcd listening on 127.0.0.1:55553
    tier3: module catalog parses (counts same-socket vs requires_bridge)
  All three are warn-level — they don't fail an otherwise-healthy
  Tier-2-only setup; they tell the operator what's missing for full
  Tier-3 + source 4 coverage.

Tests: 132 (was 129). New cases:
  test_fleet.py +3
    - fleet skips requires_bridge modules when BRIDGE unset (asserted
      across 20 episodes; never picks a callback module)
    - fleet uses the full catalog when BRIDGE is set
    - BRIDGE env propagates to per-slot subprocess

What's still untested live: the bind_perl payloads against a real
Metasploitable2 in the bridge-enabled launcher path. That's a
deployment validation, not a code change. The unit tests confirm
the dispatch / filter logic; the live test is the next operator
action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:32:52 -05:00
max
a193d17ead fleet: rotate exploit modules per (host, slot, ep); Tier 3 by default
Closes the "every run hits the same vulnerability" gap. Before this
commit, the fleet shipped Tier-2 episodes (no exploit at all) with
only the post-infection sample varying. Tier-3 had a single canned
module — vsftpd_234_backdoor — so even when exploit fire was
exercised, the entry vector never changed. Trainer would see one
shape of `armed → infecting` and learn nothing about how varied
real exploits look on the wire / in /proc.

What landed:

exploits/modules/
  + samba_usermap_script.toml          CVE-2007-2447, SMB:139
  + distccd_command_exec.toml          CVE-2004-2687, distcc:3632
  + php_cgi_arg_injection.toml         CVE-2012-1823, http:80
  + unreal_ircd_3281_backdoor.toml     CVE-2010-2075, ircd:6667
  (vsftpd_234_backdoor.toml unchanged)
  All five are canonical Metasploitable2 vectors with stable
  Metasploit modules. Each TOML carries the RPORT the launcher
  needs to wire its hostfwd at, plus a payload tuned to a clean
  shell session (cmd/unix/interact for in-band shells,
  cmd/unix/reverse* with deterministic LPORTs for reverse shells).

exploits/modules.py
  + select_module(catalog, host_id, slot, episode_index) — same
    SHA-256-keyed deterministic selection shape SampleManifest uses
    for samples. Two hosts at the same slot/episode hash to
    different modules; one host walks the full catalog within
    ~len(catalog) episodes.
  + module_target_port() — pulls RPORT off the module config so
    the fleet can plumb the launcher's hostfwd at the right service.

orchestrator/fleet.py
  - _run_slot now decides Tier 3 vs Tier 2 from msfrpcd reachability
    + module-catalog populated. Default is Tier 3 when both are true;
    Tier 2 fallback when not (logged + recorded in SlotResult.tier
    so trainers can filter no-exploit episodes).
  - Per-slot module via select_module() — each concurrent slot in a
    wave gets a different vector AND a different sample.
  - PORT_BASE per slot (target_port + slot * 1000) so concurrent
    Tier-3 targets don't collide on the host-side hostfwd port.
  - _msfrpcd_available() probe gates the dispatch.
  - Fleet-side log line records (slot, ep, tier, sample, module,
    run_dir) so the operator can see at a glance what each wave is
    exercising.
  - SlotResult grows tier + module_name fields; FleetConfig grows
    modules + force_tier2 + msfrpcd_{host,port} fields.

orchestrator/episode.py
  + EpisodeConfig.exploit_meta — plain dict the runner stamps into
    meta.exploit so every Tier-3 episode records {framework,
    module path, module type, payload, RPORT, RHOSTS template}.
    Trainers join on meta.exploit.module_name to stratify by entry
    vector; meta.sample.name to stratify by post-infection family.

tools/run_tier3_demo.py
  + Builds exploit_meta from the loaded ModuleConfig and passes it
    to EpisodeConfig. Sample is now also passed (was missing).

tools/run_fleet.py
  + --modules-dir (default exploits/modules/) — load module catalog
    on startup; pass to FleetConfig.
  + --force-tier2 — escape hatch for dev / smoke tests.
  + JSON output now includes per-slot {tier, module} so the operator
    can see at a glance what each slot ran without grepping logs.

Tests: 129 (was 119). New cases:
  test_exploits.py +6
    - catalog has at least the five canonical Metasploitable2 vectors
    - select_module is deterministic per (host, slot, ep)
    - select_module diversifies across hosts
    - select_module walks the full catalog over many episodes
    - module_target_port pulls RPORT for each shipped TOML
  test_fleet.py +4
    - _run_slot dispatches to run_tier3_demo.py when msfrpcd up
    - falls back to run_real_vm_demo.py when msfrpcd unreachable
    - falls back when module catalog empty
    - --force-tier2 overrides msfrpcd availability
    - PORT_BASE is unique per concurrent slot (no hostfwd collision)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:22:49 -05:00
max
d86502d950 workload audit trail: meta.sample + per-phase events + pre-kill probe
The elliott-lab episode showed every phase median'd 20% CPU because
the in-guest workload silently never fired — and there was no signal
in events.jsonl to detect that from outside, so a trainer would
treat the labels as ground truth and learn "all phases look identical".
This commit closes the audit gap so the failure is visible in meta:

orchestrator/episode.py
  EpisodeConfig.sample: Sample | None — the manifest entry that
  drove this episode's workload selection. Stamped into meta.sample
  as {name, family, category, profile, kind, sha256} so trainers
  can join cleanly without re-deriving from events. None means the
  v1 yes-loop fallback path ran (and the trainer should treat the
  episode with appropriate skepticism).

tools/vm_load_controller.py
  VMLoadController gains an emit_event callable. Every phase now
  emits a workload_* event into the runner's events.jsonl:
    workload_setup        login + initial cleanup OK
    workload_killed       clean / dormant. Dormant carries a
                          `pre_kill_probe` dict from inside the
                          guest (`pgrep -c yes`, `pgrep -c sh`,
                          /proc/loadavg) so the trainer can detect
                          the elliott-lab failure mode where the
                          workload never actually ran.
    workload_armed        armed handshake fired
    workload_infecting    dd urandom / payload write fired
    workload_started      infected_running command sent
    workload_failed       any of the above raised inside SerialClient
                          (timeout, EOF, partial login). The runner
                          would have silently swallowed the
                          exception via its on_phase try/except;
                          the audit row makes the failure detectable.
  Exceptions in shell calls surface as workload_failed events but
  do NOT propagate, matching the runner's existing on_phase
  contract.

tools/run_real_vm_demo.py
  Wires the controller's emit_event to the runner's emit_event via
  a small forward-reference closure (controller is built before
  runner; runner.emit_event needs to be the sink). Sample also
  flows into EpisodeConfig.sample so meta.sample matches what the
  controller actually ran.

Tests: 119 (was 106). New cases:
  tests/test_vm_load_controller.py  (11 tests against a FakeSerial)
    - setup emits workload_setup
    - infected_running runs the v1 yes-loop AND emits workload_started
    - dormant probes BEFORE killing and stamps pre_kill_probe
    - dormant probe records "yes=0" (the elliott-lab fingerprint)
    - clean / armed / infecting all emit their respective events
    - serial.run() exception → workload_failed event, no propagation
    - sample-with-profile dispatches to exploits.workloads command
      (NOT the v1 yes-loop)
    - missing emit_event callback is a no-op (back-compat)
  tests/test_episode.py  (2 new)
    - meta.sample carries name/family/category/profile/kind/sha256
      when EpisodeConfig.sample is set
    - meta.sample stays null in the v1 fallback path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:12:34 -05:00
max
8753340ea3 fleet: fix per-slot run-dir collision so concurrent VMs actually run
Root cause of "fleet says max_concurrent=3 but only one episode ships
per wave" symptom on elliott-lab:

  1. orchestrator/fleet.py::_run_slot set
     env["RUN_DIR"]=/tmp/cis490-vm-fleet-{slot} per slot.
  2. tools/run_real_vm_demo.py defaulted --run-dir to /tmp/cis490-vm
     (NO slot suffix), then UNCONDITIONALLY overwrote the env's
     RUN_DIR with that flag's value before exec'ing the launcher.
  3. So every slot's launcher saw RUN_DIR=/tmp/cis490-vm. All slots
     collided on the same socket dir.
  4. run_real_vm_demo.py also rmtree(run_dir) on entry — slot 1's
     rmtree literally deleted slot 0's pidfile + sockets mid-boot.
  5. Net effect: one VM survives per wave on a multi-core host that
     should be running ~cores-1 in parallel. Throughput collapses
     to 1/N.

Fix:

  tools/run_real_vm_demo.py + tools/run_tier3_demo.py:
    --run-dir default cascade —
      1) explicit CLI flag
      2) RUN_DIR env (set by fleet runner)
      3) /tmp/cis490-vm-<SLOT>  (SLOT from env, default 0)
    Same change in both runners so Tier-2 + Tier-3 fleet waves
    parallelize cleanly.

  orchestrator/fleet.py::_run_slot:
    Pass --run-dir explicitly to the subprocess so the per-slot path
    is audit-visible in the fleet log instead of buried in env.
    Also flip the subprocess interpreter to repo_root/.venv/bin/python
    when present (was /usr/bin/env python3 — worked by luck because
    the orchestrator path doesn't import msgpack/httpx, but a Tier-3
    fleet wave would have died at import-time on a host without those
    in system Python).

  etc/cis490-orchestrator.service:
    Removed the duplicate [Service] hardening block at the bottom of
    the file that was silently overriding the AmbientCapabilities
    grant (NoNewPrivileges=true at the bottom flipped the
    NoNewPrivileges=false at the top, dropping CAP_NET_RAW + CAP_SYS_
    ADMIN + CAP_PERFMON before per-episode subprocesses inherit
    them). Sources 3 + 4 would have failed silently inside the
    sandbox.
    Added /tmp to ReadWritePaths so per-slot RUN_DIRs are writable.

106/106 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:55:56 -05:00
max
a93a3ff221 bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9, refs #3)
Adds a pull-based cert distribution path so install-lab-host.sh can
fetch its own leaf cert without operator intervention. Removes the
ssh-from-Pi requirement that blocked elliott-lab.

How the chicken-and-egg gets solved: a freshly wg-enrolled lab host
already has WG access (gate kept by iptmonads at L4) and trusts the
Caddy local CA (bundled in this repo at etc/caddy-root.crt). It
makes a single TLS call to https://bootstrap.wg/v1/cert/<host_id>
— no mTLS — gets back a tar of {ca.crt, leaf.pem, leaf.key},
extracts to /etc/cis490/certs/, and the shipper unblocks. Trust
boundary is "reached :443 over WG"; no operator action needed.

bootstrap/
  app.py        Starlette: GET /v1/cert/{host_id}, GET /v1/health.
                Validates host_id charset, rate-limits per source IP,
                logs every mint with the X-Real-IP Caddy injects.
  __main__.py   uvicorn launcher; runs as root because the wg-pki CA
                private key is root-only.

etc/cis490-bootstrap.service
  systemd unit on 127.0.0.1:8446 with ProtectSystem=strict +
  narrow ReadWritePaths=/var/lib/wg-pki. ProtectHome=no because
  systemd's read-only mode hides /home contents (the issuer script
  the wrapper exec's lives there).

scripts/issue-cis490-client-cert-wrapper.sh
  Adapter the bootstrap service shells out to. Resolves the actual
  wg-pki issuer script across the three plausible install layouts
  (/opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki) so a single
  copy of the unit file works on any operator's box. Forces
  --out-dir to /var/lib/wg-pki/issued so writes stay inside the
  service's narrow ReadWritePaths.

scripts/install-lab-host.sh
  After scaffolding lab-host.toml, if /etc/cis490/certs/lab-host.pem
  is absent, curls bootstrap.wg with --cacert etc/caddy-root.crt
  (no chicken-and-egg), extracts, chowns/chmods. Skips silently if
  bootstrap.wg is unreachable so manual hand-carry remains possible.

scripts/install-receiver.sh
  Drops cis490-bootstrap.service alongside cis490-receiver and
  prints both as "enable --now" candidates. cis490-bootstrap is the
  thing that makes lab hosts self-provisioning.

etc/caddy-root.crt
  Bundled copy of wg-pki's published Caddy local CA root, so the
  bootstrap fetch can verify TLS without depending on a wg-pki
  clone that may or may not be on the lab host yet.

Verified live on the Pi:
  $ curl --cacert etc/caddy-root.crt https://bootstrap.wg/v1/cert/elliott-lab -o /tmp/x.tar
  HTTP 200 size=10240
  $ tar tf /tmp/x.tar
  ca.crt
  elliott-lab.key
  elliott-lab.pem
  $ openssl verify -CAfile … elliott-lab.pem
  /tmp/.../elliott-lab.pem: OK
  $ openssl x509 -subject … -noout
  subject=CN=elliott-lab

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:30:29 -05:00
max
6f8b744c33 cis490-doctor + AGENTS.md operator runbook + louder install script
Adds the missing diagnostic + onboarding tools so an agent (AI or
human) handed a fresh lab host can get to "shipping data" without
re-deriving every step from logs.

tools/cis490_doctor.py — one-shot health check that walks the full
stack from the bottom up. Each row is green/yellow/red with an
exact fix command for the red rows. Checks:
  - repo: branch, tree-clean, distance from origin/main
  - install: /opt/cis490, .venv python, /etc/cis490/{lab-host,receiver}.toml,
    /etc/cis490/lab-host.env
  - mTLS: /etc/cis490/certs/{wg-ca,lab-host}.{pem,key}, openssl chain verify
  - systemd: cis490-{shipper,orchestrator,receiver} active state
  - net: receiver.url DNS, TCP reach, mTLS handshake to collector.wg
  - vm prereqs: /dev/kvm, qemu-system-x86_64, zstd, alpine-baseline.qcow2,
    cidata.iso
  - tier3 prereqs: msfrpcd, metasploitable2.qcow2 (warn-level)
  - end-to-end: cis490-shipper --ping
Modes: --role {lab-host,receiver}, --json (machine-readable),
--no-tier3 (skip optional checks). Exits non-zero on any red row.
ANSI color (auto-disabled on non-tty / NO_COLOR).

AGENTS.md gains a "How a lab host gets to shipping data" canonical
flow at the top: cert delivery via wg-pki/deploy-cis490-cert.sh →
install-lab-host.sh → cis490-doctor → systemctl enable. Plus an
"on-demand episode" recipe + a "smallest E2E test" snippet for
agents that need to verify the pipe without waiting on the timer.
The strict "cloning the repo by itself does nothing" callout makes
the failure mode mu and elliott-lab hit explicit.

scripts/install-lab-host.sh prints a 5-step banner on first install
that points at cis490_doctor.py + the deploy-cis490-cert.sh flow,
plus an always-printed footer warning that "cloning + running
launchers manually is NOT enough." Same message the AGENTS.md
section reinforces.

Refs spectral/CIS490#8 (the "Tier-2 is shipping in the meantime"
claim that turned out to be untrue because no cis490-shipper
service was running on elliott-lab — exactly the case this
diagnostic tool targets).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:11:57 -05:00
max
7311802822 orchestrator: emit snapshot_load before _write_meta to keep t_mono ~0
On slower disks (Pi5 SD cards, mu's hardware) the json.dump → write →
os.replace path inside _write_meta takes more than 1 ms, so when the
snapshot_load event fired afterwards its t_mono_ns drifted past the
"<1 ms after origin" assertion in test_driver_events_persist_to_events_jsonl.

Fix: emit snapshot_load immediately after setting _t_mono_origin_ns,
before any file I/O. Matches the semantic intent (snapshot_load marks
episode clock = 0) and removes the disk-speed dependency from the
event timeline.

Diagnosis + suggested patch from spectral/CIS490#7 (filed by mu).

Closes spectral/CIS490#7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:49:50 -05:00
max
637fb064df README: Tier 4 is shipped, source 3 is shipped — drop the stale 🚧 marks
Closing the loop on the previous wave's commits. Tier 4 (real-malware
fetch + chunked upload + guest-side sha-verify + exec) and source 3
(perf stat collector) are both implemented and tested as of a88ac83;
the README still tagged them as TBD / planned. Fix.

- Tier 4 status: 🚧 code;  awaiting operator's MalwareBazaar API
  key + at least one sha256 entry in manifest.toml. Same shape as the
  Tier-3 line.
- New "Tier 4 — real malware sample" section walks through the
  fetch → chunked upload → guest-side sha-verify → exec flow with
  links to the relevant code.
- Source 3 (perf stat): "🚧 planned" → " opt-in via enable_perf".
- Snapshot/revert (revert_at_start / revert_at_end via QMP loadvm)
  added to the Orchestrator + drivers list.
- Test-count header updated 86 → 106.
- Stale issue links to closed #4 / #5 / #6 dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:37:00 -05:00
max
a88ac83db0 Close out the deployment-readiness gaps
Wraps the gaps surfaced in the "what is not implemented" audit so the
fleet really is shippable end-to-end. Verified live on the Pi:
  - cis490-shipper --ping → HTTP 200 through Caddy + mTLS via the
    new wg-pki client CA leaf
  - real episode dir → tar+zstd → PUT → HTTP 201 stored
  - re-ship same bytes → 200 (idempotent)
  - re-ship different bytes under same id → 409 (conflict)

Changes:

orchestrator/episode.py
  - EpisodeConfig.revert_at_start / revert_at_end (Tier 0+ snapshot/
    revert per docs/architecture.md). When set + qmp_socket present,
    EpisodeRunner issues loadvm <snapshot_name> and emits
    snapshot_revert / snapshot_revert_failed events on the same
    monotonic clock as everything else.

collectors/qmp.py
  - savevm() / loadvm() helpers using human-monitor-command, plus a
    test against the fake QMP server.

exploits/workloads.py
  - chunked_real_binary_upload() returns a ChunkedUpload plan: 8 KiB
    base64 chunks (~6 KiB binary each) so msfrpc never sees a buffer-
    busting payload. Includes a finalize step that sha256-verifies on
    the guest before exec.
  - real_binary_workload() now wraps the chunked plan for backwards
    compat with single-shot callers.

exploits/driver.py
  - Tier-4 dispatch walks the chunked plan in MSFExploitDriver:
    each chunk is a separate session_shell_write; finalize verifies;
    exec only runs on sha-ok. New events: real_binary_upload_begin,
    real_binary_verify, real_binary_aborted.

etc/cis490-orchestrator.service
  - Reads /etc/cis490/lab-host.env (FLEET_HOST_ID + optional BRIDGE).
  - Grants AmbientCapabilities CAP_NET_RAW (tcpdump for source 4) +
    CAP_SYS_ADMIN + CAP_PERFMON (perf for source 3) so collectors
    work under hardening.

scripts/install-lab-host.sh
  - Writes /etc/cis490/lab-host.env on first install with FLEET_HOST_ID
    defaulting to `hostname -s`.
  - Best-effort: fetches the Alpine baseline qcow2 (sha512-pinned) and
    builds cidata.iso with the in-guest agent embedded; symlinks both
    into /opt/cis490/vm/images/ so launchers find them.

scripts/fetch-alpine-baseline.sh
  - Idempotent fetcher for the Alpine 3.21 cloud-init nocloud qcow2
    matching the sha512 in docs/sources.md.

tools/plot_envelope.py
  - Rebuilt to render whatever telemetry the episode dir contains:
    proc → QMP block ops → perf IPC/miss-rate → bridge pkts/SYNs →
    guest agent load/mem. Missing sources are silently skipped.

tools/index_reader.py
  - cis490-index CLI: filter receiver's index.jsonl by host / sample
    / time range, sort, count-by group. Closest thing to a query
    interface until we stand up Postgres/Timescale.

samples/README.md
  - Rewritten to match the new manifest schema, the kind=real vs mimic
    split, the per-(host, slot, ep) selection mechanic, and the
    chunked-upload safety story.

Tests: 106 pass (was 102). New cases:
  - test_qmp.py — savevm + loadvm (HMP wrapper + error path)
  - test_tier4.py — chunked plan splitting, sha-pinned finalize,
    end-to-end driver walks all chunks + verify + exec via the fake
    msfrpc client

Closes the "what is not implemented" punch list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:31:55 -05:00
max
bdcd2ecbef Close out the open issues: bridge pcap wiring, perf collector, Tier-4
Wraps the three remaining 🚧 items from the README so every collector
the threat-model promises is actually live, and the Tier-4 path
(real-malware fetch + upload + exec) works end-to-end as soon as a
sha256 lands in samples/store/.

Closes spectral/CIS490#4, #5, #6.

== #6 — Bridge pcap wiring ==
EpisodeConfig grows three optional fields:
  bridge_iface: str | None        # e.g. "br-malware"
  bridge_ip:    str = "10.200.0.1"
  pcap_snaplen: int = 256
When bridge_iface is set, EpisodeRunner spawns tcpdump for the duration
of the schedule (network.pcap), stops it cleanly on episode end, and
runs collectors.pcap.bucketize() to produce netflow.jsonl per the
100-ms schema in docs/data-model.md. EpisodeResult + meta.result
gain rows_netflow + pcap_bytes counters.

vm/launch_demo.sh + launch_target.sh now switch between SLIRP usermode
and tap+bridge based on $BRIDGE — operator pre-creates the tap as a
bridge member, no sudo from the launcher.

run_real_vm_demo.py picks BRIDGE up from env so the fleet runner can
opt entire waves into pcap mode by exporting BRIDGE before invocation.

== #5 — Source 3 perf collector ==
collectors/perf_qemu.py shells out to ``perf stat -p <pid> -I 100 -j``
and parses the per-event JSON stream. Aggregates one row per interval
across the canonical event set (cycles/instructions/cache-{refs,misses}/
branches/branch-misses/page-faults/context-switches), computes IPC +
cache-miss rate. Tolerates missing events (``<not counted>`` /
``<not supported>``) without dropping the row, and skips cleanly when
``perf`` isn't on PATH or the process can't be attached.

EpisodeConfig.enable_perf=True opts into the collector — off by default
because perf needs CAP_SYS_ADMIN or perf_event_paranoid <= 1. When
enabled, runs as a parallel thread alongside the other collectors;
EpisodeResult.rows_perf records the count.

== #4 — Tier 4 (real-malware fetch + upload + exec) ==
tools/fetch_sample.py: pulls a sample by sha256 from MalwareBazaar
(API key from env or samples/.bazaar.token), unzips with the standard
"infected" password, verifies the resulting binary's sha256, lands at
samples/store/<sha256>. Idempotent — already-staged correct binaries
return immediately.

samples/manifest.py: Sample.binary_path(store_root) resolves to the
staged binary path, or None for mimics / not-yet-fetched real samples.

exploits/workloads.py: real_binary_workload(bytes, sample) builds a
Workload that base64-uploads the binary into the shell session via a
heredoc, decodes + chmods + execs it in the background, captures the
PID for clean stop on dormant. Per-profile pid/bin paths so concurrent
samples in the same guest don't collide.

exploits/driver.py: dispatch order is now:
  1) sample.kind == "real" + binary staged at sample_store_root
     → real_binary_workload (Tier 4)
  2) profile mimic from workloads.workload_for() (Tier 3 v2)
  3) None → driver v1 fallback yes-loop
DriverConfig.sample_store_root is the new field; run_tier3_demo.py
wires it to repo_root/samples/store. driver_setup event records
sample_sha256 so trainers can join Tier-4 episodes against the
manifest by hash.

samples/store/.gitkeep added (binaries themselves are gitignored).

Tests: 102 pass (was 86). New suites:
  tests/test_perf_qemu.py — parser + builder + perf-missing fallback
  tests/test_tier4.py     — real_binary_workload base64 round-trip,
                            stop-cmd kills pidfile, per-profile path
                            isolation, driver dispatch chooses real vs
                            mimic correctly, fetcher input validation
                            and cached-fast-path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:17:49 -05:00
max
c89dbe29e7 README + AGENTS.md: reflect fleet, driver v2, all 4 collectors
README:
- Intro now describes the multi-host fleet + cross-host sample
  diversity as the primary workflow.
- Tier 2 section: profile-driven workload table replaces the old
  "yes / dd" description.
- New Tier 3 section: covers driver v2 dispatch + setup automation
  scripts.
- Tier maturity table refreshed (1, 2 ; 3  code /  image; 4 🚧).
- Telemetry-sources table moved into the per-tier story so the
  oracle-vs-feature split is visible from the top of the doc.
- Status section restructured by section (Pipeline, Telemetry,
  Orchestrator + drivers, Fleet) instead of a flat list. Cross-links
  to the new Forgejo issues for the remaining gaps:
    #4 — Tier 4 MalwareBazaar fetcher
    #5 — source 3 (perf stat)
    #6 — bridge pcap per-episode wiring
- Quick-start sections rewritten:
    1) "fleet mode (the primary workflow)" with --capacity + --waves
    2) "single episode, no fleet" covering both Tier 2 + Tier 3
    3) "multi-host fleet — how cross-host diversity works" explains
       the deterministic per-(host, slot, ep) selection mechanism
- Repo-layout table updated to include shipper/, scripts/, AGENTS.md,
  and the workloads/fleet additions.
- Deploying section: replaces the "TODO scaffolds" wording with the
  actual sudo install-receiver / install-lab-host / wg-pki bring-up
  flow that's running on the Pi today.

AGENTS.md: adds a "don't put off the hard parts" convention as the
first item under Other conventions, with explicit guidance on when
"deferred-with-reason" is legitimate (genuine operator artifact
missing) and the requirement to file an issue + automate the
bring-up so it Just Works once the artifact lands.

86/86 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:11:35 -05:00
max
b80986d99c Driver v2: sample-profile-driven workloads (Tier-2 + Tier-3)
The v1 driver ran ``yes > /dev/null`` for every sample, which
produced the same envelope shape regardless of which malware family
the orchestrator claimed to be running. That's a poor training
signal: the model sees identical /proc + QMP traces tagged
"cryptominer" / "ransomware" / "RAT" with no distinguishing
features. v2 fixes this.

What landed:

  exploits/workloads.py — six ``Workload`` profiles, each producing
    a distinct in-session shell command pair (start_cmd / stop_cmd)
    that backgrounds a profile-shaped loop:

      cpu-saturate    — sustained 1-vCPU saturation (XMRig shape)
      scan-and-dial   — periodic SYN-style probes across 10.200.0.0/24
                        + dial-home to gateway (Mirai shape)
      io-walk         — fs traversal + 4 KiB urandom writes, periodic
                        re-read (ransomware shape)
      bursty-c2       — long idle, periodic 3-packet TCP egress burst
                        (Dridex C2 beacon shape)
      low-and-slow    — minimal CPU + periodic awk-driven memory churn
                        (Kovter / fileless shape)
      shell-resident  — single long-lived TCP socket pinned to gateway
                        with periodic 6-byte command ticks (RAT shape)

  Each profile uses a /tmp/.cis490-workload-<profile>.{pid,sh} pair so
  the stop_cmd can cleanly kill the loop and its descendants.

  exploits/driver.py — MSFExploitDriver now accepts an optional
    ``Sample``. With one supplied, ``infected_running`` dispatches to
    the matching workload via exploits.workloads.workload_for(); the
    ``sample_executed`` event records profile + sample name + sample
    kind so the trainer can join cleanly. Without a sample, the v1
    yes-loop path remains unchanged (backwards compat).

  tools/vm_load_controller.py — the same dispatch on the Tier-2 path
    (no exploit, real Alpine guest driven over the serial console).
    A fleet wave now produces six visually distinct envelopes per
    wave whether the underlying mode is Tier 2 or Tier 3.

  tools/run_real_vm_demo.py — accepts ``--sample <name>`` (or
    SAMPLE_NAME env from the fleet runner) + auto-wires QMP + agent
    sockets into the EpisodeConfig so all three new collectors
    (sources 2, 4, 5) run alongside source 1 by default.

  tools/run_tier3_demo.py — same ``--sample`` plumbing for the
    exploit-driven path.

Tests: 86 pass (was 82). New v2 cases:
  - profile dispatch routes infected_running to the workload's
    start_cmd (NOT the v1 yes-loop) when a Sample is set
  - all six profiles produce distinct start_cmds (the property the
    ML model needs)
  - unknown profile string falls back to cpu-saturate with a warning
  - v1 path (no Sample) still uses yes-loop (backwards compat)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:06:15 -05:00
max
1b6c7b2f4a Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts
This is the chunk that makes "real data" actually flow on multiple
hosts in parallel. End-to-end pipe was up at 613c6fa / 2579683; now
the lab-host side has the diversity + concurrency it needs.

Collectors landed:
  collectors/qmp.py          — source 2 (oracle). Tiny synchronous QMP
                               client + row builder + run loop. Tolerates
                               older qemu without query-stats.
  collectors/guest_agent.py  — source 5 (deployable). Reads the
                               virtio-serial host-side socket, parses
                               agent JSON-lines, re-stamps to the host
                               monotonic clock, persists.
  collectors/pcap.py         — source 4 (deployable). tcpdump capture
                               + pure-Python pcap reader + 100 ms
                               netflow.jsonl bucketizer. Decodes
                               Ethernet/IPv4/TCP/UDP enough for the
                               schema in docs/data-model.md.

In-guest agent:
  vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads
    /proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs,
    thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent.
  tools/build_cidata.py — embeds the agent + an OpenRC service into
    user-data so first boot of the Alpine cidata image auto-starts it.

Launchers:
  vm/launch_demo.sh / launch_target.sh — second virtio-serial port for
    the agent socket; SLOT env support so multiple VMs run without
    socket / port collisions; PORT_BASE on launch_target so multiple
    target VMs hostfwd different host ports.
  vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24,
    no NAT). Idempotent.

Fleet:
  orchestrator/fleet.py — capacity detector (cores / RAM / load
    headroom) + concurrent-slot runner. Per-slot ENV selects the
    sample. FleetCapacity dataclass round-trips into meta.json so
    "this episode ran with 6 concurrent VMs" is auditable post-hoc.
  tools/run_fleet.py — CLI: --capacity report; --waves N runs N
    waves of (max_concurrent) episodes each, every slot with a
    different sample.
  etc/cis490-orchestrator.service — now drives the fleet runner with
    Restart=always so each invocation runs one wave and respawns,
    giving a continuous stream.

Samples:
  samples/manifest.toml — six profiles spanning the five major
    behaviour shapes. Each entry is real OR mimic (sha256 distinguishes).
  samples/manifest.py — strict TOML loader (rejects dups, unknown
    categories) + deterministic select(host_id, slot, episode_index)
    so different hosts on the network walk the catalog in different
    orders without any coordinator.

EpisodeRunner:
  orchestrator/episode.py — optional qmp_socket + guest_agent_socket
    fields on EpisodeConfig; when set, additional collector threads
    run alongside proc_qemu. EpisodeResult now carries rows_qmp +
    rows_guest counters.

Tier-3 setup automation:
  scripts/install-msfrpcd.sh — installs metasploit-framework where
    the package manager has it, generates a strong password into
    /etc/cis490/msfrpc.env, drops a hardened systemd unit bound to
    127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch
    once MSFRPC_PASSWORD is sourced.
  scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256
    from the operator (Rapid7 download is registration-walled), pulls,
    verifies, converts vmdk → qcow2, lands at vm/images/.

Tests: 82 pass (was 51). New suites:
  tests/test_qmp.py       — fake QMP server, capability handshake,
                            blockstats, async-event interleaving,
                            5-failure backoff
  tests/test_guest_agent.py — fake virtio socket, JSON-lines read +
                              re-stamp, malformed-line tolerance
  tests/test_pcap.py      — synthetic pcap with TCP/UDP/ARP frames,
                            bucketize correctness across windows
  tests/test_fleet.py     — capacity math (8-core idle / low-RAM /
                            high-load / Pi5 / 1-core box), manifest
                            selection determinism + diversity

What's queued for the next commit (already discussed in convo):
  - MSFExploitDriver v2: map sample.profile → distinct in-session
    workload so Tier-3 episodes don't all produce the same yes-loop
    envelope. Critical for ML to learn varied malware shapes.
  - Real-sample fetch from MalwareBazaar by sha256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:02:27 -05:00
max
2579683efb receiver: default to 127.0.0.1:8444 (avoid wg-enroll-listener on 8443)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:45:23 -05:00
max
7c9f9582ca Lab-host shipper + receiver /v1/ping + install scripts
Implements the deployment loop end-to-end on the CIS490 side:

shipper/
  config.py      ShipperConfig (host_id, paths, receiver endpoint, mTLS)
  transport.py   httpx-based PUT + ping with mTLS + bearer support
  queue.py       scan data/episodes/, tar+zstd via system zstd, ship,
                 retire to data/shipped/. Idempotent across crashes per
                 the state machine in docs/transport.md.
  __main__.py    CLI: --ping (smoke test), --once (one pass), or daemon

receiver/app.py: new POST /v1/ping that requires the same auth as PUT
  /v1/episodes but writes nothing. Used by `cis490-shipper --ping`
  during lab-host bring-up to verify the WG/Caddy/mTLS path before
  shipping any real bytes.

etc/
  cis490-shipper.service       systemd unit for the lab-host shipper
  cis490-orchestrator.service  systemd unit for the lab-host queue
                               (kept disabled by default until queue
                               mode lands)
  lab-host.toml.example        config template

scripts/
  install-lab-host.sh   idempotent installer; verifies prereqs,
                        creates cis490 service user, syncs repo to
                        /opt/cis490, builds venv, drops systemd units
                        and config template
  install-receiver.sh   same, for the receiver role on the central WG
                        node (Pi5 in our setup)

tests/test_shipper.py  11 end-to-end tests against a real Uvicorn
                       server hosting the receiver app. Exercises
                       ping, tar+ship, idempotent re-ship, 409
                       conflict, transient (receiver down), tarball
                       round-trip via system zstd.

AGENTS.md  guidance for AI agents working on this and sibling repos.
           Headline: when you hit an issue you can't fully fix in
           scope, file a Forgejo issue rather than leaving a TODO.

51/51 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:41:32 -05:00
max
613c6fa223 Tier 3: msfrpc-driven exploit driver + first module config
Adds the Tier-3 exploit driver — an MSFExploitDriver that plugs into
EpisodeRunner.on_phase, fires a Metasploit module against a target VM
via msfrpcd, watches for the resulting session, and stamps each
transition (exploit_fire, session_open, session_landing_probe,
sample_executed, session_dormant, session_killed) into the episode's
events.jsonl on the orchestrator's monotonic clock.

What landed:
- exploits/msfrpc.py — minimal msgpack-over-HTTPS client (auth,
  module.execute, job/session lifecycle) so we don't depend on a
  third-party MSF wrapper.
- exploits/driver.py — phase-to-msfrpc adapter; idempotent fire,
  session-open polling with timeout, workload start/stop, teardown.
- exploits/modules.py + exploits/modules/vsftpd_234_backdoor.toml —
  TOML module configs with {{ target_ip }} placeholders, replacing the
  imperative .rc-script approach the README previously hinted at.
- vm/launch_target.sh — SLIRP+restrict=on launcher for the
  intentionally-vulnerable target VM (host can reach guest via
  hostfwd, guest cannot reach host or internet).
- tools/run_tier3_demo.py — end-to-end runner mirroring run_real_vm_demo.
- tests/test_exploits.py — 12 new tests against a fake MSFRpcClient,
  including an integration test that drives a real EpisodeRunner.

Plumbing changes:
- EpisodeRunner._emit_event → public emit_event, so external drivers
  share the runner's monotonic clock and events.jsonl.
- mkdir for episode_dir moved to __init__ so emit_event is callable
  before run() (driver_setup fires pre-schedule).

Status: driver + tests pass (40/40); end-to-end against a live msfrpcd
+ Metasploitable2 image is the next bring-up step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:11:52 -05:00
Maximus Gorog
7216ec09bd Tier 2: real Alpine VM, real workload, real envelope
End-to-end now drives a real KVM guest through the full XMRig-shaped
phase schedule with the workload running INSIDE the guest. Telemetry is
host-side /proc/<qemu_pid>; the load is busybox `yes` (sustained CPU
saturation) and `dd if=/dev/urandom` (disk burst on infecting), driven
over the serial console at every phase transition. The plotted envelope
shows clean idle → armed → infecting (disk spike) → infected_running
(100% CPU plateau) → dormant → re-entry → final clean.

Components:

  vm/launch_demo.sh              now boots Alpine 3.21 nocloud-cloudinit
                                 (Cirros 0.6.x's cirros-init blocks on the
                                 EC2 metadata service for ~17 min before
                                 falling through to NoCloud — abandoned).
                                 Mounts a cidata ISO as a second drive.

  tools/build_cidata.py          pure-Python NoCloud ISO builder (pycdlib).
                                 Sets root password and ssh_pwauth via
                                 runcmd so we don't depend on a specific
                                 cloud-init version's plain_text_passwd
                                 handling.

  tools/vm_serial.py             serial-console client (stdlib socket).
                                 Idempotent login (detects already-in-shell
                                 state), sentinel-bracketed run() that
                                 distinguishes shell output from the TTY
                                 echo of input by requiring a leading
                                 \r\n boundary on the marker.

  tools/vm_load_controller.py    in-guest load controller. set_phase()
                                 dispatches the per-phase shell command
                                 over the serial connection.

  tools/run_real_vm_demo.py      ties it all together: boot VM, wait for
                                 cloud-init runcmd, log in, run the
                                 EpisodeRunner with on_phase=controller,
                                 shut down VM.

Deps: paramiko, pycdlib added.

docs/sources.md updated with Alpine cloud image (sha512 pinned), and
the new Python deps.

README leads with the tier-2 plot now (real VM, real workload). The
previous synthetic plot is moved below with explicit "host-side mimic,
not a VM" labelling. Tier-2 status flipped to  in the tier table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:38:53 -06:00
Maximus Gorog
32ae161ef2 README: embed demo plots, mark synthetic vs real clearly, add collapsibles
The README now leads with a 'What an episode looks like' section that
shows both:

  * docs/images/synthetic-envelope.png — pipeline-validation plot. Real
    telemetry of a real process whose load is shaped by tools/load_mimic.py
    (Python). Explicitly labelled NOT REAL MALWARE in the caption — the
    earlier wording was unclear.

  * docs/images/real-vm-idle.png — real Cirros 0.6.3 booted under KVM,
    same orchestrator + /proc collector pointed at the qemu-system pid.
    Idle baseline; no exploit, no payload yet.

A 'What's still missing for the real-malware envelope' table makes the
tier path explicit (real VM idle → real workload in-guest → real exploit
fire → real sample).

Repository nav, deploy steps, design rationale, and threat model are
moved into <details>...</details> blocks so first-time visitors see the
demo plots and the status list without scrolling past wall-of-text.
Stale Pi-as-deployment-target wording in the design-rationale section
is fixed alongside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:11:54 -06:00
Maximus Gorog
cc37fc6c4d Interactive envelope plot via WebAgg (browser-based)
plot_envelope.py grows a --show flag. With it, matplotlib's WebAgg backend
spins up a localhost server with a real interactive figure (zoom, pan,
hover, axes lock) — equivalent to a matlab plot window without needing
tkinter or Qt locally.

tools/show_envelope.sh is a NixOS-aware wrapper: it locates libstdc++.so.6
in /nix/store (numpy's prebuilt wheel needs it on LD_LIBRARY_PATH) and then
exec's the python script with --show. Default port 8988, override via
--port. Bound to 0.0.0.0 so the figure is reachable over WG too.

tornado is added to dev deps because WebAgg requires it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:06:22 -06:00
Maximus Gorog
69c09f4404 Phase 2: real-VM episode (Cirros under KVM) + works-cited doc
vm/launch_demo.sh boots a Cirros qcow2 under KVM with QMP and a monitor
socket exposed; snapshot=on routes guest writes to a temporary overlay
so the on-disk image is never mutated (clean factory reset every boot).

End-to-end verified: vm/launch_demo.sh → orchestrator with --target-pid
&lt;qemu pid&gt; → 201 telemetry rows over 20s against the real qemu-system
process. The plotted envelope shows the expected idle-VM shape:
periodic ~10% CPU spikes from KVM/timer interrupts, flat 230 MiB RSS,
and a single late-boot disk write. Distinct from the synthetic
load_mimic envelope, confirming the collector reads real KVM behavior.

docs/sources.md is the works-cited doc — every tool, library, sample
source, paper, and standard the project leans on, grouped by category.
README's nav table now points at it. README's status section also lists
what's done vs. in progress so reviewers can see scope at a glance.

Note: vm/images/ stays gitignored. The Cirros 0.6.3 image is documented
with its sha256 (7d6355852aeb...) in docs/sources.md so any team member
can reproduce the bytes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:00:25 -06:00
Maximus Gorog
970698af83 Synthetic envelope demo: phase-driven load mimic + plotter
End-to-end pipeline now produces a labeled envelope from a single command.
Drives the orchestrator through an 8-phase XMRig-shaped schedule and
renders a 3-panel envelope (CPU%, RSS, IO write rate) with phase bands
sourced from labels.jsonl. Real telemetry, simulated load — validates the
collection + labeling shape before a real VM is involved.

Components:
- tools/load_mimic.py        phase-driven load generator. Reads phase
                             commands on stdin; CPU/IO behavior matches
                             the named phase (clean=idle, armed=light burst,
                             infecting=disk burst+CPU, infected_running=
                             CPU saturation+stratum-shaped writes,
                             dormant=quieter than clean).
- tools/run_envelope_demo.py spawns load_mimic, drives EpisodeRunner with
                             a default 85s schedule that includes the
                             classic infected_running → dormant → re-entry
                             pattern.
- tools/plot_envelope.py     reads telemetry + labels from an episode dir,
                             writes envelope.png with colored phase bands.

orchestrator: EpisodeRunner now takes an optional phase_schedule and an
on_phase callback. Walks the schedule emitting one label per transition.
Backwards-compatible — existing single-phase tests still green.

Doc fix (user pushback): README + architecture + threat-model no longer
imply the Pi5 is the deployment target. Pi5's actual role here is the
WireGuard-side collector for episode tarballs. Deployment target is
generic ("constrained Linux device"). The "gateway observer" concept
remains a deployment pattern, decoupled from the Pi5's collector role.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:53:20 -06:00
Maximus Gorog
064387b7a0 Add v0 orchestrator + first oracle collector (host /proc)
End-to-end: ``python -m orchestrator --target-pid <pid> --duration N`` now
writes a complete episode directory matching docs/data-model.md, with phase
labels, events, and a 10 Hz host /proc telemetry stream. No VM yet — pid is
arbitrary so we can validate the loop against e.g. ``sleep 5`` while the lab
side comes up.

collectors/proc_qemu.py — parses /proc/<pid>/{stat,io,status} (handles parens
in comm), single-shot collect_once(), and a stop-event-driven run_loop()
that ticks at a fixed cadence and exits when the pid disappears. Tagged
``available_in_deployment: false`` per the threat-model doc.

orchestrator/episode.py — EpisodeRunner: creates data/episodes/<ulid>/,
atomic meta.json, events.jsonl + labels.jsonl writers, drives the collector
in a thread for duration_s, writes done.marker last so the shipper never
sees a half-finished episode.

orchestrator/ulid.py — tiny 26-char Crockford-base32 ULID generator.
Time-sortable, no third-party dep.

orchestrator/__main__.py — CLI entry point.

Tests (15 new, 28 total green):
- proc_qemu: real-ish stat with parens-in-comm, missing /proc/<pid>/io,
  missing pid, run_loop cadence, run_loop terminates when pid disappears.
- episode: full directory shape against os.getpid(), id override,
  done.marker written after meta.json finalize.
- ulid: length+alphabet, 2000-burst uniqueness, time-sortability.

Smoke-tested against ``sleep 10``: 16 rows over 1.5s at 100ms cadence,
monotonic clock, RSS stable at ~3.5 MiB as expected for an idle sleep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:40:25 -06:00
Maximus Gorog
83e111961d Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency
Implements docs/transport.md as a small Starlette app. The receiver streams
episode tarballs to disk, verifies sha256 against an X-Content-SHA256 header,
atomically renames into the store on success, and appends one row to a flat
index.jsonl. No DB. Idempotent re-PUTs return 200; conflicting bodies return
409. Optional bearer-token auth (mTLS terminates at Caddy in prod).

receiver/
  store.py        EpisodeStore: sha-verifying streaming ingest, atomic rename,
                  append-only index. No HTTP.
  app.py          make_app(): Starlette routes + bearer guard.
  config.py       ReceiverConfig.load(): TOML parser.
  __main__.py     uvicorn entrypoint, reads --config TOML.

tests/test_receiver.py — 13 tests via httpx.ASGITransport. Covers: 201 new,
200 idempotent replay, 409 conflict, 400 sha mismatch + cleanup, 400 missing/
short header, 400 bad id, 400 bad suffix, 413 too large, 401 bearer enforcement,
schema-version pass-through.

etc/cis490-receiver.service — systemd unit with hardening flags.
etc/receiver.toml.example — config template matching docs/deploy.md.

End-to-end smoke-tested with curl: 201 → 200 → 409 path verified, file
on disk, single index row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:34:04 -06:00
Maximus Gorog
fa1574a0a6 Scaffold project: docs, repo skeleton, transport + deploy design
Lays down the design surface for the CIS490 behavioral-malware-detection
dataset and model. No code yet — schema and topology are decided first so
collection can start without rework.

Docs:
- README: project goal, navigation
- architecture: lab topology, KVM choice, episode state machine,
  deployment-mirror reasoning
- threat-model: train/serve parity rule, oracle-vs-deployable feature
  split, two-model evaluation strategy
- data-model: per-episode JSONL layout, row schemas, phase enum
- transport: WG-native shipper/receiver design, idempotent uploads
- deploy: one-command install for lab-host and receiver roles
- lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring

Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/,
training/ (each with a short README explaining purpose).
Extended .gitignore to exclude qcow2 images, pcaps, sample binaries,
secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:21:00 -06:00
7a0fefc02e Initial commit 2026-04-27 17:28:48 -06:00