CIS490

Author	SHA1	Message	Date
Max	8643192a71	training/fleet: distributed multi-host trainer with capability gating Symmetric companion to the collection fleet (orchestrator/fleet.py) but for training. Collection is embarrassingly parallel; training is not (a model is trained at most once across the fleet), so the receiver coordinates which worker gets which job. Operator-control surface is etc/training_manifest.toml.example — single canonical file declaring (a) per-host capability + per-model allow/deny policy, (b) one [[jobs]] entry per (model, mode, hyper) with capability constraints (require_cuda, prefer_cuda, min_vram_gib, min_ram_gib, allowed_hosts). Components: capability.py — self-detection: hostname, cores, RAM, CUDA presence, VRAM, torch version, git commit. Used by workers to filter eligible jobs before claiming. manifest.py — TOML loader + JobSpec/HostSpec. Job IDs are stable sha256 of (model, mode, hyper, split_recipe, train_hosts, seed) so manifest reload is idempotent: existing rows keep their status, new jobs become claimable, removed jobs stay until cancelled. queue.py — SQLite job queue (training_jobs.db) with statuses pending|claimed|running|completed|failed|cancelled. Atomic claim_next via single UPDATE WHERE status='pending'. Heartbeat, complete, fail. Stale-claim sweep (stale_after_s=600s) with max_attempts cutoff to failed. store.py — model artifact store mirroring receiver/store.py. Artifact ID is the sha256 of the uploaded tarball; bit-identical re-runs deduplicate. 
receiver.py — Starlette app exposing 11 endpoints: POST /v1/job/claim (worker) POST /v1/job/{id}/heartbeat (worker) POST /v1/job/{id}/complete (worker) POST /v1/job/{id}/fail (worker) PUT /v1/model/{id} (worker — uploads tarball) GET /v1/jobs (anyone) GET /v1/workers (anyone) POST /v1/job/{id}/cancel (operator: X-Operator-Token) POST /v1/job/{id}/requeue (operator) POST /v1/manifest/reload (operator) GET /v1/health (anyone) Runs as cis490-trainer-receiver.service on the Pi alongside the existing receiver, on a separate port. client.py — stdlib HTTP client (urllib only, no new deps). worker.py — long-running daemon. Loop: detect capability → claim → spawn training/trainer/run.py subprocess → heartbeat every 30s → tar artifact, sha256, PUT /v1/model → complete. SIGTERM-safe. Operator CLI (tools/cis490_jobs.py): status / list / show / cancel / requeue / reload / workers. Cancel and requeue require $CIS490_OPERATOR_TOKEN matching the receiver's configured value. Bootstrap: scripts/install-training-worker.sh (Linux systemd) and scripts/install-training-worker-windows.ps1 (Windows Scheduled Task) let the operator enroll a new host with one command after cloning the repo and setting up the venv. Worker self-tests capability before registering. End-to-end smoke verified on the Pi: receiver up, manifest synced, 14 jobs queued, worker registered, claimed 4 CPU-eligible jobs (allow_jobs=["gbt","mlp"]), completed 3 (gbt-realistic, gbt-oracle, mlp-oracle), 1 failed with the actual error visible via cis490-jobs status, 3 artifacts uploaded to /var/lib/cis490/models/<model>_<mode>/<sha256>/bundle.tar.zst with proper index.jsonl row. 21 unit tests (manifest validation: 8; queue lifecycle + eligibility: 13). All pass alongside the prior 17 training tests = 38 green. Open limitations surfaced inline: - Hyper-key drift between manifest and run.py fails at training time, not at manifest reload (worth tightening to argparse introspection later). 
- mTLS not yet wired through Caddy for the trainer-receiver port — listens loopback-only until that lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 01:20:20 -05:00
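A minimal sketch of the atomic claim that the queue.py note in commit 8643192a71 describes, assuming a simplified `training_jobs` table. The column names (`worker`, `claimed_at`), the `claim_next` signature, and the retry loop are illustrative assumptions, not the repository's actual schema. The `AND status='pending'` guard on the single UPDATE is what stops two racing workers from claiming the same row.

```python
import sqlite3
import time

# Hypothetical, simplified schema; the real training_jobs.db layout is not shown in the log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS training_jobs (
    id         TEXT PRIMARY KEY,
    model      TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    worker     TEXT,
    claimed_at REAL
)
"""


def claim_next(conn, worker, allowed_models):
    """Claim one pending job the worker is eligible for; return its id or None."""
    if not allowed_models:
        return None
    marks = ",".join("?" for _ in allowed_models)
    while True:
        row = conn.execute(
            "SELECT id FROM training_jobs "
            f"WHERE status='pending' AND model IN ({marks}) "
            "ORDER BY id LIMIT 1",
            list(allowed_models),
        ).fetchone()
        if row is None:
            return None  # nothing eligible is pending
        # The single UPDATE is the atomic step: it only matches while the
        # row is still pending, so at most one worker sees rowcount == 1.
        cur = conn.execute(
            "UPDATE training_jobs SET status='claimed', worker=?, claimed_at=? "
            "WHERE id=? AND status='pending'",
            (worker, time.time(), row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # Another worker won the race for this row; look for the next one.
```

The eligibility filter here is only a per-model allow list; the capability constraints named in the commit (require_cuda, min_vram_gib, and so on) would narrow the candidate set further.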
Max	1fabd4a246	training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers The model layer of the project, built honestly: - tools/dataset_validate.py — full-sweep validator over the receiver store (sha256, schema, monotonic labels, telemetry-row gate). On the current corpus: 64,798 accepted + 8,154 degraded + 3,701 rejected + 7 errored across 76,660 shipped episodes. data/processed/validation_v1.parquet is committed as the per-episode acceptance index. - training/_features.py — channel registry (46 channels across proc/guest/qmp/netflow), summary-stat windowing AND channel×time tensor extraction at 10s/5s windowing. Time alignment uses t_wall_ns (Unix ns) — tested fix for a real netflow-vs-host clock-base inconsistency that was silently dropping every netflow channel. - training/_split.py — three held-out recipes (host / sample / time) with profile-stratification assertions. held_out_host carries untested_profiles for cases like scan-and-dial absent from the test host (5 of 6 profiles tested cross-device, never silently averaged). - training/models/ — 6 architectures behind a common BaseModel interface: gbt (XGBoost), mlp, cnn, gru, lstm, transformer. Each trained twice (realistic / oracle) per the deployment threat model. Schema-hashed checkpoints refuse to load if _features.py changed since training (silent-input-drift protection, tested). - training/trainer/ — unified training loop: class-weighted CE, LR warmup + cosine, gradient clipping, mixed precision when CUDA, early stopping on val macro F1, best-on-val checkpoint. Same loop runs MLP/CNN/GRU/LSTM/Transformer; GBT uses XGBoost early_stopping_rounds on val mlogloss. - training/eval_/ — bootstrap 95% CIs on macro F1, per-class F1, per-profile and per-host breakdown, paired-bootstrap significance for model-vs-model gap. Confusion matrix uses union of seen labels. 
- training/dashboard/producers/ — replay/metrics/perf/profiles emitting the six event types the dashboard's awaiting scenes consume; on-demand tensor extraction so the Pi can run live inference without 65 GB of shards. - 17 unit tests (split coverage, features round-trip, schema mismatch, determinism, time-base alignment regression). End-to-end smoke-trained all six on a 567-episode subset; held-out test macro F1 reported with paired-bootstrap significance. The methodology now reports honest cross-device generalization, not in-distribution validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 01:19:00 -05:00
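A rough sketch of the schema-hashed checkpoint guard mentioned in commit 1fabd4a246, assuming the hash covers an ordered channel list plus windowing parameters. The function names, the `schema_hash` metadata key, and the parameter names are hypothetical; the log only states that a checkpoint refuses to load if `_features.py` changed since training.

```python
import hashlib
import json


class SchemaMismatch(RuntimeError):
    """Raised when a checkpoint was trained against a different feature schema."""


def schema_hash(channels, window_s, stride_s):
    """Hash the feature schema; channel order is kept because it defines tensor layout."""
    payload = json.dumps(
        {"channels": list(channels), "window_s": window_s, "stride_s": stride_s},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_checkpoint(meta, current_hash):
    """Refuse to use a checkpoint whose recorded schema hash differs from the current one."""
    saved = meta.get("schema_hash")
    if saved != current_hash:
        raise SchemaMismatch(
            f"checkpoint schema {saved!r} does not match current schema {current_hash!r}"
        )
```

Failing at load time rather than at inference time is the point of the guard: a silently reordered or renamed channel would otherwise feed the model inputs it was never trained on.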
Max Gorog	d9f913fc97	PIPELINE §5 step 6: event-driven labeller (§4.5) Phase labels are written ONLY when justifying events arrive. The schedule clock is now a budget — an upper bound — never a label source. This is the core honesty fix the §3 evidence demanded: Before: every Tier-3 episode wrote `infected_running` from the schedule clock regardless of whether session_open ever fired. Per §10 every dishonest label is a poisoned training example. 67/67 of the §3 probe episodes were poisoned this way. After: `infecting` writes ONLY when exploit_fire is observed in events.jsonl. `infected_running` writes ONLY when session_open is observed. Either timing out or seeing session_open_timeout terminates the walker with a `failed` label that the §4.6 acceptance gate will reject. PHASE_JUSTIFYING_EVENTS in orchestrator/episode.py declares which events justify which phases: "clean": None # orchestrator-emitted "armed": None # orchestrator-emitted "infecting": ("exploit_fire",) "infected_running": ("session_open",) TERMINAL_FAILURE_EVENTS = {"session_open_timeout"} short-circuit any event-driven wait into a `failed` label. `dormant` is intentionally OFF the canonical schedule. §4.5 calls for dormant to be event-driven (session_idle / session_active) too, but the driver doesn't emit those yet. Per §1 default-to-removal we ship without dormant rather than label it from the clock; when the driver gains those emits, dormant re-enters the schedule with proper justification. EpisodeRunner now owns: * `_event_log` — every emit_event appends here * `_event_cv` — condition variable for waiters * `_wait_for_event(names, since_t_mono_ns, timeout_s)` — returns the first matching event in the log with t_mono >= threshold; threshold catches events that fired during the previous on_phase callback. When an event-driven phase's justifier already arrived (e.g. 
exploit_fire emitted by driver._fire() inside on_phase("armed")), the walker uses the EVENT's t_mono on the label — not the time the walker noticed. The label means "this is when this thing actually happened." manifest.toml: dropped the dormant cycle from the canonical schedule. Episode is shorter (~30s) but every label is event-justified. 14 new tests in tests/test_event_driven_labeller.py covering: justifier mapping invariants, _wait_for_event semantics (already-arrived, future, timeout, since-threshold, first-of-multiple-names), walker behavior (orchestrator-emitted phases, event-driven phases, missing event → failed, terminal-failure-event short-circuit, stop event, event-t_mono on label, phase_transition events with justified_by). 286 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 01:43:16 -05:00
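The `_wait_for_event(names, since_t_mono_ns, timeout_s)` semantics described in commit d9f913fc97 can be approximated with a condition variable as below. This is a sketch under assumptions: the class name `EventLogSketch`, the event dict shape, and the public method names are invented for illustration; only the argument list and the "first matching event with t_mono at or after the threshold" behaviour come from the commit message.

```python
import threading
import time


class EventLogSketch:
    """Append-only event log with a condition variable for waiters (illustrative)."""

    def __init__(self):
        self._event_log = []
        self._event_cv = threading.Condition()

    def emit_event(self, name, t_mono_ns=None):
        event = {
            "name": name,
            "t_mono_ns": time.monotonic_ns() if t_mono_ns is None else t_mono_ns,
        }
        with self._event_cv:
            self._event_log.append(event)
            self._event_cv.notify_all()  # wake every waiter so each can re-scan the log
        return event

    def wait_for_event(self, names, since_t_mono_ns, timeout_s):
        """Return the first logged event in `names` at or after the threshold, else None."""
        deadline = time.monotonic() + timeout_s
        with self._event_cv:
            while True:
                # Scan the whole log so events that fired before the wait
                # started (but after the threshold) are still found.
                for event in self._event_log:
                    if event["name"] in names and event["t_mono_ns"] >= since_t_mono_ns:
                        return event
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._event_cv.wait(remaining)
```

Because the returned event carries its own `t_mono_ns`, a caller can stamp the label with the time the event happened instead of the time the walker noticed it.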
Max Gorog	0d51b9b253	PIPELINE §5 step 5: collector admission emit tests (§4.4) Adds the missing emit-tests so every collector in KNOWN_COLLECTORS has end-to-end coverage: * test_proc_emits_rows_against_self_pid Samples /proc/<own pid> for ~0.6s. Asserts ≥3 rows + populated core fields (cpu_user_jiffies, rss_bytes, vsize_bytes). Works anywhere with /proc. * test_pcap_bucketize_emits_rows_from_synthetic_capture Builds a 2-packet Ethernet+IPv4+TCP pcap in-memory, feeds it to pcap.bucketize, asserts ≥1 row written + total packet count across buckets matches input. Covers BOTH the pcap and netflow collectors (netflow IS the bucketized pcap output). * test_every_known_collector_has_emit_coverage Cross-cutting tripwire: for every name in KNOWN_COLLECTORS, either there's a test_collectors_emit.py test or there's an explicit COLLECTOR_TEST_CARVE_OUTS entry. Adding a collector to KNOWN_COLLECTORS without an emit test fails this. Carve-outs today: qmp (covered by tests/test_qmp.py — needs running QEMU for real-binary emit) and guest_agent (covered by tests/test_guest_agent.py — needs a real VM with the agent baked in). The carve-outs are explicit, not implicit. A drift where someone adds a new collector without a real-binary emit test fails CI before the manifest can include it. 272 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 01:37:40 -05:00
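The cross-cutting tripwire from commit 0d51b9b253 reduces to a set difference. The sketch below assumes hypothetical inputs: the collector names passed in are examples, and the real `KNOWN_COLLECTORS` contents and test-discovery mechanism are not shown in the log. Only the two carve-outs (qmp, guest_agent) are taken from the commit message.

```python
# Carve-outs named in the commit message, each pointing at the test that covers it.
COLLECTOR_TEST_CARVE_OUTS = {
    "qmp": "tests/test_qmp.py",
    "guest_agent": "tests/test_guest_agent.py",
}


def uncovered_collectors(known, emit_tested, carve_outs=COLLECTOR_TEST_CARVE_OUTS):
    """Return collectors that have neither an emit test nor an explicit carve-out."""
    return sorted(set(known) - set(emit_tested) - set(carve_outs))
```

A test would then assert that `uncovered_collectors(...)` is empty, so adding a collector without coverage fails with the offending names in the assertion message.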
Max Gorog	22269e175d	PIPELINE §5 step 4: catalog admission verifier (§4.3) tools/verify_catalog.py runs the §4.3 end-to-end verification flow against every entry in manifest.toml's [catalog].modules (or a single named module). The flow follows §4.3 exactly: 1. Load the module config + the verified-against target spec. 2. Resolve the published image path; fail loudly if absent. 3. Boot the target VM under §4.13 containment (restrict=on, snapshot=on, no shared FS, unprivileged QEMU — same posture as verify.sh). 4. Wait for the service on the spec'd port. 5. Login to msfrpcd, snapshot the existing session set, fire the module against `127.0.0.1:<host_port>` (the SLIRP hostfwd to the guest's promised service port). 6. Wait for `session_open` — NOT session_open_timeout, which is the §4.5 failed-label outcome. 7. Round-trip a shell command (`id`); confirm uid= shape. 8. Confirm a guest-side artifact (touch marker; ls + echo VERIFY_OK). Per-module exit code is 0 only when EVERY step passes. CLI exit is 0 only when EVERY requested module passes — partial credit isn't an option (§1 default-to-removal: a module that can't pass shouldn't be in the catalog). Structured JSON output with per-step timings + detail strings, written to stdout or --out <path>. Operator pulls this into a successful CI run + signs off on the manifest.toml [[catalog.modules]] amendment with a fresh `last_verified = <commit_sha>` per §15. Tests (tests/test_verify_catalog.py, 8 cases): exercise the flow with a mocked MSFRpcClient + mocked qemu boot. Cover happy path, every short-circuit failure mode (image missing, service never up, session timeout, shell round-trip wrong, guest artifact missing), and spec-load errors. Real verification needs lab hardware; the mocked flow proves the orchestration contract. 269 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 01:35:32 -05:00
Max Gorog	4d29b7236d	PIPELINE §5 step 3: target VM build infrastructure + containment posture §4.2 calls for target VMs we BUILD, not VMs we fetch. §4.13 demands every target ship the same isolation posture (no upstream egress, no host-shared FS, unprivileged QEMU, fresh snapshot per episode). This commit lands the infrastructure for both. New surface: * orchestrator/target_spec.py Loads + validates `vm/targets/<name>/spec.toml`. Containment fields are not knobs — each has exactly ONE safe value, and a spec asserting the unsafe value is rejected at load time. There's no `--containment-override`; weakening §4.13 requires amending PIPELINE.md and operator sign-off. * tools/build_target.py Orchestrates build → verify → publish for a single target. Spec invalid → exit 78 (sysadmin error). build.sh failure → image not published. verify.sh failure → image discarded; that's the §4.2 acceptance gate. Publishes sha256 + the manifest.toml stanza the operator copies in to admit the image (§16 substantive amendment with sign-off per §15). * vm/targets/<name>/{spec.toml,build.sh,verify.sh} Template structure. spec.toml is the contract; build.sh produces $OUT_PATH; verify.sh boots the produced image under the §4.13 containment posture and asserts every promise. * vm/targets/shellshock/ First real working target. CVE-2014-6271 (Apache mod_cgi + bash 4.2 mis-parsing function-export environment values). Replaces the SourceForge Metasploitable2 path that §3 evidence proved unverifiable. Bash 4.2 is built from sha256-pinned GNU source inside an Alpine 3.21 cloudinit guest; the build script asserts the produced bash actually triggers shellshock; the verifier re-asserts it under restrict=on with a real CVE-2014-6271 probe. * vm/targets/README.md How operators add a target. Walks the spec → build → verify → manifest amendment loop. 
Containment regression tests (tests/test_containment.py) — 20 new assertions, parameterized over every target with a build/verify trio: * verify.sh MUST contain `restrict=on` on its netdev (§4.13) * verify.sh MUST contain `snapshot=on` on the boot drive (§4.13) * verify.sh + build.sh MUST NOT contain -virtfs / -fsdev / 9pfs * verify.sh + build.sh MUST NOT wrap qemu-system in `sudo` * Every target must ship the complete spec.toml + build.sh + verify.sh trio — no half-built targets (§1 default-to-removal) Spec validation tests (tests/test_target_spec.py): 13 new tests over spec parse, name/dir mismatch, missing fields, out-of-range port, and the §4.13 containment field validators (each unsafe value rejected with a clear error). The shellshock target's image is NOT yet published to manifest.toml's [[targets.images]] — that's the §15 sign-off amendment that lands after a successful operator-driven build_target.py run on a lab host with KVM. Building takes ~10 min on x86_64; cannot run on the Pi under TCG. Operator drives the first build, verifies the sha256, then amends manifest.toml in a follow-up commit. 261 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 01:31:40 -05:00
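A simplified sketch of the containment assertions listed for tests/test_containment.py in commit 4d29b7236d. It uses plain substring and regex checks over script text; the function name and message strings are assumptions, and the real tests may parse the QEMU command line more precisely (for example, checking that `restrict=on` sits on the netdev argument specifically).

```python
import re

# Shared-filesystem flags that must never appear in either script.
FORBIDDEN_TOKENS = ("-virtfs", "-fsdev", "9pfs")
SUDO_QEMU = re.compile(r"\bsudo\b[^\n]*qemu-system")


def containment_violations(verify_sh, build_sh):
    """Return human-readable violations found in the two script bodies."""
    problems = []
    if "restrict=on" not in verify_sh:
        problems.append("verify.sh: missing restrict=on")
    if "snapshot=on" not in verify_sh:
        problems.append("verify.sh: missing snapshot=on")
    for name, text in (("verify.sh", verify_sh), ("build.sh", build_sh)):
        for token in FORBIDDEN_TOKENS:
            if token in text:
                problems.append(f"{name}: forbidden token {token}")
        if SUDO_QEMU.search(text):
            problems.append(f"{name}: qemu-system wrapped in sudo")
    return problems
```

Parameterizing such a check over every directory under vm/targets/ gives the "every target" coverage the commit describes.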
Max Gorog	207a902c3e	PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml The experiment is now defined by a single version-pinned file — manifest.toml at the repo root. PIPELINE.md §4.1 / §13 / §16. Every lab host loads THIS exact file; per-host overrides of experiment shape are forbidden. Drops the following per-host CLI overrides that previously violated the canonical-manifest principle: * --manifest, --modules-dir (paths now derived) * --ram-per-vm-mib (in manifest.experiment) * --max-concurrent (manifest.experiment.fleet.max_concurrent_ceiling) * --max-tier3-slots (manifest.experiment.fleet.max_tier3_slots) * --force-tier2 (not a §14 sanctioned override knob — ship empty catalog to disable Tier-3) * --require-real-samples (sample-side concern; out of fleet scope) * tools/run_*_demo.py --manifest (samples path now from canonical) New surface: * manifest.toml — the single source of truth * orchestrator/manifest.py — load_canonical() + Manifest dataclass with strict validation, raises ManifestError on any failure * EpisodeConfig.experiment_meta — populated by run_*_demo.py from the canonical manifest; stamped into every episode's meta.json under "experiment" key for provenance * cis490-orchestrator.service — RestartPreventExitStatus=78 so manifest-load failures stay stuck-and-loud (§9, §4.7) * install-lab-host.sh — validates manifest.toml at install time; missing or invalid = die with clear message Catalog admission semantics: only modules whose name appears in manifest.catalog get loaded into the runtime catalog (§4.3 in miniature, will tighten further in step 4 when verified_against / last_verified actually gate admission). Missing toml for an admitted name is a sysadmin error → exit 78. Renames cfg.manifest → cfg.samples + adds cfg.experiment to disambiguate sample-manifest from experiment-manifest. Rewrites test_fleet.py fixture to construct synthetic Manifest objects so test outcomes don't depend on the on-disk manifest.toml content. 
12 new tests in tests/test_manifest.py: schema-version mismatch, unknown collector, duplicate collector, unknown phase, negative phase seconds, negative ram, missing catalog fields, json round-trip. Local run: `python tools/run_fleet.py --capacity` correctly logs the loaded manifest and prints capacity. 241 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 01:25:01 -05:00
Max Gorog	dca6144a4a	catalog: remove samba_usermap_script — never landed sessions in prod PIPELINE.md §1 (default-to-removal), §4.3 (catalog admission), §10 (every dishonest label is a poisoned training example). Empirical evidence on commits `4ab5477` → `c41763b`: samba_usermap_script fired its bind_perl payload but the framework's bind handler never managed to connect to the guest's listening port within session_open_timeout_s=30 (or even with WfsDelay=30 bumped on the framework side). All 67 attempts in the §3 probe ended in session_open_timeout. Yet the schedule clock was still writing `infected_running` labels for the failed exploit — exactly the §10 poisoned-example pattern. Until §5 step 3 builds an in-house target VM and step 4 re-admits modules with `verified_against` recorded (§4.3), the production catalog should consist of zero verified Tier-3 modules. That's the state after this removal: the four remaining modules (vsftpd_234_backdoor, distccd_command_exec, php_cgi_arg_injection, unreal_ircd_3281_backdoor) are all `requires_bridge=true`, which the fleet picker filters out unconditionally (the post-revert behavior from commit `0390eb2`). Net effect: production runs Tier-2 only, producing honest Tier-2 episodes and zero dishonest Tier-3 infected_running labels. Test fixture updated to inject synthetic in-memory ModuleConfigs instead of loading from disk, so Tier-3 dispatch logic stays tested even though no production module qualifies. test_exploits asserts the new "every shipped module is requires_bridge until §4.3 admits something verified" invariant — flips into a tripwire if anyone reintroduces an unverified non-bridge module. 229 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 22:48:03 -05:00
Max Gorog	0390eb20b6	fix: revert speculative fleet picker change — was producing dishonest labels Empirical evidence from k-gamingcom (commit `4ab5477`, 2026-05-03 22:20Z vsftpd_234_backdoor episode): the picker selected vsftpd because BRIDGE was set on that host. The exploit fires against target_ip=127.0.0.1 (SLIRP loopback) but vsftpd's hardcoded port-6200 backdoor is reachable only at the guest's bridge IP. Result: session_open_timeout, AND a schedule-clock-driven `infected_running` label was still written for the failed exploit — exactly the §10 poisoned-training-example pattern. Until guest-IP discovery for bridge mode is wired (a separate piece of infrastructure), bridge-only modules can't actually reach their target even when the operator sets BRIDGE for Tier-2's pcap source. Revert the picker to its prior conservative form: drop requires_bridge modules unconditionally regardless of BRIDGE state. Same for the BRIDGE env strip in the Tier-3 launch path — it was correct as unconditional. Replaces the two aspirational tests (test_fleet_uses_all_modules_when_bridge_set, test_fleet_propagates_bridge_env_to_runner) with their honest negatives (test_tier3_drops_requires_bridge_modules_unconditionally, test_tier3_strips_bridge_env_even_when_set). The previous tests asserted behavior the rest of the pipeline can't deliver; they were false signals. 229 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:58:43 -05:00
Max Gorog	4ab5477226	PIPELINE §5 step 1: fix four root-cause defects Diagnoses + fixes for the silent-collector / never-lands-session failures that the 200-episode quality probe surfaced (§3 evidence). All four address the producer; no compensating layers added. perf collector (rows_perf=0 on 100% of episodes): - perf stat -j writes to stderr by default with -p; we read stdout. Add --log-fd 1 so JSON reaches stdout where the parser sees it. - Event names come back annotated with the privilege scope perf actually measured ("cycles:u" under perf_event_paranoid=2). Strip the suffix so _build_row's plain-name lookups hit. Without this every metric was None even when perf reported real numbers. - tests/test_collectors_emit.py covers the regression with a real busy-loop fixture; emit-test discipline per §4.4. guest-agent collector (rows_guest=0 on 100% of episodes): - Alpine cloud image doesn't ship python3, so the in-guest agent's `#!/usr/bin/env python3` shebang silently fails. Add packages: [python3] to cidata user-data so cloud-init installs it before the OpenRC service starts. - Guest agent now exits nonzero (was: silent stdout fallback) when /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC reports the failure to /var/log/cis490-agent.log instead of the bytes vanishing into the void. Refs §1. - Host-side collector emits guest_agent_connected / guest_agent_first_byte / guest_agent_silent_window into the orchestrator's events.jsonl. Future episodes show the in-guest failure mode per-episode instead of inferring from rows_guest=0. k-gamingcom missing qmp/netflow/pcap (also affected elliott on Tier-3 episodes — was misclassified as host divergence): - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT qmp_socket / guest_agent_socket / bridge_iface — even though launch_target.sh creates the underlying chardevs and BRIDGE supplies the iface. tools/run_real_vm_demo.py wires them correctly; Tier-3 had a copy-paste gap. 
- tests/test_collectors_emit.py adds a source-grep regression so the wiring stays honest. samba_usermap_script never lands session (0/67 in §3 probe): - Bind handler default WfsDelay (~5s) gives up before bind_perl on Metasploitable2 has finished forking + binding LPORT under SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in exploits/driver.py so framework + driver agree on the wait budget. Add ConnectTimeout=15 so the handler's bind connect has retry budget instead of one-shot. orchestrator/fleet.py: usable_modules + BRIDGE handling were both unconditional, so: - With BRIDGE set, requires_bridge modules were still being dropped — picker only ever returned samba_usermap_script across every slot/episode (the test_fleet_uses_all_modules_when_bridge_set failure on HEAD). - env.pop("BRIDGE") fired even when BRIDGE was the operator's explicit setup, breaking modules that need bridge mode (vsftpd backdoor on hardcoded port 6200, distccd, etc.). Both made conditional on bridge_set so the picker walks the full catalog under bridge mode and SLIRP-only modules still get a clean SLIRP env when BRIDGE is unset. receiver/app.py: half-pregnant v2 schema state in HEAD — calling store.ingest_stream(episode_type=..., benign_profile=...) with kwargs the matching store.py change was in the WIP stash. Removed v2 awareness from app.py so v1 episodes (what the producer ships today) get accepted again. SCHEMA_VERSION default reset to 1 to match. 229 passed, 0 failed. (HEAD had 15 failures, all linked to the half-pregnant v2 state above.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:05:25 -05:00
max	05bf785f0a	fleet-health: exit 0 when alerts found (don't mark unit failed) The detector previously returned 1 on alerts, which made systemd mark cis490-fleet-health.service as 'failed' every tick that found a sick host. That's the wrong UX — a detector finding a fault is working correctly, not crashing. The alert is the signal (via WARNING log + alerts.jsonl); the unit's success state should mean "the detector itself ran cleanly." Test added. Caught while live-deploying on the Pi: the first run found elliott-thinkpad fatal-only at 943×4xx + 1425×5xx and correctly emitted the alert — but systemd showed the unit red, which would have caused operators to chase the wrong tail. Side note: the same first run also caught a real bug — pycache for receiver.store on /opt/cis490 was stale after I deployed the new app.py + store.py from main, causing 1464 × 500 responses. Cleared the pycache and the index immediately resumed growing (4465 → 4515 in 30 seconds). The detector earned its keep on the very first cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 13:51:20 -05:00
max	49eba2fd60	fleet-health: proactive alerts on the Pi + per-host doctor reports Two pieces of self-monitoring so the maintainer isn't the alarm: (2) Receiver-side fleet health monitor cis490-fleet-health.timer runs check_fleet_health.py every 5 min. Detects three symptoms and writes them to /var/lib/cis490/alerts.jsonl + a syslog WARNING (greppable / easy to forward to a notifier): silent — host shipped in last 24h but has been quiet >30 min fatal-only — actively shipping but every PUT 4xx unstamped — shipping without X-Cis490-Code-Commit header Dedup is keyed on (host, symptom, hour-bucket) so a sustained fault fires once per hour, not every 5 min. 15 unit tests cover the index parser, three detectors, and dedup. (3) Per-host doctor snapshots Lab hosts run cis490-doctor-check.timer once a day (10 min after boot, then daily with 30-min jitter). The timer runs cis490_doctor.py --json and PUTs the result to a new endpoint: PUT /v1/host-health/<host> → /var/lib/cis490/host-health/<host>.json GET /v1/host-health → aggregate across all hosts Endpoint is NOT gated by version_gate — sick hosts running stale code MUST still be able to report sickness. 11 unit tests cover PUT/GET, atomic-write semantics, bearer auth, and the not-gated-by-version-gate property. ship_health_check.py reuses the existing shipper transport (mTLS + bearer + receiver URL from lab-host.toml) so we don't reimplement auth. Both timers wired into install-lab-host.sh — the loop also enables the previously-added autoupdate + cert-fetch timers, so a single install run gives a host all four self-healing mechanisms. Tests: 293 pass (26 new — 15 fleet-health, 11 host-health). 2 pre-existing test_fleet.py failures from the elliott-ThinkPad merge (`667f042`) are unrelated to this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 13:48:31 -05:00
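The dedup rule in commit 49eba2fd60, keyed on (host, symptom, hour-bucket), can be sketched as follows. The class and function names are hypothetical, and the in-memory set is an assumption made to keep the example short; a detector that runs from a timer would need to persist or re-derive the seen keys between runs.

```python
def dedup_key(host, symptom, t_unix):
    """Bucket a Unix timestamp into whole hours so a sustained fault maps to one key per hour."""
    return (host, symptom, int(t_unix // 3600))


class AlertDeduper:
    """Remembers which (host, symptom, hour) keys have already fired."""

    def __init__(self):
        self._seen = set()

    def should_fire(self, host, symptom, t_unix):
        key = dedup_key(host, symptom, t_unix)
        if key in self._seen:
            return False  # already alerted for this host and symptom in this hour
        self._seen.add(key)
        return True
```

With a 5-minute tick, a fault that lasts all hour produces one alert for that hour, and a fresh alert once the next hour bucket begins.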
max	f9b2e5c4e6	shipper: systemd watchdog, quarantine cleanup; doctor surfaces ship errors Three robustness items off the future-work list: 1. Shipper sd_notify watchdog. Type=notify + WatchdogSec=180. The daemon sends READY=1 after queue construction and WATCHDOG=1 once per scan pass via a heartbeat callback wired into run_forever. Restart=on-failure only catches process death — silent stalls (deadlock, hung tar subprocess, blocked I/O past timeout) used to leave a zombie running with the data backlog growing. Now systemd kills + restarts the daemon if no WATCHDOG=1 arrives within 180s. Verified end-to-end against systemd via `systemd-run --transient --property=Type=notify --property=WatchdogSec=10`: unit transitions to active on READY=1; SIGSTOP'ing the process triggers `Watchdog timeout (limit 10s)! Killing process N with SIGABRT` at exactly t+10s, then unit goes failed → restart cycle. 2. Quarantine cleanup. Without an upper bound, data/quarantine/ grew forever as fatal episodes piled up. New ShipperConfig fields: quarantine_keep_days = 30 # opt-out: 0 disables quarantine_cleanup_interval_s = 3600 # gate so 5s tick doesn't # statx() the whole tree Cleanup runs at the start of run_once() but is gated to once per hour. Removed entries logged. 3. Doctor surfaces shipping errors. Tails 10 minutes of cis490-shipper journal and surfaces 412/400/transient patterns as red/yellow rows with the canonical fix command. An on-device agent running cis490_doctor.py now sees one line ("12 ship(s) rejected as out-of-window") instead of needing to grep the journal. Tests: 200/200 (was 188). New coverage: heartbeat callback fires + survives exceptions; quarantine cleanup respects keep_days, gate, and opt-out; doctor parser correctly classifies 412/400/transient/clean/ empty/journalctl-denied; both error classes prioritise 412 (more actionable) when present together. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 12:02:59 -05:00
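A stdlib-only sketch of the notify calls behind the watchdog in commit f9b2e5c4e6. It follows the documented sd_notify datagram protocol (a state string sent to the socket named by `NOTIFY_SOCKET`); the function name and the choice to return a boolean are assumptions, and the repository may use a different helper.

```python
import os
import socket


def sd_notify(state):
    """Send a state string such as 'READY=1' or 'WATCHDOG=1' to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. when not running under
    a Type=notify unit, so callers can use it unconditionally.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True
```

In the pattern the commit describes, `sd_notify("READY=1")` would be sent once after queue construction and `sd_notify("WATCHDOG=1")` once per scan pass, so a stalled loop stops feeding the watchdog and systemd restarts the unit.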
max	5cebe7096a	robustness: gate falls back to local git, queue sweeps stale tarballs Two follow-ups from the post-cutover diagnosis: 1. version_gate: forgejo → local git fallback. If forgejo refresh returns empty AND a local repo path is configured, retry against `git log` from the local checkout. The receiver service runs on the same Pi as forgejo, so a simultaneous restart used to leave the gate's cache empty and reject every PUT with not-in-window. Auto-detects /opt/cis490/.git when the operator hasn't set local_repo_path explicitly — that path is always present on a production receiver and ProtectSystem=strict still allows reads. Logs `source=git-fallback` so this isn't silent. 2. shipper/queue: sweep orphaned outbox tarballs. The lifecycle invariant is `outbox/<id>.tar.zst exists ⇒ episodes/<id>/ exists` — broken historically by the now-fixed fatal-loop, by operator `rm` of an episode dir, or by an OS crash between rename(2) and the post-ship cleanup. Without sweeping, dead bytes pile up forever. New _sweep_outbox runs at the start of every scan, bounded by the file count in outbox/. Tests cover: fallback fires when forgejo unreachable + repo_path set; no fallback when repo_path None (opt-in); orphan tarball + partial get swept on the next pass; live tarballs untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:49:38 -05:00
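The outbox invariant in commit 5cebe7096a (`outbox/<id>.tar.zst exists ⇒ episodes/<id>/ exists`) suggests a sweep along these lines. This is a sketch only: the function name and return value are assumptions, and it handles finished tarballs but not the partial files the commit's tests also mention.

```python
from pathlib import Path

SUFFIX = ".tar.zst"


def sweep_outbox(outbox, episodes):
    """Delete outbox tarballs whose episode directory no longer exists.

    Returns the ids of the swept tarballs so the caller can log them.
    """
    removed = []
    for tarball in sorted(Path(outbox).glob("*" + SUFFIX)):
        episode_id = tarball.name[: -len(SUFFIX)]
        if not (Path(episodes) / episode_id).is_dir():
            tarball.unlink()  # orphan: nothing left to ship this tarball for
            removed.append(episode_id)
    return removed
```

The work is bounded by the number of files in the outbox, which matches the commit's note that the sweep runs at the start of every scan.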
max	eda6164897	fix: lab-host install loop after commit-gate cutover Why services weren't starting after the gate went live: 1. install-lab-host.sh self-copy. The receiver's 400 remediation tells the agent to `cd /opt/cis490 && git pull && sudo ./scripts/install-lab-host.sh`. That makes REPO_ROOT==INSTALL_ROOT and `cp -aT $REPO_ROOT $INSTALL_ROOT` errors with "are the same file"; `set -e` aborts before the systemd units install or anything restarts. Detect the same-dir case and skip the cp; chown still runs. 2. Services never restart. install-lab-host.sh and install-tier-3-4.sh both ended by telling the operator to restart, then exiting. The running shipper/orchestrator kept executing pre-gate code from the old module objects, so new `code_version` stamping never reached an episode. Both scripts now `systemctl restart` the units they own when those units are enabled. 3. Shipper queue fatal-loop. queue.py incremented `fatal++` but didn't move the episode out of `data/episodes/`. Next scan re-tarred and re-PUT the same dir, getting 400 again. With 4465+ pre-stamp episodes on k-gamingcom this burned ~1 PUT/sec for 5+ hours of receiver log. Fatal episodes now move to data/quarantine/<id>/ with a quarantine_reason.json beside them; the outbox tarball is deleted. 4. Pre-stamp backlog drain. tools/quarantine_unstamped.py is a one-shot that scans data/episodes/ and quarantines anything without a 40-char-hex code_version.commit. Wired into install-lab-host.sh step 9 so a re-install drains the queue automatically. Idempotent; safe to run while the shipper is active. Tests cover the queue's new fatal-quarantine path and every drain behaviour (kept/quarantined/dry-run/idempotent/missing-meta/collision). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:36:21 -05:00
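The stamp check that tools/quarantine_unstamped.py applies in commit eda6164897 ("a 40-char-hex code_version.commit") might look like the sketch below. The function name is hypothetical, and accepting only lowercase hex is an assumption based on how git prints commit ids.

```python
import re

# Assumption: lowercase hex only, as git emits.
COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def is_stamped(meta):
    """True when an episode's meta dict carries a full 40-hex code_version.commit."""
    code_version = meta.get("code_version")
    if not isinstance(code_version, dict):
        return False
    commit = code_version.get("commit")
    return isinstance(commit, str) and COMMIT_RE.fullmatch(commit) is not None
```

An episode for which `is_stamped` returns False would be a candidate for the move to data/quarantine/ that the commit describes.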
max	e2bb76144f	tools/verify_tier3_local.py: Pi-runnable Tier-3 verifier Closes the "have you tested it" gap as much as we can without x86 KVM. The Pi is ARM64, so it can't boot Metasploitable2 or run KVM-accelerated guests. But most of the Tier-3 chain doesn't need x86: * chunked_real_binary_upload is just shell commands over a pipe * exploit module TOMLs and the deterministic selector are pure Python * manifest loading + sample selection are pure Python * msfrpcd itself runs on ARM (Ruby + Java) * the receiver's commit gate is the same on any arch verify_tier3_local.py exercises each of those end-to-end, in process, on this Pi: PASS exploits/modules/*.toml parse + selector deterministic PASS manifest loads + selector covers every sample PASS chunked binary upload survives a real /bin/sh round-trip (150 KB binary, 26 chunks, sha256-verified end to end) PASS staged samples are Linux i386 ELF (when staged) PASS msfrpcd round-trips core.version (when listening) PASS receiver /v1/health + gate enforces commit allow-list Live result on this Pi today: 5 PASS, 1 SKIP (msfrpcd not installed on the Pi, which is correct: the Pi is the receiver, not a lab host). When run on a lab host after install-tier-3-4.sh, all 6 PASS gives full Tier-3 readiness. What this script does NOT verify (still needs x86 KVM on a lab host, covered by install-tier-3-4.sh's verify step): * Metasploitable2 boots under QEMU/KVM * vsftpd_234_backdoor lands a session against it * the chunked-upload binary actually executes inside that session But the chunked-upload step proves every byte of the upload path (printf '%s', heredoc-free path, base64 decode, sha256 verify, chmod, exec scaffold) works against a real POSIX shell. An msfrpc session presents the same shell interface, so a passing local-sh test is strong evidence the production path will work. tests/test_tier3_local_verify.py wraps the deterministic steps (module parse, manifest, chunked upload) so pytest catches regressions automatically. 
174/174 total. Operator workflow: ssh into Pi (or lab host), run: /opt/cis490/.venv/bin/python tools/verify_tier3_local.py Each step prints PASS/FAIL/SKIP with detail. Exit 1 if any FAIL.	2026-05-01 03:41:21 -05:00
max	b809e1e26e	auto_fetch_samples: pick Linux i386 ELF; manifest matches theZoo User caught it: I shipped the theZoo path without running it end-to-end. A real fetch on the Pi exposed two bugs: 1. Family-name matcher was substring-strict. "Cryptolocker-class" wouldn't match the dir "CryptoLocker_22Jan2014" because "-class" isn't in the dir name. Now expands to a sequence of tokens (full, head-of-dash, head-of-dot, head-of-underscore) and tries each. First match wins. 2. Extraction picker was "largest non-text" — a bad heuristic for theZoo, where each Linux.* zip often contains MULTIPLE binaries for different platforms (Linux i386, x86-64, ARM, FreeBSD, sometimes even Windows PE). The largest is rarely the i386 Linux ELF that would actually run on Metasploitable2. Now sniffs ELF magic bytes in stdlib and tiers: 1. Linux i386 ELF (largest first) 2. any other ELF (best-effort, may not execute) 3. largest non-text (Wine fallback) Verified end-to-end on the Pi against a real theZoo clone (~500 MB, 263 family dirs, 2026-05-01 fresh pull): linux-encoder-ransomware → ELF 32-bit Intel i386 SYSV (278 KB) linux-wirenet-rat → ELF 32-bit Intel i386 SYSV (64 KB) linux-rex-ransomware → ELF 32-bit Intel i386 SYSV Go (7.6 MB) linux-neurevt-bot → ELF 32-bit Intel i386 SYSV (3.0 MB) linux-earthkrahang-apt → ELF 32-bit Intel i386 GNU/Linux (5.8 MB) 5/5 picks are runnable Linux i386 ELFs. Manifest rewrites in place add source/sha256/url; meta.sample.kind goes to "real" automatically. Manifest rewritten: - Old families (XMRig, Mirai, Cryptolocker-class, Dridex, Kovter, Reverse-Shell) → mostly absent from theZoo's Linux catalog or matched the wrong arch. - New families chosen against a verified theZoo presence list: Linux.Encoder, Linux.Wirenet, Ransomware.Rex, Neurevt, EarthKrahang. - XMRig + Kovter remain as mimic-only fallbacks (theZoo lacks a runnable Linux i386 binary for these; orchestrator falls back to the mimic profile). 
Tests added (tests/test_auto_fetch_samples.py): 13 cases covering ELF magic detection (i386 accepted, FreeBSD/x86-64/ARM/PE32/text all rejected), family-token expansion (the "-class" suffix bug), extraction picker (prefers Linux i386 over larger non-Linux ELFs), manifest in-place rewrite preserves mode + skips entries that already have sha256. What's still NOT verified end-to-end (requires a lab host with KVM x86): - Metasploitable2 boot under QEMU - vsftpd_234_backdoor exploit fire via msfrpcd - chunked binary upload through a real shell session - real binary executing inside a Metasploitable2 guest The Pi is ARM64 — can't run Metasploitable2. install-tier-3-4.sh's verify step (run_tier3_demo.py) covers all four on a real lab host; deploy verifies on first run there. 171/171 tests pass.	2026-05-01 03:28:26 -05:00
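The stdlib ELF sniffing from commit b809e1e26e might look roughly like the sketch below. It reads only the fixed ELF header fields (magic, class, byte order, OS ABI, `e_machine`). The function name `is_linux_i386_elf` is an assumption, and so is accepting both the SYSV and GNU/Linux OS ABI values, chosen because the verified picks above report both.

```python
import struct

EM_386 = 3                 # e_machine value for Intel 80386
ELFOSABI_SYSV = 0          # most Linux binaries are stamped SYSV
ELFOSABI_LINUX = 3         # a.k.a. ELFOSABI_GNU


def is_linux_i386_elf(header: bytes) -> bool:
    """Return True if the first 20 bytes look like a 32-bit little-endian x86 ELF.

    Hypothetical helper: rejects PE, text, x86-64, ARM, and FreeBSD-stamped ELFs.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    ei_class, ei_data, ei_osabi = header[4], header[5], header[7]
    if ei_class != 1 or ei_data != 1:      # ELFCLASS32, ELFDATA2LSB
        return False
    if ei_osabi not in (ELFOSABI_SYSV, ELFOSABI_LINUX):
        return False                        # e.g. FreeBSD is OS ABI 9
    (e_machine,) = struct.unpack_from("<H", header, 18)
    return e_machine == EM_386
```

A tiered picker could then sort the candidates that pass this check by size before falling back to any other ELF.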
max	cc0c96953e	version_gate: Forgejo as canonical commit source (no fs perms needed) Initial git-log-based gate ran into a permission wall: the cis490 service user can't read /home/max/cis490/.git (ProtectHome=true + home-dir mode). Switching the production source to the local Forgejo HTTP API (already accessible to all WG peers, single source of truth both lab hosts and the receiver pull from). When the maintainer pushes new code to spectral/CIS490, the next 5-second cache refresh sees the new commit and lab hosts can immediately ship under it. VersionGate now takes either: - forgejo_url + repo_owner + repo_name + branch (+ optional auth_token for private repos): hits /api/v1/repos/<owner>/<name>/commits?sha=<branch>&limit=<n> - repo_path: dev-only fallback, runs `git log` locally Local-git path retained for tests + the dev-only case. receiver.toml.example gains forgejo_url/repo_owner/repo_name/branch with auth_token commented; live-deployed receiver.toml on the Pi has the spectral org + token. Live state on the Pi: 41 valid hashes loaded, head=f8ad02b. Verified end-to-end: bogus commit → 412 + remediation HEAD commit → clears gate (fails downstream at sha-mismatch as expected for the empty-body verify probe) Test added: test_forgejo_backend_accepts_returned_commits stands up a tiny canned-response HTTPServer in-process, exercises the parser without depending on a live Forgejo instance. Brings test_version_gate to 10 cases; total 158/158.	2026-05-01 01:42:45 -05:00
max	f8ad02b2d7	Receiver enforces X-Cis490-Code-Commit allow-list (live, auto-refreshed) Stops out-of-date lab hosts from polluting the dataset with episodes generated by buggy code. The valid-commits set mirrors the maintainer's working clone on the Pi automatically — when the maintainer pulls or pushes a new commit, the receiver picks it up within the 5-second cache TTL with no service restart. Receiver changes: - receiver/version_gate.py (new): VersionGate(repo_path, window). Each check() consults a frozenset of the last `window` commit hashes from `git -C <repo> log --format=%H -n <window>`, refreshed every 5s under a lock. Resilient to transient git failure (keeps prior cache so a flaky `git` doesn't lock out every shipper). - receiver/app.py: PUT extracts X-Cis490-Code-Commit; gate.check() before ingest. Rejects with: 400 + remediation if header missing or malformed 412 + remediation + your_commit + head_commit if not in window Remediation block is verbatim copy-pasteable into the lab-host shell: cd /opt/cis490 && sudo -u cis490 git pull origin main sudo /opt/cis490/scripts/install-lab-host.sh sudo systemctl restart cis490-orchestrator - receiver/store.py: ingest_stream takes commit kwarg, stamps it on the index.jsonl row (new optional field). Backfilled rows from index_backfill.py also pull commit out of meta.json. - receiver/config.py + etc/receiver.toml.example: new [version_gate] section. enabled=true, repo_path=/home/max/cis490, window=100 by default. Enabled toggle exists for emergency disable-and-collect. Shipper changes: - shipper/transport.py: ship_tarball() takes commit kwarg, sends X-Cis490-Code-Commit header. 412 maps to status='fatal' so the queue doesn't infinite-retry — operator must pull and reinstall before the next ship will succeed. - shipper/queue.py: reads meta.json::code_version.commit per episode, passes through. On 412, logs the receiver's full remediation block at ERROR level so journalctl on the lab host shows exactly what to run. 
Tests: 9 in test_version_gate (including 2 end-to-end via starlette.testclient), 2 cover the boundary where new commits land mid-cache and where missing-repo gracefully keeps prior cache. 157/157 total. Index schema: existing rows stay valid (commit field is optional on read). New rows from receiver-direct AND from index_backfill.py include commit.	2026-05-01 01:38:50 -05:00
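The gate behaviour in commit f8ad02b2d7 (a TTL-refreshed frozenset of valid commits that keeps the prior cache on a failed refresh) can be sketched as follows. `CommitWindowCache`, the injected `fetch` callable, and the injected `clock` are hypothetical names; the real `VersionGate` signature is only what the commit message states. The `200` return value stands in for "passes the gate" and is not a status the receiver is documented to send at this step.

```python
import re
import threading
import time
from typing import Callable, Optional

_HEX40 = re.compile(r"[0-9a-f]{40}")


class CommitWindowCache:
    """Hypothetical sketch of a commit allow-list with a short TTL."""

    def __init__(self, fetch: Callable[[], list], ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # returns the last N commit hashes
        self._ttl_s = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._valid = frozenset()
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl_s:
            return
        try:
            commits = self._fetch()
        except Exception:
            commits = []             # transient failure: fall through
        if commits:
            self._valid = frozenset(commits)
        # An empty or failed refresh keeps the prior cache.
        self._fetched_at = now

    def check(self, commit: Optional[str]) -> int:
        """400 for a missing or malformed header, 412 if not in the window."""
        if commit is None or not _HEX40.fullmatch(commit):
            return 400
        with self._lock:
            self._refresh_if_stale()
            return 200 if commit in self._valid else 412
```

Injecting the clock keeps the TTL boundary testable without sleeping.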
max	5c0bc9af8e	meta.json: stamp code_version (commit, branch, dirty) per episode Closes a real reproducibility gap. Three weeks of bug fixes have shipped (probe fix in `2707709`, multi-signal classifier in `321ea63`, mandatory tier-4 in `265f3ad`, etc.); without a per-episode code_version, trainers can't tell which episodes came from buggy pre-fix code and have to scan every tarball to guess. Resolution priority (cached across episodes): 1. $INSTALL_ROOT/VERSION (production; install-lab-host.sh writes it at install time since /opt/cis490 is a flat copy with no .git) 2. git rev-parse HEAD from the repo root (dev clones) 3. {"commit": "unknown", "source": "unknown"} so the field is always present (filterable) Output shape, always present in meta.json: "code_version": { "commit": "<40-hex>" | "unknown", "branch": "<name>" | null, "dirty": bool | null, "source": "VERSION-file" | "git" | "unknown" } install-lab-host.sh writes VERSION at install time with the source repo's git rev-parse HEAD + branch + clean-tree flag + install timestamp. Lab-host agents that pull main + re-run install-lab-host.sh get a fresh stamp automatically. 148/148 tests pass; test_episode_against_self_pid_produces_full_directory asserts the field's presence + valid `source` value.	2026-05-01 01:29:01 -05:00
max	265f3ad313	Tier-4 sample source: theZoo (no auth, no operator action) Replaces MalwareBazaar with theZoo (https://github.com/ytisf/theZoo). theZoo is a public security-research repo with hundreds of malware samples organized by family, password-protected with the well-known 'infected'. No API key, no signup, nothing for an operator to do — which is what zero-touch tier-4 actually means. Changes: - tools/auto_fetch_samples.py: rewrite. Clones theZoo (shallow, ~500 MB) to /var/lib/cis490/theZoo on first run, then for each manifest family without a sha256 it locates a matching Binaries/<Name> dir, extracts the .zip with password 'infected', picks the largest non-text payload as the binary, sha256s it, stages at samples/store/<sha256>, and rewrites manifest.toml in place (atomic tempfile + os.replace, stat preserved). Mandatory exit semantic: non-zero if no real samples landed. - scripts/install-tier-3-4.sh: dropped the MB-key resolution chain (env var → local file → bootstrap.wg fetch). Now just runs auto_fetch_samples.py and dies if zero samples land. SKIP_TIER4 remains as the explicit override but is documented as defeating the project. - bootstrap/app.py + __main__.py + etc/cis490-bootstrap.service: removed the /v1/secret/<name> endpoint and the --secrets-root flag. Dead code now that no API key needs distributing. Live-rolled back on the Pi (404 verified post-restart, stale /etc/cis490/secrets dir removed). - scripts/set-malwarebazaar-key.sh: deleted. No MB key means no one-time operator step. - tests/test_bootstrap_secrets.py: deleted (route removed). - AGENTS.md: rewrote tier-4 section to reflect zero-operator model. 148/148 tests pass. Bootstrap service rolled back live.	2026-05-01 01:17:50 -05:00
max	5d0e8e33a9	Tier 4 is mandatory: hard-fail on no real samples; auto-distribute MB key User: 'we don't want it to be optional, this real malware IS the data we want.' Acknowledged. Three changes make Tier 4 actually mandatory without forcing per-host operator action: 1. bootstrap.wg /v1/secret/<name> endpoint - Pi serves /etc/cis490/secrets/malwarebazaar.token to lab hosts over the same trust boundary as the cert endpoint (WG mesh, iptmonads-gated). Strict allow-list — only `malwarebazaar` resolves; everything else 404s. Secret returned as bare text with Cache-Control: no-store. Live-verified on the Pi. - tests/test_bootstrap_secrets.py covers four cases: 404 unprovisioned, 200 with token, 404 unknown name, 500 on empty file. 2. install-tier-3-4.sh: Tier 4 is no longer optional - Resolves MB key in priority: env var → /opt/cis490/samples/.bazaar.token → https://bootstrap.wg/v1/secret/malwarebazaar. - Caches the bootstrap-fetched key locally so re-runs are offline. - If all three resolution paths fail, dies with the exact remediation command for the operator (one-time set-malwarebazaar-key.sh on the Pi). - auto_fetch_samples.py is run unconditionally (SKIP_TIER4 still works for emergency overrides but logs a warning that the host will produce only mimics). Deploy fails if zero binaries land in samples/store/ — no silent mimic-only fallback. - SKIP_TIER4 documentation now says 'DEPRECATED; defeats the project'. 3. scripts/set-malwarebazaar-key.sh - Pi-side helper: one operator command per fleet, ever. Accepts key via env or stdin, validates length, drops at the right path with the right perms. Lab hosts pull the rest automatically. AGENTS.md: rewrote the Tier-4 section to reflect mandatory status + the one-time-on-Pi distribution model. 152/152 tests pass. Bootstrap service updated live on the Pi.	2026-05-01 00:44:41 -05:00
max	321ea63803	Multi-signal prune classifier: rescue valid episodes /proc misses A laptop-class lab host (elliott-thinkpad) running 14 parallel fleet slots can't deliver host /proc CPU% signal for the bursty profiles — the per-VM share gets buried under contention. But the workloads ARE running: qmp blockstats record 90+ MB written during infected_running for io-walk episodes, netflow shows real packet bursts for scan-and-dial, and the in-guest agent (when alive) shows load_1m deltas the host can't see. The classifier now cross-checks four sources before flagging an episode: - /proc CPU% medians (host-side qemu) - netflow byte totals (bridge_pcap) - qmp blockstats per-phase DELTA (cumulative counters; deltas matter, not raw values) - guest-agent load_1m An episode flags only if every available source agrees no inter-phase signal. Missing sources are "unknown", not "flat". Time-base bug also fixed: phase mapping now uses t_wall_ns (which all sources stamp from CLOCK_REALTIME) rather than t_mono_ns — netflow uses qemu boot-monotonic, /proc uses orchestrator-relative, they don't share a number line. Result on the live receiver: - 1067 active episodes, 100% kept under the new logic - 143 episodes rescued from a previous false-positive archive - Only the 9 genuinely-broken pre-Sample-propagation elliott-lab episodes remain archived (no-sample + no-workload-events) Two new tests (test_flat_proc_rescued_by_netflow, test_flat_everywhere_still_flags) pin the boundary so a future regression surfaces immediately. AGENTS.md gains a "classifier is multi-source" section explaining the cross-check and the t_wall_ns invariant.	2026-04-30 19:10:01 -05:00
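The cross-check rule from commit 321ea63803 (flag only when every available source agrees there is no inter-phase signal, and treat missing sources as unknown) reduces to a small predicate. The name `is_flat` and the three-valued input encoding are assumptions for illustration. Not flagging when no source is available at all is also an assumption, since the commit message does not cover that case.

```python
from typing import Optional


def is_flat(signals: dict[str, Optional[bool]]) -> bool:
    """Decide whether an episode should be flagged as having no signal.

    Hypothetical encoding per source (proc, netflow, qmp, guest agent):
      True  -> the source shows an inter-phase signal
      False -> the source is flat
      None  -> the source is missing ("unknown", not "flat")
    """
    known = [v for v in signals.values() if v is not None]
    # Flag only if at least one source reported and none of them saw a signal.
    return bool(known) and not any(known)
```

Under this rule a flat /proc reading is overridden by a single source that did see activity, which is the rescue path the two new tests pin down.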
max	2707709299	Fix workload-silent false-positive on Alpine busybox guests (closes #15) On-device agent (k-gamingcom) ran the diagnostic probe sequence and proved the workload IS running on Alpine (yes saturating the vCPU, loadavg=1.05, three yes PIDs visible), but two busybox incompatibilities made every episode look silent: 1. _probe() used `pgrep -c yes`. The -c flag is procps-ng/util-linux, not busybox. busybox pgrep exits 1 with a usage banner; the `|| echo 0` fallback then reported yes=0 every time. Switched to `pgrep yes | wc -l` which both pgrep variants support. 2. _wrap_loop appended `disown` after the nohup-backgrounded script. busybox sh / ash have no disown builtin, so each infected_running phase printed `sh: disown: not found` into run()'s captured output. The script kept running (nohup gives SIGHUP immunity, which is what disown was for), but the spurious error is now gone. Cross-validation in the classifier: - prune_episodes.py: workload-silent now requires the probe AND host-side /proc CPU envelope (flat-cpu) to AGREE. A probe-only zero is treated as the busybox false-positive and dropped. This means the 244 already-on-disk episodes from elliott-thinkpad and k-gamingcom are correctly classified without re-collecting. Test coverage: - test_workload_silent_flag updated to require both signals - test_workload_silent_suppressed_when_host_cpu_real new regression for the busybox false-positive AGENTS.md gains a "Don't trust the in-guest probe alone" section with the busybox-vs-procps gotcha + a list of busybox-incompatible patterns to avoid in any new in-guest diagnostic.	2026-04-30 17:28:48 -05:00
max	8d2d0d2e99	prune+receiver: preserve index ownership and add a backfill helper (closes #13) Root cause of #13 (PUT 500s on first ship, retries return already-present): my earlier prune-tool session ran as root and rewrote the live index via os.replace(), which drops the original ownership/mode. The new file was root:root and the cis490 service user couldn't append to it. Every fresh PUT 500'd on _append_index after the tarball had already landed via os.replace, so retries always saw "already-present" and never recovered the missing index row. Two fixes: - tools/prune_episodes.py: snapshot the index's stat before the rename and restore uid/gid/mode after. Best-effort chown so non-root prune runs (where chown would EPERM) still succeed; non-root callers matched the original owner anyway. - tools/index_backfill.py: new tool. Walks episodes/<host>/*.tar.zst, computes sha256+size, and appends rows for episodes missing from the index. Preserves "backfilled: true" so trainers can distinguish reconstructed rows. Always opens the index in append mode (never replaces), so it cannot reproduce the ownership bug it's recovering from. Regression test: tests/test_prune.py::test_archive_preserves_index_mode. Operator note for the live receiver: ran the chown fix manually (chown cis490:cis490 /var/lib/cis490/index.jsonl) and ran the backfill once to recover 140 elliott-thinkpad rows that 500'd before the chown landed.	2026-04-30 16:36:05 -05:00
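A sketch of an atomic rewrite that keeps the original owner and mode, in the spirit of the prune fix in commit 8d2d0d2e99. `rewrite_preserving_owner` is a hypothetical name. This version restores uid/gid/mode on the temp file before the rename rather than after it, which is a variation on what the commit describes, and it is POSIX-only because it calls `os.chown`.

```python
import os
import stat
import tempfile


def rewrite_preserving_owner(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data`, keeping its uid/gid/mode.

    Hypothetical helper: the chown is best-effort so a non-root caller
    (where chown to another user would raise EPERM) still succeeds.
    """
    st = os.stat(path)                       # snapshot before anything changes
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, stat.S_IMODE(st.st_mode))
        try:
            os.chown(tmp, st.st_uid, st.st_gid)
        except PermissionError:
            pass                             # best-effort for non-root runs
        os.replace(tmp, path)                # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Creating the temp file in the same directory as the target keeps `os.replace` a same-filesystem rename.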
max	86a088c204	shipper: defer SSL context build until cert/CA paths exist (closes #11) First-boot bring-up enables cis490-shipper before the Pi has issued the mTLS leaf, so ssl.create_default_context(cafile=...) raised FileNotFoundError out of __init__ and systemd crash-looped the unit every RestartSec=5. Now the transport pre-flights the configured ca_bundle / client_cert / client_key paths, raises a recoverable _CertNotReadyError, and ping/ship_tarball retry the build on each request, so the daemon self-heals once the cert lands without a restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:13:59 -05:00
max	a61fa05980	cis490-prune: retroactively filter low-quality episodes from the dataset Without a prune step, every fix we land before elliott-lab pulls leaves a residue of pre-fix episodes in /var/lib/cis490/episodes/. Trainers either filter at training time (processing the bad data anyway) or — worse — train on it. This tool walks the receiver's index, classifies each episode against five quality signals, and either prints a dry-run summary, archives flagged episodes to /var/lib/cis490/episodes-archive/, or deletes them outright (with the index rewritten atomically). Quality signals (each independent; a bad episode can hit several): no-sample meta.sample is null. Pre-Sample-propagation code ran the v1 yes-loop fallback regardless of fleet selection, so the post-infection family isn't recorded. no-workload-events events.jsonl has zero workload_* rows. Pre-audit- trail code (before VMLoadController emits) — we can't tell whether the workload actually fired. workload-failed events.jsonl contains workload_failed. SerialClient raised mid-phase; labels and telemetry don't match what the orchestrator was supposed to be doing. workload-silent workload_killed event during dormant has pre_kill_probe.yes == "0". The schedule walked but the in-guest workload never started — the elliott-lab fingerprint. flat-cpu /proc CPU% medians spread <5pp across phases. A model can't learn to distinguish phases from this; pure noise to the trainer. CLI: cis490-prune # dry-run summary cis490-prune --reason no-sample # restrict to one signal (repeatable) cis490-prune --host elliott-lab # scope to one lab host cis490-prune --archive # mv flagged → episodes-archive/ cis490-prune --delete # rm flagged + drop index rows cis490-prune --json # machine-readable Index rewrite is atomic: tempfile + os.replace, so a crash mid-write leaves the live index intact. Tests: 143 (was 132). 
New cases (tests/test_prune.py): - one healthy synthetic episode produces zero reasons - five tests covering each individual reason flag - dry-run leaves disk + index untouched - --archive moves tarballs and rewrites index - --delete removes tarballs and rewrites index - --host filter scopes correctly (no-match → exit 0) - multi-reason episodes report all matching reasons Live state when this commit lands: 9 elliott-lab episodes from the pre-fix code path, all flagged. Operator can clear them with one command before elliott-lab re-ships under main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:41:10 -05:00
max	507eac617b	Solvable Tier-3 holes: callback payloads, busybox workloads, bridge by default Closes the next batch of issues from the post-mortem. The previous "each run uses a different vulnerability" commit shipped 5 modules but 3 of them couldn't actually fire under SLIRP+restrict=on: their reverse-shell payloads needed a callback channel the launcher didn't provide, AND their LHOST options were set to {{ target_ip }} (the target's IP, not the attacker's — copy-paste from RHOSTS). Same time, the workloads.py shell commands used bash-only /dev/tcp redirects that silently no-op'd in the busybox shell sessions Metasploitable2 returns. Net effect: episodes that selected those modules would have produced session_open_timeout + dead workloads. Module configs (the three callback ones): exploits/modules/distccd_command_exec.toml exploits/modules/php_cgi_arg_injection.toml exploits/modules/unreal_ircd_3281_backdoor.toml - Switch payload from cmd/unix/reverse* to cmd/unix/bind_perl so the target listens on a known port; msfrpcd connects to it via the host's hostfwd (no callback path required). - Drop the bogus LHOST = "{{ target_ip }}" — bind shells don't use LHOST. - Add [runtime] table: requires_bridge = true extra_target_ports = [<bind_lport>] Both fields are honored by the loader (ModuleConfig.requires_bridge) and the launcher (TARGET_PORTS gets the extra port hostfwd'd when BRIDGE mode is active). orchestrator/fleet.py When BRIDGE is unset in env, _run_slot filters the module catalog down to modules where requires_bridge=False before calling select_module. Two same-socket-shell modules (vsftpd_234_backdoor + samba_usermap_script) survive — fleet still has variety; just doesn't pick modules whose payloads can't land. With BRIDGE set, the full catalog rotates as before, AND BRIDGE is propagated to the per-slot subprocess env so launch_target.sh enters tap+bridge mode. 
exploits/workloads.py Replaced bash-only constructs in three profiles: scan-and-dial /dev/tcp/HOST/PORT redirects → nc -z -w 1 bursty-c2 same fix shell-resident exec 3<>/dev/tcp/... → piping into nc -w All three now run cleanly in busybox / dash / Metasploitable2's default shell. The remaining three profiles (cpu-saturate, io-walk, low-and-slow) were already busybox-portable. scripts/install-lab-host.sh - lab-host.env now defaults BRIDGE=br-malware (was commented out). Operator opt-out is to comment the line out again. - New step 6b: provisions br-malware via vm/setup_bridge.sh AND pre-creates a per-slot tap pool (cis490tap0..7 for Tier-2 demo, cis490target0..7 for Tier-3 target) all attached to br-malware and brought up. Launchers reference these by SLOT, so no sudo is needed at episode time. - On bridge-setup failure, the script auto-comments BRIDGE in the env file with an "auto-disabled: bridge setup failed" note so the fleet falls back to same-socket modules + Tier-2 cleanly. tools/cis490_doctor.py Three new checks for the lab-host role: bridge: br-malware exists / up tier3: msfrpcd listening on 127.0.0.1:55553 tier3: module catalog parses (counts same-socket vs requires_bridge) All three are warn-level: they don't fail an otherwise-healthy Tier-2-only setup; they tell the operator what's missing for full Tier-3 + source 4 coverage. Tests: 132 (was 129). New cases: test_fleet.py +3 - fleet skips requires_bridge modules when BRIDGE unset (asserted across 20 episodes; never picks a callback module) - fleet uses the full catalog when BRIDGE is set - BRIDGE env propagates to per-slot subprocess What's still untested live: the bind_perl payloads against a real Metasploitable2 in the bridge-enabled launcher path. That's a deployment validation, not a code change. The unit tests confirm the dispatch / filter logic; the live test is the next operator action. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:32:52 -05:00
max	a193d17ead	fleet: rotate exploit modules per (host, slot, ep); Tier 3 by default Closes the "every run hits the same vulnerability" gap. Before this commit, the fleet shipped Tier-2 episodes (no exploit at all) with only the post-infection sample varying. Tier-3 had a single canned module — vsftpd_234_backdoor — so even when exploit fire was exercised, the entry vector never changed. Trainer would see one shape of `armed → infecting` and learn nothing about how varied real exploits look on the wire / in /proc. What landed: exploits/modules/ + samba_usermap_script.toml CVE-2007-2447, SMB:139 + distccd_command_exec.toml CVE-2004-2687, distcc:3632 + php_cgi_arg_injection.toml CVE-2012-1823, http:80 + unreal_ircd_3281_backdoor.toml CVE-2010-2075, ircd:6667 (vsftpd_234_backdoor.toml unchanged) All five are canonical Metasploitable2 vectors with stable Metasploit modules. Each TOML carries the RPORT the launcher needs to wire its hostfwd at, plus a payload tuned to a clean shell session (cmd/unix/interact for in-band shells, cmd/unix/reverse* with deterministic LPORTs for reverse shells). exploits/modules.py + select_module(catalog, host_id, slot, episode_index) — same SHA-256-keyed deterministic selection shape SampleManifest uses for samples. Two hosts at the same slot/episode hash to different modules; one host walks the full catalog within ~len(catalog) episodes. + module_target_port() — pulls RPORT off the module config so the fleet can plumb the launcher's hostfwd at the right service. orchestrator/fleet.py - _run_slot now decides Tier 3 vs Tier 2 from msfrpcd reachability + module-catalog populated. Default is Tier 3 when both are true; Tier 2 fallback when not (logged + recorded in SlotResult.tier so trainers can filter no-exploit episodes). - Per-slot module via select_module() — each concurrent slot in a wave gets a different vector AND a different sample. 
- PORT_BASE per slot (target_port + slot * 1000) so concurrent Tier-3 targets don't collide on the host-side hostfwd port. - _msfrpcd_available() probe gates the dispatch. - Fleet-side log line records (slot, ep, tier, sample, module, run_dir) so the operator can see at a glance what each wave is exercising. - SlotResult grows tier + module_name fields; FleetConfig grows modules + force_tier2 + msfrpcd_{host,port} fields. orchestrator/episode.py + EpisodeConfig.exploit_meta — plain dict the runner stamps into meta.exploit so every Tier-3 episode records {framework, module path, module type, payload, RPORT, RHOSTS template}. Trainers join on meta.exploit.module_name to stratify by entry vector; meta.sample.name to stratify by post-infection family. tools/run_tier3_demo.py + Builds exploit_meta from the loaded ModuleConfig and passes it to EpisodeConfig. Sample is now also passed (was missing). tools/run_fleet.py + --modules-dir (default exploits/modules/) — load module catalog on startup; pass to FleetConfig. + --force-tier2 — escape hatch for dev / smoke tests. + JSON output now includes per-slot {tier, module} so the operator can see at a glance what each slot ran without grepping logs. Tests: 129 (was 119). New cases: test_exploits.py +6 - catalog has at least the five canonical Metasploitable2 vectors - select_module is deterministic per (host, slot, ep) - select_module diversifies across hosts - select_module walks the full catalog over many episodes - module_target_port pulls RPORT for each shipped TOML test_fleet.py +4 - _run_slot dispatches to run_tier3_demo.py when msfrpcd up - falls back to run_real_vm_demo.py when msfrpcd unreachable - falls back when module catalog empty - --force-tier2 overrides msfrpcd availability - PORT_BASE is unique per concurrent slot (no hostfwd collision) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:22:49 -05:00
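One way to get the selection properties listed for commit a193d17ead (deterministic per (host, slot, ep), and one host walking the full catalog within len(catalog) episodes) is a SHA-256-derived offset plus the episode index. This is a sketch under that assumption, not the repository's `select_module`. The name `pick_module` is hypothetical, and the scheme does not by itself guarantee that two hosts or two concurrent slots land on different modules.

```python
import hashlib
from typing import Sequence


def pick_module(catalog: Sequence[str], host_id: str, slot: int,
                episode_index: int) -> str:
    """Deterministically pick a module name for (host, slot, episode).

    Hypothetical scheme: hash (host, slot) into a starting offset, then
    step through the sorted catalog by episode index so every module is
    visited once per len(catalog) consecutive episodes.
    """
    if not catalog:
        raise ValueError("empty module catalog")
    names = sorted(catalog)                  # stable order across hosts
    digest = hashlib.sha256(f"{host_id}:{slot}".encode("utf-8")).digest()
    offset = int.from_bytes(digest[:8], "big")
    return names[(offset + episode_index) % len(names)]
```

Because the only inputs are the catalog and the three keys, re-running the same episode index on the same host and slot reproduces the same entry vector.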
max	d86502d950	workload audit trail: meta.sample + per-phase events + pre-kill probe The elliott-lab episode showed every phase median'd 20% CPU because the in-guest workload silently never fired, and there was no signal in events.jsonl to detect that from outside, so a trainer would treat the labels as ground truth and learn "all phases look identical". This commit closes the audit gap so the failure is visible in meta: orchestrator/episode.py EpisodeConfig.sample: Sample | None (the manifest entry that drove this episode's workload selection). Stamped into meta.sample as {name, family, category, profile, kind, sha256} so trainers can join cleanly without re-deriving from events. None means the v1 yes-loop fallback path ran (and the trainer should treat the episode with appropriate skepticism). tools/vm_load_controller.py VMLoadController gains an emit_event callable. Every phase now emits a workload_* event into the runner's events.jsonl: workload_setup login + initial cleanup OK workload_killed clean / dormant. Dormant carries a `pre_kill_probe` dict from inside the guest (`pgrep -c yes`, `pgrep -c sh`, /proc/loadavg) so the trainer can detect the elliott-lab failure mode where the workload never actually ran. workload_armed armed handshake fired workload_infecting dd urandom / payload write fired workload_started infected_running command sent workload_failed any of the above raised inside SerialClient (timeout, EOF, partial login). The runner would have silently swallowed the exception via its on_phase try/except; the audit row makes the failure detectable. Exceptions in shell calls surface as workload_failed events but do NOT propagate, matching the runner's existing on_phase contract. tools/run_real_vm_demo.py Wires the controller's emit_event to the runner's emit_event via a small forward-reference closure (controller is built before runner; runner.emit_event needs to be the sink). 
Sample also flows into EpisodeConfig.sample so meta.sample matches what the controller actually ran. Tests: 119 (was 106). New cases: tests/test_vm_load_controller.py (11 tests against a FakeSerial) - setup emits workload_setup - infected_running runs the v1 yes-loop AND emits workload_started - dormant probes BEFORE killing and stamps pre_kill_probe - dormant probe records "yes=0" (the elliott-lab fingerprint) - clean / armed / infecting all emit their respective events - serial.run() exception → workload_failed event, no propagation - sample-with-profile dispatches to exploits.workloads command (NOT the v1 yes-loop) - missing emit_event callback is a no-op (back-compat) tests/test_episode.py (2 new) - meta.sample carries name/family/category/profile/kind/sha256 when EpisodeConfig.sample is set - meta.sample stays null in the v1 fallback path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 02:12:34 -05:00
max	a88ac83db0	Close out the deployment-readiness gaps Wraps the gaps surfaced in the "what is not implemented" audit so the fleet really is shippable end-to-end. Verified live on the Pi: - cis490-shipper --ping → HTTP 200 through Caddy + mTLS via the new wg-pki client CA leaf - real episode dir → tar+zstd → PUT → HTTP 201 stored - re-ship same bytes → 200 (idempotent) - re-ship different bytes under same id → 409 (conflict) Changes: orchestrator/episode.py - EpisodeConfig.revert_at_start / revert_at_end (Tier 0+ snapshot/revert per docs/architecture.md). When set + qmp_socket present, EpisodeRunner issues loadvm <snapshot_name> and emits snapshot_revert / snapshot_revert_failed events on the same monotonic clock as everything else. collectors/qmp.py - savevm() / loadvm() helpers using human-monitor-command, plus a test against the fake QMP server. exploits/workloads.py - chunked_real_binary_upload() returns a ChunkedUpload plan: 8 KiB base64 chunks (~6 KiB binary each) so msfrpc never sees a buffer-busting payload. Includes a finalize step that sha256-verifies on the guest before exec. - real_binary_workload() now wraps the chunked plan for backwards compat with single-shot callers. exploits/driver.py - Tier-4 dispatch walks the chunked plan in MSFExploitDriver: each chunk is a separate session_shell_write; finalize verifies; exec only runs on sha-ok. New events: real_binary_upload_begin, real_binary_verify, real_binary_aborted. etc/cis490-orchestrator.service - Reads /etc/cis490/lab-host.env (FLEET_HOST_ID + optional BRIDGE). - Grants AmbientCapabilities CAP_NET_RAW (tcpdump for source 4) + CAP_SYS_ADMIN + CAP_PERFMON (perf for source 3) so collectors work under hardening. scripts/install-lab-host.sh - Writes /etc/cis490/lab-host.env on first install with FLEET_HOST_ID defaulting to `hostname -s`. 
- Best-effort: fetches the Alpine baseline qcow2 (sha512-pinned) and builds cidata.iso with the in-guest agent embedded; symlinks both into /opt/cis490/vm/images/ so launchers find them. scripts/fetch-alpine-baseline.sh - Idempotent fetcher for the Alpine 3.21 cloud-init nocloud qcow2 matching the sha512 in docs/sources.md. tools/plot_envelope.py - Rebuilt to render whatever telemetry the episode dir contains: proc → QMP block ops → perf IPC/miss-rate → bridge pkts/SYNs → guest agent load/mem. Missing sources are silently skipped. tools/index_reader.py - cis490-index CLI: filter receiver's index.jsonl by host / sample / time range, sort, count-by group. Closest thing to a query interface until we stand up Postgres/Timescale. samples/README.md - Rewritten to match the new manifest schema, the kind=real vs mimic split, the per-(host, slot, ep) selection mechanic, and the chunked-upload safety story. Tests: 106 pass (was 102). New cases: - test_qmp.py — savevm + loadvm (HMP wrapper + error path) - test_tier4.py — chunked plan splitting, sha-pinned finalize, end-to-end driver walks all chunks + verify + exec via the fake msfrpc client Closes the "what is not implemented" punch list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:31:55 -05:00
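The chunked upload in a88ac83db0 (8 KiB base64 chunks of about 6 KiB binary each, sha256-verified before exec) can be sketched as below. `UploadPlan`, `plan_upload`, and `reassemble` are hypothetical names; the real `ChunkedUpload` plan emits guest shell commands, which this sketch does not model.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import List

# 6144 raw bytes encode to exactly 8192 base64 characters (4 chars per 3 bytes).
RAW_CHUNK = 6 * 1024


@dataclass(frozen=True)
class UploadPlan:
    chunks: List[str]   # base64 text, one entry per shell write
    sha256: str         # digest of the full binary, checked at finalize


def plan_upload(binary: bytes, raw_chunk: int = RAW_CHUNK) -> UploadPlan:
    """Split a binary into base64 chunks and pin its sha256."""
    if raw_chunk <= 0 or raw_chunk % 3:
        # A multiple of 3 keeps base64 padding out of every chunk but the last.
        raise ValueError("raw_chunk must be a positive multiple of 3")
    chunks = [
        base64.b64encode(binary[i:i + raw_chunk]).decode("ascii")
        for i in range(0, len(binary), raw_chunk)
    ]
    return UploadPlan(chunks=chunks, sha256=hashlib.sha256(binary).hexdigest())


def reassemble(plan: UploadPlan) -> bytes:
    """Stand-in for the guest-side finalize: decode, then verify before use."""
    data = b"".join(base64.b64decode(chunk) for chunk in plan.chunks)
    if hashlib.sha256(data).hexdigest() != plan.sha256:
        raise ValueError("sha256 mismatch, refusing to exec")
    return data
```

Each chunk is decoded on its own, so the receiving side never has to hold more than one encoded chunk at a time before appending to the target file.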
max	bdcd2ecbef	Close out the open issues: bridge pcap wiring, perf collector, Tier-4 Wraps the three remaining 🚧 items from the README so every collector the threat-model promises is actually live, and the Tier-4 path (real-malware fetch + upload + exec) works end-to-end as soon as a sha256 lands in samples/store/. Closes spectral/CIS490#4, #5, #6. == #6 — Bridge pcap wiring == EpisodeConfig grows three optional fields: bridge_iface: str \| None # e.g. "br-malware" bridge_ip: str = "10.200.0.1" pcap_snaplen: int = 256 When bridge_iface is set, EpisodeRunner spawns tcpdump for the duration of the schedule (network.pcap), stops it cleanly on episode end, and runs collectors.pcap.bucketize() to produce netflow.jsonl per the 100-ms schema in docs/data-model.md. EpisodeResult + meta.result gain rows_netflow + pcap_bytes counters. vm/launch_demo.sh + launch_target.sh now switch between SLIRP usermode and tap+bridge based on $BRIDGE — operator pre-creates the tap as a bridge member, no sudo from the launcher. run_real_vm_demo.py picks BRIDGE up from env so the fleet runner can opt entire waves into pcap mode by exporting BRIDGE before invocation. == #5 — Source 3 perf collector == collectors/perf_qemu.py shells out to ``perf stat -p <pid> -I 100 -j`` and parses the per-event JSON stream. Aggregates one row per interval across the canonical event set (cycles/instructions/cache-{refs,misses}/branches/branch-misses/page-faults/context-switches), computes IPC + cache-miss rate. Tolerates missing events (``<not counted>`` / ``<not supported>``) without dropping the row, and skips cleanly when ``perf`` isn't on PATH or the process can't be attached. EpisodeConfig.enable_perf=True opts into the collector — off by default because perf needs CAP_SYS_ADMIN or perf_event_paranoid <= 1. When enabled, runs as a parallel thread alongside the other collectors; EpisodeResult.rows_perf records the count. 
== #4 — Tier 4 (real-malware fetch + upload + exec) == tools/fetch_sample.py: pulls a sample by sha256 from MalwareBazaar (API key from env or samples/.bazaar.token), unzips with the standard "infected" password, verifies the resulting binary's sha256, lands at samples/store/<sha256>. Idempotent — already-staged correct binaries return immediately. samples/manifest.py: Sample.binary_path(store_root) resolves to the staged binary path, or None for mimics / not-yet-fetched real samples. exploits/workloads.py: real_binary_workload(bytes, sample) builds a Workload that base64-uploads the binary into the shell session via a heredoc, decodes + chmods + execs it in the background, captures the PID for clean stop on dormant. Per-profile pid/bin paths so concurrent samples in the same guest don't collide. exploits/driver.py: dispatch order is now: 1) sample.kind == "real" + binary staged at sample_store_root → real_binary_workload (Tier 4) 2) profile mimic from workloads.workload_for() (Tier 3 v2) 3) None → driver v1 fallback yes-loop DriverConfig.sample_store_root is the new field; run_tier3_demo.py wires it to repo_root/samples/store. driver_setup event records sample_sha256 so trainers can join Tier-4 episodes against the manifest by hash. samples/store/.gitkeep added (binaries themselves are gitignored). Tests: 102 pass (was 86). New suites: tests/test_perf_qemu.py — parser + builder + perf-missing fallback tests/test_tier4.py — real_binary_workload base64 round-trip, stop-cmd kills pidfile, per-profile path isolation, driver dispatch chooses real vs mimic correctly, fetcher input validation and cached-fast-path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:17:49 -05:00
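The derived metrics from the perf collector in bdcd2ecbef (IPC and cache-miss rate, with missing events tolerated rather than dropping the row) reduce to two guarded ratios. The sketch below assumes the per-interval counters have already been gathered into a dict keyed by perf event name; `parse_count` and `derive_row` are hypothetical helpers, not the functions in collectors/perf_qemu.py.

```python
from typing import Mapping, Optional

# Placeholders perf prints when an event could not be measured.
MISSING = {"<not counted>", "<not supported>"}


def parse_count(raw: object) -> Optional[float]:
    """Return the counter value, or None when perf reported it as missing."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text or text in MISSING:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def derive_row(counters: Mapping[str, object]) -> dict:
    """Build one interval row: parsed counters plus IPC and cache-miss rate."""
    values = {name: parse_count(raw) for name, raw in counters.items()}

    def ratio(numerator: str, denominator: str) -> Optional[float]:
        n, d = values.get(numerator), values.get(denominator)
        if n is None or d is None or d == 0:
            return None  # keep the row, leave the derived field empty
        return n / d

    row = dict(values)
    row["ipc"] = ratio("instructions", "cycles")
    row["cache_miss_rate"] = ratio("cache-misses", "cache-references")
    return row
```

A row with an unmeasured event still carries every counter that was available; only the ratio that depends on the missing event is null.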
max	b80986d99c	Driver v2: sample-profile-driven workloads (Tier-2 + Tier-3) The v1 driver ran ``yes > /dev/null`` for every sample, which produced the same envelope shape regardless of which malware family the orchestrator claimed to be running. That's a poor training signal: the model sees identical /proc + QMP traces tagged "cryptominer" / "ransomware" / "RAT" with no distinguishing features. v2 fixes this. What landed: exploits/workloads.py — six ``Workload`` profiles, each producing a distinct in-session shell command pair (start_cmd / stop_cmd) that backgrounds a profile-shaped loop: cpu-saturate — sustained 1-vCPU saturation (XMRig shape) scan-and-dial — periodic SYN-style probes across 10.200.0.0/24 + dial-home to gateway (Mirai shape) io-walk — fs traversal + 4 KiB urandom writes, periodic re-read (ransomware shape) bursty-c2 — long idle, periodic 3-packet TCP egress burst (Dridex C2 beacon shape) low-and-slow — minimal CPU + periodic awk-driven memory churn (Kovter / fileless shape) shell-resident — single long-lived TCP socket pinned to gateway with periodic 6-byte command ticks (RAT shape) Each profile uses a /tmp/.cis490-workload-<profile>.{pid,sh} pair so the stop_cmd can cleanly kill the loop and its descendants. exploits/driver.py — MSFExploitDriver now accepts an optional ``Sample``. With one supplied, ``infected_running`` dispatches to the matching workload via exploits.workloads.workload_for(); the ``sample_executed`` event records profile + sample name + sample kind so the trainer can join cleanly. Without a sample, the v1 yes-loop path remains unchanged (backwards compat). tools/vm_load_controller.py — the same dispatch on the Tier-2 path (no exploit, real Alpine guest driven over the serial console). A fleet wave now produces six visually distinct envelopes per wave whether the underlying mode is Tier 2 or Tier 3. 
tools/run_real_vm_demo.py — accepts ``--sample <name>`` (or SAMPLE_NAME env from the fleet runner) + auto-wires QMP + agent sockets into the EpisodeConfig so all three new collectors (sources 2, 4, 5) run alongside source 1 by default. tools/run_tier3_demo.py — same ``--sample`` plumbing for the exploit-driven path. Tests: 86 pass (was 82). New v2 cases: - profile dispatch routes infected_running to the workload's start_cmd (NOT the v1 yes-loop) when a Sample is set - all six profiles produce distinct start_cmds (the property the ML model needs) - unknown profile string falls back to cpu-saturate with a warning - v1 path (no Sample) still uses yes-loop (backwards compat) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:06:15 -05:00
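The profile dispatch rules in b80986d99c (six named profiles, unknown strings fall back to cpu-saturate with a warning, no sample means the v1 yes-loop) can be sketched as a small resolver. `resolve_profile` and `workload_paths` are hypothetical names; the actual entry point named in the commit is `exploits.workloads.workload_for()`, whose signature is not shown.

```python
import logging
from typing import Optional, Tuple

log = logging.getLogger("workloads_sketch")

PROFILES = (
    "cpu-saturate",
    "scan-and-dial",
    "io-walk",
    "bursty-c2",
    "low-and-slow",
    "shell-resident",
)
DEFAULT_PROFILE = "cpu-saturate"


def resolve_profile(profile: Optional[str]) -> Optional[str]:
    """Map a sample's profile string to a known profile.

    None means no sample was supplied; the caller keeps the v1 yes-loop.
    """
    if profile is None:
        return None
    if profile in PROFILES:
        return profile
    log.warning("unknown profile %r, falling back to %s", profile, DEFAULT_PROFILE)
    return DEFAULT_PROFILE


def workload_paths(profile: str) -> Tuple[str, str]:
    """Per-profile pid/script pair so stop_cmd can find and kill the loop."""
    base = f"/tmp/.cis490-workload-{profile}"
    return f"{base}.pid", f"{base}.sh"
```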
max	1b6c7b2f4a	Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts This is the chunk that makes "real data" actually flow on multiple hosts in parallel. End-to-end pipe was up at `613c6fa` / 2579683; now the lab-host side has the diversity + concurrency it needs. Collectors landed: collectors/qmp.py — source 2 (oracle). Tiny synchronous QMP client + row builder + run loop. Tolerates older qemu without query-stats. collectors/guest_agent.py — source 5 (deployable). Reads the virtio-serial host-side socket, parses agent JSON-lines, re-stamps to the host monotonic clock, persists. collectors/pcap.py — source 4 (deployable). tcpdump capture + pure-Python pcap reader + 100 ms netflow.jsonl bucketizer. Decodes Ethernet/IPv4/TCP/UDP enough for the schema in docs/data-model.md. In-guest agent: vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads /proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs, thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent. tools/build_cidata.py — embeds the agent + an OpenRC service into user-data so first boot of the Alpine cidata image auto-starts it. Launchers: vm/launch_demo.sh / launch_target.sh — second virtio-serial port for the agent socket; SLOT env support so multiple VMs run without socket / port collisions; PORT_BASE on launch_target so multiple target VMs hostfwd different host ports. vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24, no NAT). Idempotent. Fleet: orchestrator/fleet.py — capacity detector (cores / RAM / load headroom) + concurrent-slot runner. Per-slot ENV selects the sample. FleetCapacity dataclass round-trips into meta.json so "this episode ran with 6 concurrent VMs" is auditable post-hoc. tools/run_fleet.py — CLI: --capacity report; --waves N runs N waves of (max_concurrent) episodes each, every slot with a different sample. 
etc/cis490-orchestrator.service — now drives the fleet runner with Restart=always so each invocation runs one wave and respawns, giving a continuous stream. Samples: samples/manifest.toml — six profiles spanning the five major behaviour shapes. Each entry is real OR mimic (sha256 distinguishes). samples/manifest.py — strict TOML loader (rejects dups, unknown categories) + deterministic select(host_id, slot, episode_index) so different hosts on the network walk the catalog in different orders without any coordinator. EpisodeRunner: orchestrator/episode.py — optional qmp_socket + guest_agent_socket fields on EpisodeConfig; when set, additional collector threads run alongside proc_qemu. EpisodeResult now carries rows_qmp + rows_guest counters. Tier-3 setup automation: scripts/install-msfrpcd.sh — installs metasploit-framework where the package manager has it, generates a strong password into /etc/cis490/msfrpc.env, drops a hardened systemd unit bound to 127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch once MSFRPC_PASSWORD is sourced. scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256 from the operator (Rapid7 download is registration-walled), pulls, verifies, converts vmdk → qcow2, lands at vm/images/. Tests: 82 pass (was 51). New suites: tests/test_qmp.py — fake QMP server, capability handshake, blockstats, async-event interleaving, 5-failure backoff tests/test_guest_agent.py — fake virtio socket, JSON-lines read + re-stamp, malformed-line tolerance tests/test_pcap.py — synthetic pcap with TCP/UDP/ARP frames, bucketize correctness across windows tests/test_fleet.py — capacity math (8-core idle / low-RAM / high-load / Pi5 / 1-core box), manifest selection determinism + diversity What's queued for the next commit (already discussed in convo): - MSFExploitDriver v2: map sample.profile → distinct in-session workload so Tier-3 episodes don't all produce the same yes-loop envelope. Critical for ML to learn varied malware shapes. 
- Real-sample fetch from MalwareBazaar by sha256. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:02:27 -05:00
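One way to get the coordinator-free, deterministic `select(host_id, slot, episode_index)` described in 1b6c7b2f4a is a per-host permutation of the catalog walked by slot and episode index. This is a sketch of that idea under the assumption that sample names are unique; the ordering scheme is illustrative and not necessarily what samples/manifest.py does.

```python
import hashlib
from typing import Sequence


def select(samples: Sequence[str], host_id: str, slot: int, episode_index: int) -> str:
    """Pick a sample name deterministically from (host_id, slot, episode_index).

    Each host sorts the catalog by a host-keyed hash, so different hosts walk
    it in different orders; slots within one episode index land on distinct
    entries as long as there are no more slots than samples.
    """
    if not samples:
        raise ValueError("empty sample catalog")

    def host_key(name: str) -> str:
        return hashlib.sha256(f"{host_id}\x00{name}".encode("utf-8")).hexdigest()

    order = sorted(samples, key=host_key)
    return order[(episode_index + slot) % len(order)]
```

Because the result depends only on its arguments, a host that restarts mid-wave picks the same sample for the same slot and episode index.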
max	7c9f9582ca	Lab-host shipper + receiver /v1/ping + install scripts Implements the deployment loop end-to-end on the CIS490 side: shipper/ config.py ShipperConfig (host_id, paths, receiver endpoint, mTLS) transport.py httpx-based PUT + ping with mTLS + bearer support queue.py scan data/episodes/, tar+zstd via system zstd, ship, retire to data/shipped/. Idempotent across crashes per the state machine in docs/transport.md. __main__.py CLI: --ping (smoke test), --once (one pass), or daemon receiver/app.py: new POST /v1/ping that requires the same auth as PUT /v1/episodes but writes nothing. Used by `cis490-shipper --ping` during lab-host bring-up to verify the WG/Caddy/mTLS path before shipping any real bytes. etc/ cis490-shipper.service systemd unit for the lab-host shipper cis490-orchestrator.service systemd unit for the lab-host queue (kept disabled by default until queue mode lands) lab-host.toml.example config template scripts/ install-lab-host.sh idempotent installer; verifies prereqs, creates cis490 service user, syncs repo to /opt/cis490, builds venv, drops systemd units and config template install-receiver.sh same, for the receiver role on the central WG node (Pi5 in our setup) tests/test_shipper.py 11 end-to-end tests against a real Uvicorn server hosting the receiver app. Exercises ping, tar+ship, idempotent re-ship, 409 conflict, transient (receiver down), tarball round-trip via system zstd. AGENTS.md guidance for AI agents working on this and sibling repos. Headline: when you hit an issue you can't fully fix in scope, file a Forgejo issue rather than leaving a TODO. 51/51 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 23:41:32 -05:00
max	613c6fa223	Tier 3: msfrpc-driven exploit driver + first module config Adds the Tier-3 exploit driver — an MSFExploitDriver that plugs into EpisodeRunner.on_phase, fires a Metasploit module against a target VM via msfrpcd, watches for the resulting session, and stamps each transition (exploit_fire, session_open, session_landing_probe, sample_executed, session_dormant, session_killed) into the episode's events.jsonl on the orchestrator's monotonic clock. What landed: - exploits/msfrpc.py — minimal msgpack-over-HTTPS client (auth, module.execute, job/session lifecycle) so we don't depend on a third-party MSF wrapper. - exploits/driver.py — phase-to-msfrpc adapter; idempotent fire, session-open polling with timeout, workload start/stop, teardown. - exploits/modules.py + exploits/modules/vsftpd_234_backdoor.toml — TOML module configs with {{ target_ip }} placeholders, replacing the imperative .rc-script approach the README previously hinted at. - vm/launch_target.sh — SLIRP+restrict=on launcher for the intentionally-vulnerable target VM (host can reach guest via hostfwd, guest cannot reach host or internet). - tools/run_tier3_demo.py — end-to-end runner mirroring run_real_vm_demo. - tests/test_exploits.py — 12 new tests against a fake MSFRpcClient, including an integration test that drives a real EpisodeRunner. Plumbing changes: - EpisodeRunner._emit_event → public emit_event, so external drivers share the runner's monotonic clock and events.jsonl. - mkdir for episode_dir moved to __init__ so emit_event is callable before run() (driver_setup fires pre-schedule). Status: driver + tests pass (40/40); end-to-end against a live msfrpcd + Metasploitable2 image is the next bring-up step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 23:11:52 -05:00
Maximus Gorog	064387b7a0	Add v0 orchestrator + first oracle collector (host /proc) End-to-end: ``python -m orchestrator --target-pid <pid> --duration N`` now writes a complete episode directory matching docs/data-model.md, with phase labels, events, and a 10 Hz host /proc telemetry stream. No VM yet — pid is arbitrary so we can validate the loop against e.g. ``sleep 5`` while the lab side comes up. collectors/proc_qemu.py — parses /proc/<pid>/{stat,io,status} (handles parens in comm), single-shot collect_once(), and a stop-event-driven run_loop() that ticks at a fixed cadence and exits when the pid disappears. Tagged ``available_in_deployment: false`` per the threat-model doc. orchestrator/episode.py — EpisodeRunner: creates data/episodes/<ulid>/, atomic meta.json, events.jsonl + labels.jsonl writers, drives the collector in a thread for duration_s, writes done.marker last so the shipper never sees a half-finished episode. orchestrator/ulid.py — tiny 26-char Crockford-base32 ULID generator. Time-sortable, no third-party dep. orchestrator/__main__.py — CLI entry point. Tests (15 new, 28 total green): - proc_qemu: real-ish stat with parens-in-comm, missing /proc/<pid>/io, missing pid, run_loop cadence, run_loop terminates when pid disappears. - episode: full directory shape against os.getpid(), id override, done.marker written after meta.json finalize. - ulid: length+alphabet, 2000-burst uniqueness, time-sortability. Smoke-tested against ``sleep 10``: 16 rows over 1.5s at 100ms cadence, monotonic clock, RSS stable at ~3.5 MiB as expected for an idle sleep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:40:25 -06:00
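A 26-character Crockford-base32 ULID like the one described for orchestrator/ulid.py can be generated with the standard library alone. The sketch below follows the public ULID layout (48-bit millisecond timestamp, 80 bits of randomness); the function name and optional arguments are assumptions, and it does not guarantee monotonic ordering within the same millisecond.

```python
import os
import time
from typing import Optional

# Crockford base32: no I, L, O, or U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid(ts_ms: Optional[int] = None, rand: Optional[bytes] = None) -> str:
    """Return a 26-char ULID; arguments exist so tests can pin the inputs."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    if rand is None:
        rand = os.urandom(10)
    if not 0 <= ts_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    if len(rand) != 10:
        raise ValueError("need exactly 10 random bytes")
    value = (ts_ms << 80) | int.from_bytes(rand, "big")
    # 128 bits in 26 five-bit symbols; the first symbol carries only 3 bits.
    return "".join(ALPHABET[(value >> shift) & 0x1F] for shift in range(125, -1, -5))
```

Since the timestamp occupies the most significant bits, plain string comparison orders ULIDs from different milliseconds by creation time.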
Maximus Gorog	83e111961d	Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency Implements docs/transport.md as a small Starlette app. The receiver streams episode tarballs to disk, verifies sha256 against an X-Content-SHA256 header, atomically renames into the store on success, and appends one row to a flat index.jsonl. No DB. Idempotent re-PUTs return 200; conflicting bodies return 409. Optional bearer-token auth (mTLS terminates at Caddy in prod). receiver/ store.py EpisodeStore: sha-verifying streaming ingest, atomic rename, append-only index. No HTTP. app.py make_app(): Starlette routes + bearer guard. config.py ReceiverConfig.load(): TOML parser. __main__.py uvicorn entrypoint, reads --config TOML. tests/test_receiver.py — 13 tests via httpx.ASGITransport. Covers: 201 new, 200 idempotent replay, 409 conflict, 400 sha mismatch + cleanup, 400 missing/short header, 400 bad id, 400 bad suffix, 413 too large, 401 bearer enforcement, schema-version pass-through. etc/cis490-receiver.service — systemd unit with hardening flags. etc/receiver.toml.example — config template matching docs/deploy.md. End-to-end smoke-tested with curl: 201 → 200 → 409 path verified, file on disk, single index row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:34:04 -06:00
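The receiver's ingest contract from 83e111961d (stream to disk, verify sha256, atomic rename, 201 for new, 200 for a bit-identical replay, 409 for a conflicting body, cleanup on mismatch) can be sketched without any HTTP layer. `ingest`, `ShaMismatch`, and the string return values are hypothetical; the real `EpisodeStore` API is not shown in the log.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


class ShaMismatch(ValueError):
    """Body digest differs from the declared digest (maps to HTTP 400)."""


def _file_sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def ingest(chunks: Iterable[bytes], expected_sha256: str, dest: Path) -> str:
    """Return "created" (201) or "replay" (200); raise FileExistsError for 409."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    # Temp file lives in the destination directory so os.replace stays atomic.
    fd, tmp = tempfile.mkstemp(dir=dest.parent, prefix=".ingest-")
    try:
        with os.fdopen(fd, "wb") as fh:
            for chunk in chunks:
                digest.update(chunk)
                fh.write(chunk)
            fh.flush()
            os.fsync(fh.fileno())
        actual = digest.hexdigest()
        if actual != expected_sha256.lower():
            raise ShaMismatch(f"expected {expected_sha256}, got {actual}")
        if dest.exists():
            if _file_sha256(dest) == actual:
                return "replay"
            raise FileExistsError(f"{dest} holds different bytes")
        # Not race-free across concurrent writers; a lock is assumed elsewhere.
        os.replace(tmp, dest)
        tmp = ""
        return "created"
    finally:
        if tmp and os.path.exists(tmp):
            os.unlink(tmp)  # never leave a partial upload behind
```

Writing to a temp file first means a reader of the store only ever sees complete, verified tarballs.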

38 commits