CIS490

Author	SHA1	Message	Date
Elliott Kolden	b29d30a1b2	Tier-3: fix QEMU boot, catalog admission, verify module Bug 14 (vm/launch_target.sh): Metasploitable2 requires -machine pc (i440fx), -cpu kvm32, -drive if=ide, and -device e1000. The previous config (-machine q35, -cpu host, -drive if=virtio, virtio-net-pci) caused a kernel panic at boot because /dev/vda != the grub root=/dev/sda1. Services never started; the b'' probe fix (Bug 10) then correctly waited out the full timeout with no result. Bug 15 (scripts/install-tier-3-4.sh): verify step used vsftpd_234_backdoor which is requires_bridge=true and has a hardcoded port-6200 backdoor. Changed to distccd_command_exec with TARGET_PORTS="5632:3632,4444:4444". manifest.toml: admit distccd_command_exec and unreal_ircd_3281_backdoor to the module catalog. Both use cmd/unix/bind_perl (bind shell, no guest egress, SLIRP-safe). distccd returns a valid protocol response so MSF's handler runs and session_open fires. Verified against Metasploitable2 sourceforge image sha256 a8c019c3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 16:41:41 -06:00
Elliott Kolden	667f042707	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01) Root causes and fixes documented in TIER3-BRINGUP.md. Summary: 1. BRIDGE env var leaked into Tier-3 subprocess → target VM used tap instead of SLIRP; fix: env.pop("BRIDGE") in fleet _run_slot. 2. usable_modules filter conditioned on BRIDGE presence → bridge-requiring modules selected on SLIRP runs; fix: always filter requires_bridge. 3. cmd/unix/interact creates no session.list entry → session_open_timeout every episode; fix: switch samba_usermap_script to cmd/unix/bind_perl. 4. Per-slot LPORT hostfwd used wrong guest port (host:5444→guest:4444); fix: extra_host_port:extra_host_port mapping so guest binds the per-slot LPORT directly. 5. vsftpd backdoor port 6200 hardcoded → collision across concurrent slots; fix: requires_bridge=true filters it from SLIRP fleet runs. 6. SLIRP false-positive in _wait_for_tcp → exploit fires before Samba boots (~60 s too early); fix: replace TCP probe with serial console _wait_for_serial_login that waits for actual "login:" prompt. 7. Stale QEMU survives orchestrator restart (start_new_session=True) → holds hostfwd ports, new QEMU silently fails; fix: kill by pgid from old pidfile before rmtree. 8. PORT_BASE default used privileged port 21; fix: default to 2021+slot*100. 9. msfrpcd 6.x returns bytes for all string values even with raw=False; fix: MSFRpcClient._str() recursive decoder applied to all responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:26:19 -06:00
max	eda6164897	fix: lab-host install loop after commit-gate cutover Why services weren't starting after the gate went live: 1. install-lab-host.sh self-copy. The receiver's 400 remediation tells the agent to `cd /opt/cis490 && git pull && sudo ./scripts/install-lab-host.sh`. That makes REPO_ROOT==INSTALL_ROOT and `cp -aT $REPO_ROOT $INSTALL_ROOT` errors with "are the same file"; `set -e` aborts before the systemd units install or anything restarts. Detect the same-dir case and skip the cp; chown still runs. 2. Services never restart. install-lab-host.sh and install-tier-3-4.sh both ended by telling the operator to restart, then exiting. The running shipper/orchestrator kept executing pre-gate code from the old module objects, so new `code_version` stamping never reached an episode. Both scripts now `systemctl restart` the units they own when those units are enabled. 3. Shipper queue fatal-loop. queue.py incremented `fatal++` but didn't move the episode out of `data/episodes/`. Next scan re-tarred and re-PUT the same dir, getting 400 again. With 4465+ pre-stamp episodes on k-gamingcom this burned ~1 PUT/sec for 5+ hours of receiver log. Fatal episodes now move to data/quarantine/<id>/ with a quarantine_reason.json beside them; the outbox tarball is deleted. 4. Pre-stamp backlog drain. tools/quarantine_unstamped.py is a one-shot that scans data/episodes/ and quarantines anything without a 40-char-hex code_version.commit. Wired into install-lab-host.sh step 9 so a re-install drains the queue automatically. Idempotent; safe to run while the shipper is active. Tests cover the queue's new fatal-quarantine path and every drain behaviour (kept/quarantined/dry-run/idempotent/missing-meta/collision). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:36:21 -05:00
max	265f3ad313	Tier-4 sample source: theZoo (no auth, no operator action) Replaces MalwareBazaar with theZoo (https://github.com/ytisf/theZoo). theZoo is a public security-research repo with hundreds of malware samples organized by family, password-protected with the well-known 'infected'. No API key, no signup, nothing for an operator to do — which is what zero-touch tier-4 actually means. Changes: - tools/auto_fetch_samples.py: rewrite. Clones theZoo (shallow, ~500 MB) to /var/lib/cis490/theZoo on first run, then for each manifest family without a sha256 it locates a matching Binaries/<Name> dir, extracts the .zip with password 'infected', picks the largest non-text payload as the binary, sha256s it, stages at samples/store/<sha256>, and rewrites manifest.toml in place (atomic tempfile + os.replace, stat preserved). Mandatory exit semantic: non-zero if no real samples landed. - scripts/install-tier-3-4.sh: dropped the MB-key resolution chain (env var → local file → bootstrap.wg fetch). Now just runs auto_fetch_samples.py and dies if zero samples land. SKIP_TIER4 remains as the explicit override but is documented as defeating the project. - bootstrap/app.py + __main__.py + etc/cis490-bootstrap.service: removed the /v1/secret/<name> endpoint and the --secrets-root flag. Dead code now that no API key needs distributing. Live-rolled back on the Pi (404 verified post-restart, stale /etc/cis490/secrets dir removed). - scripts/set-malwarebazaar-key.sh: deleted. No MB key means no one-time operator step. - tests/test_bootstrap_secrets.py: deleted (route removed). - AGENTS.md: rewrote tier-4 section to reflect zero-operator model. 148/148 tests pass. Bootstrap service rolled back live.	2026-05-01 01:17:50 -05:00
max	5d0e8e33a9	Tier 4 is mandatory: hard-fail on no real samples; auto-distribute MB key User: 'we don't want it to be optional, this real malware IS the data we want.' Acknowledged. Three changes make Tier 4 actually mandatory without forcing per-host operator action: 1. bootstrap.wg /v1/secret/<name> endpoint - Pi serves /etc/cis490/secrets/malwarebazaar.token to lab hosts over the same trust boundary as the cert endpoint (WG mesh, iptmonads-gated). Strict allow-list — only `malwarebazaar` resolves; everything else 404s. Secret returned as bare text with Cache-Control: no-store. Live-verified on the Pi. - tests/test_bootstrap_secrets.py covers four cases: 404 unprovisioned, 200 with token, 404 unknown name, 500 on empty file. 2. install-tier-3-4.sh: Tier 4 is no longer optional - Resolves MB key in priority: env var → /opt/cis490/samples/.bazaar.token → https://bootstrap.wg/v1/secret/malwarebazaar. - Caches the bootstrap-fetched key locally so re-runs are offline. - If all three resolution paths fail, dies with the exact remediation command for the operator (one-time set-malwarebazaar-key.sh on the Pi). - auto_fetch_samples.py is run unconditionally (SKIP_TIER4 still works for emergency overrides but logs a warning that the host will produce only mimics). Deploy fails if zero binaries land in samples/store/ — no silent mimic-only fallback. - SKIP_TIER4 documentation now says 'DEPRECATED; defeats the project'. 3. scripts/set-malwarebazaar-key.sh - Pi-side helper: one operator command per fleet, ever. Accepts key via env or stdin, validates length, drops at the right path with the right perms. Lab hosts pull the rest automatically. AGENTS.md: rewrote the Tier-4 section to reflect mandatory status + the one-time-on-Pi distribution model. 152/152 tests pass. Bootstrap service updated live on the Pi.	2026-05-01 00:44:41 -05:00
max	683bfe9ce6	Tier 3 + Tier 4 auto-deploy: zero operator interaction Replaces the manual runbook with scripts that just work. install-lab-host.sh now runs the full Tier-3 deploy automatically as its 8th step (after the mTLS cert lands), and Tier-4 auto-fetches when MALWAREBAZAAR_API_KEY is set. Changes: - install-msfrpcd.sh: actually runs the Rapid7 omnibus installer when metasploit-framework isn't present (was: bail with "install manually"). apt-get and dnf paths both go through the same omnibus script with DEBIAN_FRONTEND=noninteractive. Idempotent. - fetch-metasploitable2.sh: bakes in the SourceForge public-mirror URL (https://downloads.sourceforge.net/project/metasploitable/...) so no operator URL is required. sha256 is now optional and TOFU-pinned: first run records the hash to OUT_DIR/metasploitable2.qcow2.sha256; subsequent runs verify against that. Skips if qcow2 already present. - scripts/install-tier-3-4.sh (new): orchestrates the four steps (msfrpcd → metasploitable2 → bridge → tier-3 verify) plus optional Tier-4 auto-fetch. Idempotent. SKIP_VERIFY / SKIP_BRIDGE / SKIP_TIER4 env knobs for partial deploys. - tools/auto_fetch_samples.py (new): when MALWAREBAZAAR_API_KEY is set, queries MB by each manifest entry's `family` (signature match), pulls the first match via fetch_sample.py, and rewrites manifest.toml in place (atomic tempfile + os.replace, preserving stat). Skips entries that already have sha256. - install-lab-host.sh: gains a step 8 that calls install-tier-3-4.sh automatically when mTLS certs are on disk. --skip-tier3 flag for operators who want Tier 2 only. Skipped silently before certs land so first-pass install (host_id=REPLACE_ME) still works. - AGENTS.md: rewrote the Tier-3 section to point at the one-shot script. Removed the old multi-command runbook so on-device agents can't accidentally follow stale steps. Net effect: a fresh lab host now gets Tier 3 (and Tier 4 if API key present) from a single sudo invocation. No operator picks for image URLs, no manual metasploit installs, no manual manifest edits.	2026-04-30 23:12:08 -05:00
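
Commit 683bfe9ce6 describes the Metasploitable2 image hash as TOFU-pinned (trust on first use): the first run records the sha256 next to the image, and later runs verify against it. The sketch below illustrates that idea in Python. It is not the code in fetch-metasploitable2.sh, which is a shell script whose contents are not shown here. The function names `sha256_of` and `verify_tofu` are hypothetical, and the pin file is assumed to sit beside the image with a `.sha256` suffix, as the commit message suggests.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tofu(image: Path) -> bool:
    """Pin the image hash on first use; on later calls, compare against the pin.

    Assumption: the pin lives at <image>.sha256 and holds the hex digest as its
    first whitespace-separated token.
    """
    pin = image.with_name(image.name + ".sha256")
    actual = sha256_of(image)
    if not pin.exists():
        # First use: nothing to compare against, so record and trust this hash.
        pin.write_text(actual + "\n")
        return True
    pinned = pin.read_text().split()
    return bool(pinned) and pinned[0] == actual
```

A caller would typically abort the deploy when `verify_tofu` returns False. TOFU only detects changes after the first download and does not authenticate that first download.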

6 commits
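
Item 9 of commit 667f042707 notes that msfrpcd 6.x returns bytes for all string values and that a recursive decoder, MSFRpcClient._str(), is applied to every response. A minimal sketch of such a decoder follows. The function name `decode_bytes` is hypothetical, and the sample response is made up for illustration rather than taken from real msfrpcd output. The project's actual `_str()` may differ.

```python
from typing import Any


def decode_bytes(value: Any, encoding: str = "utf-8") -> Any:
    """Recursively convert bytes to str inside nested dicts, lists, and tuples."""
    if isinstance(value, bytes):
        # errors="replace" keeps decoding from raising on unexpected byte sequences.
        return value.decode(encoding, errors="replace")
    if isinstance(value, dict):
        # Keys can be bytes too, so decode both sides of each item.
        return {
            decode_bytes(key, encoding): decode_bytes(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Rebuild plain lists and tuples with the same container type.
        return type(value)(decode_bytes(item, encoding) for item in value)
    # Integers, floats, booleans, None, and str pass through unchanged.
    return value


# Illustrative input only; the field names here are assumptions.
sample = {b"job_id": 3, b"data": [b"a", {b"k": b"v"}]}
decoded = decode_bytes(sample)
```

Decoding once at the client boundary means the rest of the code can compare against ordinary strings, without scattering `.decode()` calls across call sites.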