Commit graph

6 commits

Author SHA1 Message Date
b29d30a1b2 Tier-3: fix QEMU boot, catalog admission, verify module
Bug 14 (vm/launch_target.sh): Metasploitable2 requires -machine pc
(i440fx), -cpu kvm32, -drive if=ide, and -device e1000. The previous
config (-machine q35, -cpu host, -drive if=virtio, virtio-net-pci)
caused a kernel panic at boot because /dev/vda != the grub root=/dev/sda1.
Services never started; the b'' probe fix (Bug 10) then correctly waited
out the full timeout with no result.

Bug 15 (scripts/install-tier-3-4.sh): verify step used vsftpd_234_backdoor
which is requires_bridge=true and has a hardcoded port-6200 backdoor.
Changed to distccd_command_exec with TARGET_PORTS="5632:3632,4444:4444".

manifest.toml: admit distccd_command_exec and unreal_ircd_3281_backdoor
to the module catalog. Both use cmd/unix/bind_perl (bind shell, no guest
egress, SLIRP-safe). distccd returns a valid protocol response so MSF's
handler runs and session_open fires. Verified against Metasploitable2
sourceforge image sha256 a8c019c3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 16:41:41 -06:00
667f042707 Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)
Root causes and fixes documented in TIER3-BRINGUP.md. Summary:

1. BRIDGE env var leaked into Tier-3 subprocess → target VM used tap
   instead of SLIRP; fix: env.pop("BRIDGE") in fleet _run_slot.

2. usable_modules filter conditioned on BRIDGE presence → bridge-requiring
   modules selected on SLIRP runs; fix: always filter requires_bridge.

3. cmd/unix/interact creates no session.list entry → session_open_timeout
   every episode; fix: switch samba_usermap_script to cmd/unix/bind_perl.

4. Per-slot LPORT hostfwd used wrong guest port (host:5444→guest:4444);
   fix: extra_host_port:extra_host_port mapping so guest binds the
   per-slot LPORT directly.

5. vsftpd backdoor port 6200 hardcoded → collision across concurrent slots;
   fix: requires_bridge=true filters it from SLIRP fleet runs.

6. SLIRP false-positive in _wait_for_tcp → exploit fires before Samba
   boots (~60 s too early); fix: replace TCP probe with serial console
   _wait_for_serial_login that waits for actual "login:" prompt.

7. Stale QEMU survives orchestrator restart (start_new_session=True) →
   holds hostfwd ports, new QEMU silently fails; fix: kill by pgid from
   old pidfile before rmtree.

8. PORT_BASE default used privileged port 21; fix: default to 2021+slot*100.

9. msfrpcd 6.x returns bytes for all string values even with raw=False;
   fix: MSFRpcClient._str() recursive decoder applied to all responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:26:19 -06:00
max
eda6164897 fix: lab-host install loop after commit-gate cutover
Why services weren't starting after the gate went live:

1. install-lab-host.sh self-copy. The receiver's 400 remediation tells
   the agent to `cd /opt/cis490 && git pull && sudo
   ./scripts/install-lab-host.sh`. That makes REPO_ROOT==INSTALL_ROOT
   and `cp -aT $REPO_ROOT $INSTALL_ROOT` errors with "are the same
   file"; `set -e` aborts before the systemd units install or anything
   restarts. Detect the same-dir case and skip the cp; chown still
   runs.

2. Services never restart. install-lab-host.sh and install-tier-3-4.sh
   both ended by *telling the operator* to restart, then exiting. The
   running shipper/orchestrator kept executing pre-gate code from the
   old module objects, so new `code_version` stamping never reached an
   episode. Both scripts now `systemctl restart` the units they own
   when those units are enabled.

3. Shipper queue fatal-loop. queue.py incremented `fatal++` but didn't
   move the episode out of `data/episodes/`. Next scan re-tarred and
   re-PUT the same dir, getting 400 again. With 4465+ pre-stamp
   episodes on k-gamingcom this burned ~1 PUT/sec for 5+ hours of
   receiver log. Fatal episodes now move to data/quarantine/<id>/ with
   a quarantine_reason.json beside them; the outbox tarball is
   deleted.

4. Pre-stamp backlog drain. tools/quarantine_unstamped.py is a
   one-shot that scans data/episodes/ and quarantines anything without
   a 40-char-hex code_version.commit. Wired into install-lab-host.sh
   step 9 so a re-install drains the queue automatically. Idempotent;
   safe to run while the shipper is active.

Tests cover the queue's new fatal-quarantine path and every drain
behaviour (kept/quarantined/dry-run/idempotent/missing-meta/collision).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:36:21 -05:00
max
265f3ad313 Tier-4 sample source: theZoo (no auth, no operator action)
Replaces MalwareBazaar with theZoo (https://github.com/ytisf/theZoo).
theZoo is a public security-research repo with hundreds of malware
samples organized by family, password-protected with the well-known
'infected'. No API key, no signup, nothing for an operator to do —
which is what zero-touch tier-4 actually means.

Changes:

- tools/auto_fetch_samples.py: rewrite. Clones theZoo (shallow, ~500 MB)
  to /var/lib/cis490/theZoo on first run, then for each manifest
  family without a sha256 it locates a matching Binaries/<Name>
  dir, extracts the .zip with password 'infected', picks the largest
  non-text payload as the binary, sha256s it, stages at
  samples/store/<sha256>, and rewrites manifest.toml in place
  (atomic tempfile + os.replace, stat preserved). Mandatory exit
  semantic: non-zero if no real samples landed.

- scripts/install-tier-3-4.sh: dropped the MB-key resolution chain
  (env var → local file → bootstrap.wg fetch). Now just runs
  auto_fetch_samples.py and dies if zero samples land. SKIP_TIER4
  remains as the explicit override but is documented as defeating
  the project.

- bootstrap/app.py + __main__.py + etc/cis490-bootstrap.service:
  removed the /v1/secret/<name> endpoint and the --secrets-root flag.
  Dead code now that no API key needs distributing. Live-rolled
  back on the Pi (404 verified post-restart, stale /etc/cis490/secrets
  dir removed).

- scripts/set-malwarebazaar-key.sh: deleted. No MB key means no
  one-time operator step.

- tests/test_bootstrap_secrets.py: deleted (route removed).

- AGENTS.md: rewrote tier-4 section to reflect zero-operator model.

148/148 tests pass. Bootstrap service rolled back live.
2026-05-01 01:17:50 -05:00
max
5d0e8e33a9 Tier 4 is mandatory: hard-fail on no real samples; auto-distribute MB key
User: 'we don't want it to be optional, this real malware IS the data
we want.' Acknowledged. Three changes make Tier 4 actually mandatory
without forcing per-host operator action:

1. bootstrap.wg /v1/secret/<name> endpoint
   - Pi serves /etc/cis490/secrets/malwarebazaar.token to lab hosts
     over the same trust boundary as the cert endpoint (WG mesh,
     iptmonads-gated). Strict allow-list — only `malwarebazaar`
     resolves; everything else 404s. Secret returned as bare text
     with Cache-Control: no-store. Live-verified on the Pi.
   - tests/test_bootstrap_secrets.py covers four cases: 404 unprovisioned,
     200 with token, 404 unknown name, 500 on empty file.

2. install-tier-3-4.sh: Tier 4 is no longer optional
   - Resolves MB key in priority: env var → /opt/cis490/samples/.bazaar.token
     → https://bootstrap.wg/v1/secret/malwarebazaar.
   - Caches the bootstrap-fetched key locally so re-runs are offline.
   - If all three resolution paths fail, dies with the exact
     remediation command for the operator (one-time set-malwarebazaar-key.sh
     on the Pi).
   - auto_fetch_samples.py is run unconditionally (SKIP_TIER4 still
     works for emergency overrides but logs a warning that the host
     will produce only mimics). Deploy fails if zero binaries land
     in samples/store/ — no silent mimic-only fallback.
   - SKIP_TIER4 documentation now says 'DEPRECATED; defeats the project'.

3. scripts/set-malwarebazaar-key.sh
   - Pi-side helper: one operator command per fleet, ever. Accepts
     key via env or stdin, validates length, drops at the right
     path with the right perms. Lab hosts pull the rest automatically.

AGENTS.md: rewrote the Tier-4 section to reflect mandatory status +
the one-time-on-Pi distribution model.

152/152 tests pass. Bootstrap service updated live on the Pi.
2026-05-01 00:44:41 -05:00
max
683bfe9ce6 Tier 3 + Tier 4 auto-deploy: zero operator interaction
Replaces the manual runbook with scripts that just work. install-lab-host.sh
now runs the full Tier-3 deploy automatically as its 8th step (after the
mTLS cert lands), and Tier-4 auto-fetches when MALWAREBAZAAR_API_KEY is set.

Changes:

- install-msfrpcd.sh: actually runs the Rapid7 omnibus installer when
  metasploit-framework isn't present (was: bail with "install manually").
  apt-get and dnf paths both go through the same omnibus script with
  DEBIAN_FRONTEND=noninteractive. Idempotent.

- fetch-metasploitable2.sh: bakes in the SourceForge public-mirror URL
  (https://downloads.sourceforge.net/project/metasploitable/...) so no
  operator URL is required. sha256 is now optional and TOFU-pinned —
  first run records the hash to OUT_DIR/metasploitable2.qcow2.sha256;
  subsequent runs verify against that. Skips if qcow2 already present.

- scripts/install-tier-3-4.sh (new): orchestrates the four steps
  (msfrpcd → metasploitable2 → bridge → tier-3 verify) plus optional
  Tier-4 auto-fetch. Idempotent. SKIP_VERIFY / SKIP_BRIDGE / SKIP_TIER4
  env knobs for partial deploys.

- tools/auto_fetch_samples.py (new): when MALWAREBAZAAR_API_KEY is set,
  queries MB by each manifest entry's `family` (signature match), pulls
  the first match via fetch_sample.py, and rewrites manifest.toml in
  place (atomic tempfile + os.replace, preserving stat). Skips entries
  that already have sha256.

- install-lab-host.sh: gains a step 8 that calls install-tier-3-4.sh
  automatically when mTLS certs are on disk. --skip-tier3 flag for
  operators who want Tier 2 only. Skipped silently before certs land
  so first-pass install (host_id=REPLACE_ME) still works.

- AGENTS.md: rewrote the Tier-3 section to point at the one-shot
  script. Removed the old multi-command runbook so on-device agents
  can't accidentally follow stale steps.

Net effect: a fresh lab host now gets Tier 3 (and Tier 4 if API key
present) from a single sudo invocation. No operator picks for image
URLs, no manual metasploit installs, no manual manifest edits.
2026-04-30 23:12:08 -05:00