User caught it: I shipped the theZoo path without running it
end-to-end. A real fetch on the Pi exposed two bugs:
1. Family-name matcher was substring-strict. "Cryptolocker-class"
wouldn't match the dir "CryptoLocker_22Jan2014" because "-class"
isn't in the dir name. Now expands to a sequence of tokens
(full, head-of-dash, head-of-dot, head-of-underscore) and tries
each. First match wins.
2. Extraction picker was "largest non-text" — a bad heuristic for
theZoo, where each Linux.* zip often contains MULTIPLE binaries
for different platforms (Linux i386, x86-64, ARM, FreeBSD, sometimes
even Windows PE). The largest is rarely the i386 Linux ELF that
would actually run on Metasploitable2. Now sniffs ELF magic bytes
in stdlib and tiers:
1. Linux i386 ELF (largest first)
2. any other ELF (best-effort, may not execute)
3. largest non-text (Wine fallback)
Verified end-to-end on the Pi against a real theZoo clone (~500 MB,
263 family dirs, 2026-05-01 fresh pull):
linux-encoder-ransomware → ELF 32-bit Intel i386 SYSV (278 KB)
linux-wirenet-rat → ELF 32-bit Intel i386 SYSV (64 KB)
linux-rex-ransomware → ELF 32-bit Intel i386 SYSV Go (7.6 MB)
linux-neurevt-bot → ELF 32-bit Intel i386 SYSV (3.0 MB)
linux-earthkrahang-apt → ELF 32-bit Intel i386 GNU/Linux (5.8 MB)
5/5 picks are runnable Linux i386 ELFs. Manifest rewrites in place
add source/sha256/url; meta.sample.kind goes to "real" automatically.
Manifest rewritten:
- Old families (XMRig, Mirai, Cryptolocker-class, Dridex, Kovter,
Reverse-Shell) → mostly absent from theZoo's Linux catalog or
matched the wrong arch.
- New families chosen against a verified theZoo presence list:
Linux.Encoder, Linux.Wirenet, Ransomware.Rex, Neurevt,
EarthKrahang.
- XMRig + Kovter remain as mimic-only fallbacks (theZoo lacks a
runnable Linux i386 binary for these; orchestrator falls back
to the mimic profile).
Tests added (tests/test_auto_fetch_samples.py): 13 cases covering
ELF magic detection (i386 accepted, FreeBSD/x86-64/ARM/PE32/text
all rejected), family-token expansion (the "-class" suffix bug),
extraction picker (prefers Linux i386 over larger non-Linux ELFs),
manifest in-place rewrite preserves mode + skips entries that
already have sha256.
What's still NOT verified end-to-end (requires a lab host with
KVM x86):
- Metasploitable2 boot under QEMU
- vsftpd_234_backdoor exploit fire via msfrpcd
- chunked binary upload through a real shell session
- real binary executing inside a Metasploitable2 guest
The Pi is ARM64 — can't run Metasploitable2. install-tier-3-4.sh's
verify step (run_tier3_demo.py) covers all four on a real lab host;
deploy verifies on first run there.
171/171 tests pass.
Replaces MalwareBazaar with theZoo (https://github.com/ytisf/theZoo).
theZoo is a public security-research repo with hundreds of malware
samples organized by family, password-protected with the well-known
'infected'. No API key, no signup, nothing for an operator to do —
which is what zero-touch tier-4 actually means.
Changes:
- tools/auto_fetch_samples.py: rewrite. Clones theZoo (shallow, ~500 MB)
to /var/lib/cis490/theZoo on first run, then for each manifest
family without a sha256 it locates a matching Binaries/<Name>
dir, extracts the .zip with password 'infected', picks the largest
non-text payload as the binary, sha256s it, stages at
samples/store/<sha256>, and rewrites manifest.toml in place
(atomic tempfile + os.replace, stat preserved). Mandatory exit
semantic: non-zero if no real samples landed.
- scripts/install-tier-3-4.sh: dropped the MB-key resolution chain
(env var → local file → bootstrap.wg fetch). Now just runs
auto_fetch_samples.py and dies if zero samples land. SKIP_TIER4
remains as the explicit override but is documented as defeating
the project.
- bootstrap/app.py + __main__.py + etc/cis490-bootstrap.service:
removed the /v1/secret/<name> endpoint and the --secrets-root flag.
Dead code now that no API key needs distributing. Live-rolled
back on the Pi (404 verified post-restart, stale /etc/cis490/secrets
dir removed).
- scripts/set-malwarebazaar-key.sh: deleted. No MB key means no
one-time operator step.
- tests/test_bootstrap_secrets.py: deleted (route removed).
- AGENTS.md: rewrote tier-4 section to reflect zero-operator model.
148/148 tests pass. Bootstrap service rolled back live.
User: 'we don't want it to be optional, this real malware IS the data
we want.' Acknowledged. Three changes make Tier 4 actually mandatory
without forcing per-host operator action:
1. bootstrap.wg /v1/secret/<name> endpoint
- Pi serves /etc/cis490/secrets/malwarebazaar.token to lab hosts
over the same trust boundary as the cert endpoint (WG mesh,
iptmonads-gated). Strict allow-list — only `malwarebazaar`
resolves; everything else 404s. Secret returned as bare text
with Cache-Control: no-store. Live-verified on the Pi.
- tests/test_bootstrap_secrets.py covers four cases: 404 unprovisioned,
200 with token, 404 unknown name, 500 on empty file.
2. install-tier-3-4.sh: Tier 4 is no longer optional
- Resolves MB key in priority: env var → /opt/cis490/samples/.bazaar.token
→ https://bootstrap.wg/v1/secret/malwarebazaar.
- Caches the bootstrap-fetched key locally so re-runs are offline.
- If all three resolution paths fail, dies with the exact
remediation command for the operator (one-time set-malwarebazaar-key.sh
on the Pi).
- auto_fetch_samples.py is run unconditionally (SKIP_TIER4 still
works for emergency overrides but logs a warning that the host
will produce only mimics). Deploy fails if zero binaries land
in samples/store/ — no silent mimic-only fallback.
- SKIP_TIER4 documentation now says 'DEPRECATED; defeats the project'.
3. scripts/set-malwarebazaar-key.sh
- Pi-side helper: one operator command per fleet, ever. Accepts
key via env or stdin, validates length, drops at the right
path with the right perms. Lab hosts pull the rest automatically.
AGENTS.md: rewrote the Tier-4 section to reflect mandatory status +
the one-time-on-Pi distribution model.
152/152 tests pass. Bootstrap service updated live on the Pi.
Replaces the manual runbook with scripts that just work. install-lab-host.sh
now runs the full Tier-3 deploy automatically as its 8th step (after the
mTLS cert lands), and Tier-4 auto-fetches when MALWAREBAZAAR_API_KEY is set.
Changes:
- install-msfrpcd.sh: actually runs the Rapid7 omnibus installer when
metasploit-framework isn't present (was: bail with "install manually").
apt-get and dnf paths both go through the same omnibus script with
DEBIAN_FRONTEND=noninteractive. Idempotent.
- fetch-metasploitable2.sh: bakes in the SourceForge public-mirror URL
(https://downloads.sourceforge.net/project/metasploitable/...) so no
operator URL is required. sha256 is now optional and TOFU-pinned —
first run records the hash to OUT_DIR/metasploitable2.qcow2.sha256;
subsequent runs verify against that. Skips if qcow2 already present.
- scripts/install-tier-3-4.sh (new): orchestrates the four steps
(msfrpcd → metasploitable2 → bridge → tier-3 verify) plus optional
Tier-4 auto-fetch. Idempotent. SKIP_VERIFY / SKIP_BRIDGE / SKIP_TIER4
env knobs for partial deploys.
- tools/auto_fetch_samples.py (new): when MALWAREBAZAAR_API_KEY is set,
queries MB by each manifest entry's `family` (signature match), pulls
the first match via fetch_sample.py, and rewrites manifest.toml in
place (atomic tempfile + os.replace, preserving stat). Skips entries
that already have sha256.
- install-lab-host.sh: gains a step 8 that calls install-tier-3-4.sh
automatically when mTLS certs are on disk. --skip-tier3 flag for
operators who want Tier 2 only. Skipped silently before certs land
so first-pass install (host_id=REPLACE_ME) still works.
- AGENTS.md: rewrote the Tier-3 section to point at the one-shot
script. Removed the old multi-command runbook so on-device agents
can't accidentally follow stale steps.
Net effect: a fresh lab host now gets Tier 3 (and Tier 4 if API key
present) from a single sudo invocation. No operator picks for image
URLs, no manual metasploit installs, no manual manifest edits.