Registered as `knn_semi`. Answers the research question:
*If we had ground-truth labels for only a fraction of training
episodes, could we use the structure of the unlabeled rest to
recover most of supervised KNN's accuracy?*
Pipeline (Yarowsky-style self-training):
1. Split train slice deterministically into labeled (label_frac=0.2
default) and unlabeled (1 - label_frac) by row-index hash.
2. Fit a "labeler" KNN on the labeled fraction.
3. Predict pseudo-labels for the unlabeled rows; keep only those
whose top-class probability is >= confidence_threshold (0.6).
4. Fit the final KNN on (labeled rows + confident pseudo-labels).
Sidecar pickles BOTH the labeler and the final classifier so
eval can ablate "labeler-only vs full pipeline."
Smoke run (567-episode subset, oracle mode, label_frac=0.2):
val_macro_f1 test_macro_f1
knn (100% labels) 0.737 0.133
knn_semi (20% labels) 0.654 0.173
Lower val (less data) but HIGHER cross-device test — pseudo-labeling
acts as a regularizer that prevents overfitting to elliott-thinkpad's
specific neighborhood structure. Honest research finding worth a slide
in the writeup.
Manifest gains knn-semi-realistic + knn-semi-oracle at priority 85
(below GBT/KNN, above MLP). Storage cost = augmented set × n_features
× 4 bytes; same .knn.pkl sidecar format as plain KNN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Non-parametric baseline alongside GBT/MLP/CNN/GRU/LSTM/Transformer.
Same BaseModel + schema-hashed checkpoint contract; sidecar is a
pickled sklearn KNeighborsClassifier (.knn.pkl) handled by the
existing checkpoint machinery alongside .xgb.json / .pt.
KNN's storage cost = n_train_rows × n_kept_features × 4 bytes.
At 660k windows × 145 kept (realistic mode) features = ~380 MB
sidecar; at 230 features (oracle) = ~600 MB. Heavy but ships through
the same artifact-upload path.
trainer/run.py learns a third fit branch:
- GBT — XGBoost early stopping on val mlogloss
- KNN — fit() memorizes; "training time" is val/test predict cost
- NN — train_nn loop (the rest)
Manifest gains knn-realistic + knn-oracle at priority 95 (just
below GBT). KNN's k=10 default lives in the model class — overriding
via hyper.k requires adding --k to run.py first to avoid the
unknown-arg exit-2 issue.
Smoke verified on the 567-episode subset:
knn oracle val=0.7365 test=0.1333 (held-out k-gamingcom)
That val/test gap (0.74 → 0.13) is the cross-device generalization
story: KNN memorizes elliott-thinkpad's local feature space and
falls apart on the other host. Honest baseline for the comparison
report.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symmetric companion to the collection fleet (orchestrator/fleet.py)
but for *training*. Collection is embarrassingly parallel; training
is not (a model is trained at most once across the fleet), so the
receiver coordinates which worker gets which job.
Operator-control surface is etc/training_manifest.toml.example —
single canonical file declaring (a) per-host capability + per-model
allow/deny policy, (b) one [[jobs]] entry per (model, mode, hyper)
with capability constraints (require_cuda, prefer_cuda, min_vram_gib,
min_ram_gib, allowed_hosts).
Components:
capability.py — self-detection: hostname, cores, RAM, CUDA presence,
VRAM, torch version, git commit. Used by workers to filter
eligible jobs before claiming.
manifest.py — TOML loader + JobSpec/HostSpec. Job IDs are stable
sha256 of (model, mode, hyper, split_recipe, train_hosts, seed)
so manifest reload is idempotent: existing rows keep their status,
new jobs become claimable, removed jobs stay until cancelled.
queue.py — SQLite job queue (training_jobs.db) with statuses
pending|claimed|running|completed|failed|cancelled. Atomic
claim_next via single UPDATE WHERE status='pending'. Heartbeat,
complete, fail. Stale-claim sweep (stale_after_s=600s) with
max_attempts cutoff to failed.
store.py — model artifact store mirroring receiver/store.py.
Artifact ID is the sha256 of the uploaded tarball; bit-identical
re-runs deduplicate.
receiver.py — Starlette app exposing 11 endpoints:
POST /v1/job/claim (worker)
POST /v1/job/{id}/heartbeat (worker)
POST /v1/job/{id}/complete (worker)
POST /v1/job/{id}/fail (worker)
PUT /v1/model/{id} (worker — uploads tarball)
GET /v1/jobs (anyone)
GET /v1/workers (anyone)
POST /v1/job/{id}/cancel (operator: X-Operator-Token)
POST /v1/job/{id}/requeue (operator)
POST /v1/manifest/reload (operator)
GET /v1/health (anyone)
Runs as cis490-trainer-receiver.service on the Pi alongside the
existing receiver, on a separate port.
client.py — stdlib HTTP client (urllib only, no new deps).
worker.py — long-running daemon. Loop: detect capability → claim →
spawn training/trainer/run.py subprocess → heartbeat every 30s →
tar artifact, sha256, PUT /v1/model → complete. SIGTERM-safe.
Operator CLI (tools/cis490_jobs.py): status / list / show / cancel /
requeue / reload / workers. Cancel and requeue require
$CIS490_OPERATOR_TOKEN matching the receiver's configured value.
Bootstrap: scripts/install-training-worker.sh (Linux systemd) and
scripts/install-training-worker-windows.ps1 (Windows Scheduled Task)
let the operator enroll a new host with one command after cloning
the repo and setting up the venv. Worker self-tests capability
before registering.
End-to-end smoke verified on the Pi: receiver up, manifest synced,
14 jobs queued, worker registered, claimed 4 CPU-eligible jobs
(allow_jobs=["gbt","mlp"]), completed 3 (gbt-realistic, gbt-oracle,
mlp-oracle), 1 failed with the actual error visible via
cis490-jobs status, 3 artifacts uploaded to
/var/lib/cis490/models/<model>_<mode>/<sha256>/bundle.tar.zst with
proper index.jsonl row.
21 unit tests (manifest validation: 8; queue lifecycle + eligibility:
13). All pass alongside the prior 17 training tests = 38 green.
Open limitations surfaced inline:
- Hyper-key drift between manifest and run.py fails at training
time, not at manifest reload (worth tightening to argparse
introspection later).
- mTLS not yet wired through Caddy for the trainer-receiver port —
listens loopback-only until that lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Starlette + WebSocket dashboard run on the Pi as cis490-dashboard.service
(127.0.0.1:8447, Caddy-fronted at dashboard.wg). Tails
/var/lib/cis490/index.jsonl for episode events, snapshots host counts
every 30s, broadcasts to every connected browser. New connections get a
warm snapshot (recent_episodes, total_bytes, host_counts) so reloads
don't see a cold dashboard.
Frontend is a 10-scene scrollytelling deck following the project
outline: intro, collect, hosts, db explorer, baseline, attacks,
chunking, models, knn, perf. Sticky full-bleed canvas with a
right-aligned prose column (matrix-explorable layout). Hotkeys (arrows,
space, j/k, c, Home/End), prev/next chevrons, FAB, and an opt-in
click-to-advance toggle. Demo toggle drives synthetic data for the
five scenes that have no real producer yet (attack envelopes,
chunking, model bars, knn scatter, perf scatter); when off, those
scenes show "awaiting <event_type> events" rather than fake data.
Producers wire in by POSTing typed JSON to 127.0.0.1:8447/publish
(loopback only; Caddy 404s it externally). Event types the widgets
subscribe to: model_metric {model, accuracy}, embedding {x, y, phase},
model_perf {model, latency_us, accuracy}, prediction {episode_id,
window_idx, predicted, actual}, attack_profile {name, shape, curve},
phase {phase}.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The experiment is now defined by a single version-pinned file —
manifest.toml at the repo root. PIPELINE.md §4.1 / §13 / §16. Every
lab host loads THIS exact file; per-host overrides of experiment
shape are forbidden.
Drops the following per-host CLI overrides that previously violated
the canonical-manifest principle:
* --manifest, --modules-dir (paths now derived)
* --ram-per-vm-mib (in manifest.experiment)
* --max-concurrent (manifest.experiment.fleet.max_concurrent_ceiling)
* --max-tier3-slots (manifest.experiment.fleet.max_tier3_slots)
* --force-tier2 (not a §14 sanctioned override knob —
ship empty catalog to disable Tier-3)
* --require-real-samples (sample-side concern; out of fleet scope)
* tools/run_*_demo.py --manifest (samples path now from canonical)
New surface:
* manifest.toml — the single source of truth
* orchestrator/manifest.py — load_canonical() + Manifest dataclass
with strict validation, raises
ManifestError on any failure
* EpisodeConfig.experiment_meta — populated by run_*_demo.py from
the canonical manifest; stamped
into every episode's meta.json
under "experiment" key for
provenance
* cis490-orchestrator.service — RestartPreventExitStatus=78 so
manifest-load failures stay
stuck-and-loud (§9, §4.7)
* install-lab-host.sh — validates manifest.toml at
install time; missing or invalid
= die with clear message
Catalog admission semantics: only modules whose name appears in
manifest.catalog get loaded into the runtime catalog (§4.3 in
miniature, will tighten further in step 4 when verified_against /
last_verified actually gate admission). Missing toml for an admitted
name is a sysadmin error → exit 78.
Renames cfg.manifest → cfg.samples + adds cfg.experiment to
disambiguate sample-manifest from experiment-manifest. Rewrites
test_fleet.py fixture to construct synthetic Manifest objects so
test outcomes don't depend on the on-disk manifest.toml content.
12 new tests in tests/test_manifest.py: schema-version mismatch,
unknown collector, duplicate collector, unknown phase, negative
phase seconds, negative ram, missing catalog fields, json round-trip.
Local run: `python tools/run_fleet.py --capacity` correctly logs the
loaded manifest and prints capacity. 241 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pieces of self-monitoring so the maintainer isn't the alarm:
(2) Receiver-side fleet health monitor
cis490-fleet-health.timer runs check_fleet_health.py every 5 min.
Detects three symptoms and writes them to
/var/lib/cis490/alerts.jsonl + a syslog WARNING (greppable / easy
to forward to a notifier):
silent — host shipped in last 24h but has been quiet >30 min
fatal-only — actively shipping but every PUT 4xx
unstamped — shipping without X-Cis490-Code-Commit header
Dedup is keyed on (host, symptom, hour-bucket) so a sustained fault
fires once per hour, not every 5 min. 15 unit tests cover the index
parser, three detectors, and dedup.
(3) Per-host doctor snapshots
Lab hosts run cis490-doctor-check.timer once a day (10 min after
boot, then daily with 30-min jitter). The timer runs
cis490_doctor.py --json and PUTs the result to a new endpoint:
PUT /v1/host-health/<host> → /var/lib/cis490/host-health/<host>.json
GET /v1/host-health → aggregate across all hosts
Endpoint is NOT gated by version_gate — sick hosts running stale
code MUST still be able to report sickness. 11 unit tests cover
PUT/GET, atomic-write semantics, bearer auth, and the
not-gated-by-version-gate property.
ship_health_check.py reuses the existing shipper transport (mTLS +
bearer + receiver URL from lab-host.toml) so we don't reimplement
auth.
Both timers wired into install-lab-host.sh — the loop also enables
the previously-added autoupdate + cert-fetch timers, so a single
install run gives a host all four self-healing mechanisms.
Tests: 293 pass (26 new — 15 fleet-health, 11 host-health). 2
pre-existing test_fleet.py failures from the elliott-ThinkPad
merge (667f042) are unrelated to this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-gamingcom symptom (2026-05-02): the on-device agent successfully
finished Tier-3 bring-up, but the shipper sits in "waiting on mTLS
material" because the cert auto-fetch step in install-lab-host.sh
either ran with host_id still REPLACE_ME, or hit a transient
bootstrap.wg failure, and there's no automatic retry. The Pi-side
cert IS minted and the bootstrap endpoint serves it — the failure
mode is purely "lab-host hasn't pulled it down."
Fix: extract the cert-fetch logic into scripts/fetch-lab-host-cert.sh
(idempotent, no-op when certs are already on disk, no-op when host_id
is unset, exit-0 on transient network failure so the unit doesn't
get pinned as failed), and run it from a 5-minute systemd timer.
The timer handles all three "stuck waiting on mTLS" cases without
operator action:
- operator edited host_id post-install but didn't re-run install
- bootstrap.wg was briefly unreachable during install
- lab host was offline when install ran but came up later
The script `try-restart`s cis490-shipper after a successful fetch
so the daemon picks up the new cert immediately instead of waiting
for its lazy retry. install-lab-host.sh still calls the script
on install for fast first-time bring-up — the timer is the safety
net.
Tarball extract is staged through a temp dir + atomic rename so a
mid-extract crash never leaves us with a mismatched cert/key pair.
AGENTS.md row 4 updated: "waiting on mTLS material" remediation now
points at the timer, with the exact `systemctl start
cis490-cert-fetch.service` command to force an immediate retry.
Tests: 267/267 unchanged. The fetch script is idempotent + has all
its happy/error paths handled inline; a unit test would mostly be
testing systemd's behaviour. The integration test path is the timer
running on a real lab host, which is the actual production case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root causes and fixes documented in TIER3-BRINGUP.md. Summary:
1. BRIDGE env var leaked into Tier-3 subprocess → target VM used tap
instead of SLIRP; fix: env.pop("BRIDGE") in fleet _run_slot.
2. usable_modules filter conditioned on BRIDGE presence → bridge-requiring
modules selected on SLIRP runs; fix: always filter requires_bridge.
3. cmd/unix/interact creates no session.list entry → session_open_timeout
every episode; fix: switch samba_usermap_script to cmd/unix/bind_perl.
4. Per-slot LPORT hostfwd used wrong guest port (host:5444→guest:4444);
fix: extra_host_port:extra_host_port mapping so guest binds the
per-slot LPORT directly.
5. vsftpd backdoor port 6200 hardcoded → collision across concurrent slots;
fix: requires_bridge=true filters it from SLIRP fleet runs.
6. SLIRP false-positive in _wait_for_tcp → exploit fires before Samba
boots (~60 s too early); fix: replace TCP probe with serial console
_wait_for_serial_login that waits for actual "login:" prompt.
7. Stale QEMU survives orchestrator restart (start_new_session=True) →
holds hostfwd ports, new QEMU silently fails; fix: kill by pgid from
old pidfile before rmtree.
8. PORT_BASE default used privileged port 21; fix: default to 2021+slot*100.
9. msfrpcd 6.x returns bytes for all string values even with raw=False;
fix: MSFRpcClient._str() recursive decoder applied to all responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Today's incident: post-cutover, k-gamingcom went silent and
elliott-thinkpad kept shipping pre-stamp episodes that the receiver
gate 400'd in a 2300+ PUT loop. Both required `git pull && install-
lab-host.sh` *on the host* — neither the on-device AI agent nor the
operator pulled in time, and from the receiver Pi I cannot reach in
(sshd off on the lab hosts).
Fix the recurrence directly: a 30-min systemd timer that does
git fetch + (if behind) ff-only pull + re-run install-lab-host.sh.
Hosts catch up on the next tick on their own — no human or agent
action required.
Mechanics:
- scripts/auto-update.sh runs as root, drops to cis490 for git ops
to satisfy /opt/cis490 ownership ("dubious ownership" guard).
- Refuses ff if local HEAD isn't an ancestor of origin/main —
protects operator hand-edits from silent overwrite.
- Network failures exit 0 (offline is normal, don't pin a unit
failure); divergence + install failures exit non-zero so the
journal records what broke.
- RandomizedDelaySec=10min on the timer prevents thundering-herd
when several hosts boot together.
- Hands off to install-lab-host.sh via exec — exactly one path
through bring-up; no special "auto" flow.
The version-gate provides the quality boundary, so even if origin/
main moves forward unsafely, the receiver's allow-list still
controls what lands in the index.
install-lab-host.sh enables cis490-autoupdate.timer on every run,
idempotent — existing hosts pick it up the next time they pull
manually.
Filed Forgejo #18 with the canonical command for elliott-thinkpad
+ k-gamingcom to bootstrap themselves out of the current incident
(auto-update doesn't help them retroactively — it has to be running
*before* the cutover to catch the next one).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three robustness items off the future-work list:
1. Shipper sd_notify watchdog. Type=notify + WatchdogSec=180. The
daemon sends READY=1 after queue construction and WATCHDOG=1 once
per scan pass via a heartbeat callback wired into run_forever.
Restart=on-failure only catches process death — silent stalls
(deadlock, hung tar subprocess, blocked I/O past timeout) used to
leave a zombie running with the data backlog growing. Now systemd
kills + restarts the daemon if no WATCHDOG=1 arrives within 180s.
Verified end-to-end against systemd via `systemd-run --transient
--property=Type=notify --property=WatchdogSec=10`: unit transitions
to active on READY=1; SIGSTOP'ing the process triggers
`Watchdog timeout (limit 10s)! Killing process N with SIGABRT` at
exactly t+10s, then unit goes failed → restart cycle.
2. Quarantine cleanup. Without an upper bound, data/quarantine/ grew
forever as fatal episodes piled up. New ShipperConfig fields:
quarantine_keep_days = 30 # opt-out: 0 disables
quarantine_cleanup_interval_s = 3600 # gate so 5s tick doesn't
# statx() the whole tree
Cleanup runs at the start of run_once() but is gated to once per
hour. Removed entries logged.
3. Doctor surfaces shipping errors. Tails 10 minutes of cis490-shipper
journal and surfaces 412/400/transient patterns as red/yellow rows
with the canonical fix command. An on-device agent running
cis490_doctor.py now sees one line ("12 ship(s) rejected as
out-of-window") instead of needing to grep the journal.
Tests: 200/200 (was 188). New coverage: heartbeat callback fires +
survives exceptions; quarantine cleanup respects keep_days, gate, and
opt-out; doctor parser correctly classifies 412/400/transient/clean/
empty/journalctl-denied; both error classes prioritise 412 (more
actionable) when present together.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
receiver.toml.example: the local_repo_path comment was wrong about
when it kicks in. With the new fallback path, it's used both when
forgejo_url is unset (sole backend) AND when forgejo is unreachable
(failover). Document that, plus the auto-detect of /opt/cis490/.git.
cis490_doctor: add a VERSION-stamp check for lab-host role. If
/opt/cis490/VERSION is missing or malformed, the orchestrator stamps
"unknown" → receiver gate rejects every PUT → quarantine. Surface
this as a red row with the canonical fix (re-run install-lab-host.sh)
so an on-device agent doesn't have to grep journal logs to figure it
out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Initial git-log-based gate ran into a permission wall: the cis490
service user can't read /home/max/cis490/.git (ProtectHome=true +
home-dir mode). Switching the production source to the local Forgejo
HTTP API (already accessible to all WG peers, single source of truth
both lab hosts and the receiver pull from). When the maintainer
pushes new code to spectral/CIS490, the next 5-second cache refresh
sees the new commit and lab hosts can immediately ship under it.
VersionGate now takes either:
- forgejo_url + repo_owner + repo_name + branch (+ optional
auth_token for private repos): hits
/api/v1/repos/<owner>/<name>/commits?sha=<branch>&limit=<n>
- repo_path: dev-only fallback, runs `git log` locally
Local-git path retained for tests + the dev-only case.
receiver.toml.example gains forgejo_url/repo_owner/repo_name/branch
with auth_token commented; live-deployed receiver.toml on the Pi has
the spectral org + token.
Live state on the Pi: 41 valid hashes loaded, head=f8ad02b. Verified
end-to-end:
bogus commit → 412 + remediation
HEAD commit → clears gate (fails downstream at sha-mismatch as
expected for the empty-body verify probe)
Test added: test_forgejo_backend_accepts_returned_commits stands up
a tiny canned-response HTTPServer in-process, exercises the parser
without depending on a live Forgejo instance. Brings test_version_gate
to 10 cases; total 158/158.
Stops out-of-date lab hosts from polluting the dataset with episodes
generated by buggy code. The valid-commits set mirrors the maintainer's
working clone on the Pi automatically — when the maintainer pulls or
pushes a new commit, the receiver picks it up within the 5-second
cache TTL with no service restart.
Receiver changes:
- receiver/version_gate.py (new): VersionGate(repo_path, window).
Each check() consults a frozenset of the last `window` commit
hashes from `git -C <repo> log --format=%H -n <window>`, refreshed
every 5s under a lock. Resilient to transient git failure (keeps
prior cache so a flaky `git` doesn't lock out every shipper).
- receiver/app.py: PUT extracts X-Cis490-Code-Commit; gate.check()
before ingest. Rejects with:
400 + remediation if header missing or malformed
412 + remediation + your_commit + head_commit if not in window
Remediation block is verbatim copy-pasteable into the lab-host
shell:
cd /opt/cis490 && sudo -u cis490 git pull origin main
sudo /opt/cis490/scripts/install-lab-host.sh
sudo systemctl restart cis490-orchestrator
- receiver/store.py: ingest_stream takes commit kwarg, stamps it on
the index.jsonl row (new optional field). Backfilled rows from
index_backfill.py also pull commit out of meta.json.
- receiver/config.py + etc/receiver.toml.example: new [version_gate]
section. enabled=true, repo_path=/home/max/cis490, window=100 by
default. Enabled toggle exists for emergency disable-and-collect.
Shipper changes:
- shipper/transport.py: ship_tarball() takes commit kwarg, sends
X-Cis490-Code-Commit header. 412 maps to status='fatal' so the
queue doesn't infinite-retry — operator must pull and reinstall
before the next ship will succeed.
- shipper/queue.py: reads meta.json::code_version.commit per
episode, passes through. On 412, logs the receiver's full
remediation block at ERROR level so journalctl on the lab host
shows exactly what to run.
Tests: 9 in test_version_gate (including 2 end-to-end via
starlette.testclient), 2 cover the boundary where new commits land
mid-cache and where missing-repo gracefully keeps prior cache.
157/157 total.
Index schema: existing rows stay valid (commit field is optional
on read). New rows from receiver-direct AND from index_backfill.py
include commit.
Replaces MalwareBazaar with theZoo (https://github.com/ytisf/theZoo).
theZoo is a public security-research repo with hundreds of malware
samples organized by family, password-protected with the well-known
'infected'. No API key, no signup, nothing for an operator to do —
which is what zero-touch tier-4 actually means.
Changes:
- tools/auto_fetch_samples.py: rewrite. Clones theZoo (shallow, ~500 MB)
to /var/lib/cis490/theZoo on first run, then for each manifest
family without a sha256 it locates a matching Binaries/<Name>
dir, extracts the .zip with password 'infected', picks the largest
non-text payload as the binary, sha256s it, stages at
samples/store/<sha256>, and rewrites manifest.toml in place
(atomic tempfile + os.replace, stat preserved). Mandatory exit
semantic: non-zero if no real samples landed.
- scripts/install-tier-3-4.sh: dropped the MB-key resolution chain
(env var → local file → bootstrap.wg fetch). Now just runs
auto_fetch_samples.py and dies if zero samples land. SKIP_TIER4
remains as the explicit override but is documented as defeating
the project.
- bootstrap/app.py + __main__.py + etc/cis490-bootstrap.service:
removed the /v1/secret/<name> endpoint and the --secrets-root flag.
Dead code now that no API key needs distributing. Live-rolled
back on the Pi (404 verified post-restart, stale /etc/cis490/secrets
dir removed).
- scripts/set-malwarebazaar-key.sh: deleted. No MB key means no
one-time operator step.
- tests/test_bootstrap_secrets.py: deleted (route removed).
- AGENTS.md: rewrote tier-4 section to reflect zero-operator model.
148/148 tests pass. Bootstrap service rolled back live.
User: 'we don't want it to be optional, this real malware IS the data
we want.' Acknowledged. Three changes make Tier 4 actually mandatory
without forcing per-host operator action:
1. bootstrap.wg /v1/secret/<name> endpoint
- Pi serves /etc/cis490/secrets/malwarebazaar.token to lab hosts
over the same trust boundary as the cert endpoint (WG mesh,
iptmonads-gated). Strict allow-list — only `malwarebazaar`
resolves; everything else 404s. Secret returned as bare text
with Cache-Control: no-store. Live-verified on the Pi.
- tests/test_bootstrap_secrets.py covers four cases: 404 unprovisioned,
200 with token, 404 unknown name, 500 on empty file.
2. install-tier-3-4.sh: Tier 4 is no longer optional
- Resolves MB key in priority: env var → /opt/cis490/samples/.bazaar.token
→ https://bootstrap.wg/v1/secret/malwarebazaar.
- Caches the bootstrap-fetched key locally so re-runs are offline.
- If all three resolution paths fail, dies with the exact
remediation command for the operator (one-time set-malwarebazaar-key.sh
on the Pi).
- auto_fetch_samples.py is run unconditionally (SKIP_TIER4 still
works for emergency overrides but logs a warning that the host
will produce only mimics). Deploy fails if zero binaries land
in samples/store/ — no silent mimic-only fallback.
- SKIP_TIER4 documentation now says 'DEPRECATED; defeats the project'.
3. scripts/set-malwarebazaar-key.sh
- Pi-side helper: one operator command per fleet, ever. Accepts
key via env or stdin, validates length, drops at the right
path with the right perms. Lab hosts pull the rest automatically.
AGENTS.md: rewrote the Tier-4 section to reflect mandatory status +
the one-time-on-Pi distribution model.
152/152 tests pass. Bootstrap service updated live on the Pi.
ca_bundle is what the shipper uses to verify collector.wg's TLS cert.
That cert is signed by the Caddy Local Authority, bundled in the repo
as etc/caddy-root.crt. Pointing it at wg-ca.pem (the wg-pki CIS490
Lab-Host Client CA, which is the *receiver's* trust anchor for our
client cert) caused CERTIFICATE_VERIFY_FAILED on every ship.
Original fix authored by the on-device agent on k-gamingcom in
Dev_REL2_043026@786b8da; cherry-picked here onto main.
Co-Authored-By: k-gamingcom on-device agent
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of "fleet says max_concurrent=3 but only one episode ships
per wave" symptom on elliott-lab:
1. orchestrator/fleet.py::_run_slot set
env["RUN_DIR"]=/tmp/cis490-vm-fleet-{slot} per slot.
2. tools/run_real_vm_demo.py defaulted --run-dir to /tmp/cis490-vm
(NO slot suffix), then UNCONDITIONALLY overwrote the env's
RUN_DIR with that flag's value before exec'ing the launcher.
3. So every slot's launcher saw RUN_DIR=/tmp/cis490-vm. All slots
collided on the same socket dir.
4. run_real_vm_demo.py also rmtree(run_dir) on entry — slot 1's
rmtree literally deleted slot 0's pidfile + sockets mid-boot.
5. Net effect: one VM survives per wave on a multi-core host that
should be running ~cores-1 in parallel. Throughput collapses
to 1/N.
Fix:
tools/run_real_vm_demo.py + tools/run_tier3_demo.py:
--run-dir default cascade —
1) explicit CLI flag
2) RUN_DIR env (set by fleet runner)
3) /tmp/cis490-vm-<SLOT> (SLOT from env, default 0)
Same change in both runners so Tier-2 + Tier-3 fleet waves
parallelize cleanly.
orchestrator/fleet.py::_run_slot:
Pass --run-dir explicitly to the subprocess so the per-slot path
is audit-visible in the fleet log instead of buried in env.
Also flip the subprocess interpreter to repo_root/.venv/bin/python
when present (was /usr/bin/env python3 — worked by luck because
the orchestrator path doesn't import msgpack/httpx, but a Tier-3
fleet wave would have died at import-time on a host without those
in system Python).
etc/cis490-orchestrator.service:
Removed the duplicate [Service] hardening block at the bottom of
the file that was silently overriding the AmbientCapabilities
grant (NoNewPrivileges=true at the bottom flipped the
NoNewPrivileges=false at the top, dropping CAP_NET_RAW + CAP_SYS_
ADMIN + CAP_PERFMON before per-episode subprocesses inherit
them). Sources 3 + 4 would have failed silently inside the
sandbox.
Added /tmp to ReadWritePaths so per-slot RUN_DIRs are writable.
106/106 tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a pull-based cert distribution path so install-lab-host.sh can
fetch its own leaf cert without operator intervention. Removes the
ssh-from-Pi requirement that blocked elliott-lab.
How the chicken-and-egg gets solved: a freshly wg-enrolled lab host
already has WG access (gate kept by iptmonads at L4) and trusts the
Caddy local CA (bundled in this repo at etc/caddy-root.crt). It
makes a single TLS call to https://bootstrap.wg/v1/cert/<host_id>
— no mTLS — gets back a tar of {ca.crt, leaf.pem, leaf.key},
extracts to /etc/cis490/certs/, and the shipper unblocks. Trust
boundary is "reached :443 over WG"; no operator action needed.
bootstrap/
app.py Starlette: GET /v1/cert/{host_id}, GET /v1/health.
Validates host_id charset, rate-limits per source IP,
logs every mint with the X-Real-IP Caddy injects.
__main__.py uvicorn launcher; runs as root because the wg-pki CA
private key is root-only.
etc/cis490-bootstrap.service
systemd unit on 127.0.0.1:8446 with ProtectSystem=strict +
narrow ReadWritePaths=/var/lib/wg-pki. ProtectHome=no because
systemd's read-only mode hides /home contents (the issuer script
the wrapper exec's lives there).
scripts/issue-cis490-client-cert-wrapper.sh
Adapter the bootstrap service shells out to. Resolves the actual
wg-pki issuer script across the three plausible install layouts
(/opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki) so a single
copy of the unit file works on any operator's box. Forces
--out-dir to /var/lib/wg-pki/issued so writes stay inside the
service's narrow ReadWritePaths.
scripts/install-lab-host.sh
After scaffolding lab-host.toml, if /etc/cis490/certs/lab-host.pem
is absent, curls bootstrap.wg with --cacert etc/caddy-root.crt
(no chicken-and-egg), extracts, chowns/chmods. Skips silently if
bootstrap.wg is unreachable so manual hand-carry remains possible.
scripts/install-receiver.sh
Drops cis490-bootstrap.service alongside cis490-receiver and
prints both as "enable --now" candidates. cis490-bootstrap is the
thing that makes lab hosts self-provisioning.
etc/caddy-root.crt
Bundled copy of wg-pki's published Caddy local CA root, so the
bootstrap fetch can verify TLS without depending on a wg-pki
clone that may or may not be on the lab host yet.
Verified live on the Pi:
$ curl --cacert etc/caddy-root.crt https://bootstrap.wg/v1/cert/elliott-lab -o /tmp/x.tar
HTTP 200 size=10240
$ tar tf /tmp/x.tar
ca.crt
elliott-lab.key
elliott-lab.pem
$ openssl verify -CAfile … elliott-lab.pem
/tmp/.../elliott-lab.pem: OK
$ openssl x509 -subject … -noout
subject=CN=elliott-lab
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the gaps surfaced in the "what is not implemented" audit so the
fleet really is shippable end-to-end. Verified live on the Pi:
- cis490-shipper --ping → HTTP 200 through Caddy + mTLS via the
new wg-pki client CA leaf
- real episode dir → tar+zstd → PUT → HTTP 201 stored
- re-ship same bytes → 200 (idempotent)
- re-ship different bytes under same id → 409 (conflict)
Changes:
orchestrator/episode.py
- EpisodeConfig.revert_at_start / revert_at_end (Tier 0+ snapshot/
revert per docs/architecture.md). When set + qmp_socket present,
EpisodeRunner issues loadvm <snapshot_name> and emits
snapshot_revert / snapshot_revert_failed events on the same
monotonic clock as everything else.
collectors/qmp.py
- savevm() / loadvm() helpers using human-monitor-command, plus a
test against the fake QMP server.
exploits/workloads.py
- chunked_real_binary_upload() returns a ChunkedUpload plan: 8 KiB
base64 chunks (~6 KiB binary each) so msfrpc never sees a buffer-
busting payload. Includes a finalize step that sha256-verifies on
the guest before exec.
- real_binary_workload() now wraps the chunked plan for backwards
compat with single-shot callers.
exploits/driver.py
- Tier-4 dispatch walks the chunked plan in MSFExploitDriver:
each chunk is a separate session_shell_write; finalize verifies;
exec only runs on sha-ok. New events: real_binary_upload_begin,
real_binary_verify, real_binary_aborted.
etc/cis490-orchestrator.service
- Reads /etc/cis490/lab-host.env (FLEET_HOST_ID + optional BRIDGE).
- Grants AmbientCapabilities CAP_NET_RAW (tcpdump for source 4) +
CAP_SYS_ADMIN + CAP_PERFMON (perf for source 3) so collectors
work under hardening.
scripts/install-lab-host.sh
- Writes /etc/cis490/lab-host.env on first install with FLEET_HOST_ID
defaulting to `hostname -s`.
- Best-effort: fetches the Alpine baseline qcow2 (sha512-pinned) and
builds cidata.iso with the in-guest agent embedded; symlinks both
into /opt/cis490/vm/images/ so launchers find them.
scripts/fetch-alpine-baseline.sh
- Idempotent fetcher for the Alpine 3.21 cloud-init nocloud qcow2
matching the sha512 in docs/sources.md.
tools/plot_envelope.py
- Rebuilt to render whatever telemetry the episode dir contains:
proc → QMP block ops → perf IPC/miss-rate → bridge pkts/SYNs →
guest agent load/mem. Missing sources are silently skipped.
tools/index_reader.py
- cis490-index CLI: filter receiver's index.jsonl by host / sample
/ time range, sort, count-by group. Closest thing to a query
interface until we stand up Postgres/Timescale.
samples/README.md
- Rewritten to match the new manifest schema, the kind=real vs mimic
split, the per-(host, slot, ep) selection mechanic, and the
chunked-upload safety story.
Tests: 106 pass (was 102). New cases:
- test_qmp.py — savevm + loadvm (HMP wrapper + error path)
- test_tier4.py — chunked plan splitting, sha-pinned finalize,
end-to-end driver walks all chunks + verify + exec via the fake
msfrpc client
Closes the "what is not implemented" punch list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is the chunk that makes "real data" actually flow on multiple
hosts in parallel. End-to-end pipe was up at 613c6fa / 2579683; now
the lab-host side has the diversity + concurrency it needs.
Collectors landed:
collectors/qmp.py — source 2 (oracle). Tiny synchronous QMP
client + row builder + run loop. Tolerates
older qemu without query-stats.
collectors/guest_agent.py — source 5 (deployable). Reads the
virtio-serial host-side socket, parses
agent JSON-lines, re-stamps to the host
monotonic clock, persists.
collectors/pcap.py — source 4 (deployable). tcpdump capture
+ pure-Python pcap reader + 100 ms
netflow.jsonl bucketizer. Decodes
Ethernet/IPv4/TCP/UDP enough for the
schema in docs/data-model.md.
In-guest agent:
vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads
/proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs,
thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent.
tools/build_cidata.py — embeds the agent + an OpenRC service into
user-data so first boot of the Alpine cidata image auto-starts it.
Launchers:
vm/launch_demo.sh / launch_target.sh — second virtio-serial port for
the agent socket; SLOT env support so multiple VMs run without
socket / port collisions; PORT_BASE on launch_target so multiple
target VMs hostfwd different host ports.
vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24,
no NAT). Idempotent.
Fleet:
orchestrator/fleet.py — capacity detector (cores / RAM / load
headroom) + concurrent-slot runner. Per-slot ENV selects the
sample. FleetCapacity dataclass round-trips into meta.json so
"this episode ran with 6 concurrent VMs" is auditable post-hoc.
tools/run_fleet.py — CLI: --capacity report; --waves N runs N
waves of (max_concurrent) episodes each, every slot with a
different sample.
etc/cis490-orchestrator.service — now drives the fleet runner with
Restart=always so each invocation runs one wave and respawns,
giving a continuous stream.
Samples:
samples/manifest.toml — six profiles spanning the five major
behaviour shapes. Each entry is real OR mimic (sha256 distinguishes).
samples/manifest.py — strict TOML loader (rejects dups, unknown
categories) + deterministic select(host_id, slot, episode_index)
so different hosts on the network walk the catalog in different
orders without any coordinator.
EpisodeRunner:
orchestrator/episode.py — optional qmp_socket + guest_agent_socket
fields on EpisodeConfig; when set, additional collector threads
run alongside proc_qemu. EpisodeResult now carries rows_qmp +
rows_guest counters.
Tier-3 setup automation:
scripts/install-msfrpcd.sh — installs metasploit-framework where
the package manager has it, generates a strong password into
/etc/cis490/msfrpc.env, drops a hardened systemd unit bound to
127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch
once MSFRPC_PASSWORD is sourced.
scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256
from the operator (Rapid7 download is registration-walled), pulls,
verifies, converts vmdk → qcow2, lands at vm/images/.
Tests: 82 pass (was 51). New suites:
tests/test_qmp.py — fake QMP server, capability handshake,
blockstats, async-event interleaving,
5-failure backoff
tests/test_guest_agent.py — fake virtio socket, JSON-lines read +
re-stamp, malformed-line tolerance
tests/test_pcap.py — synthetic pcap with TCP/UDP/ARP frames,
bucketize correctness across windows
tests/test_fleet.py — capacity math (8-core idle / low-RAM /
high-load / Pi5 / 1-core box), manifest
selection determinism + diversity
What's queued for the next commit (already discussed in convo):
- MSFExploitDriver v2: map sample.profile → distinct in-session
workload so Tier-3 episodes don't all produce the same yes-loop
envelope. Critical for ML to learn varied malware shapes.
- Real-sample fetch from MalwareBazaar by sha256.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the deployment loop end-to-end on the CIS490 side:
shipper/
config.py ShipperConfig (host_id, paths, receiver endpoint, mTLS)
transport.py httpx-based PUT + ping with mTLS + bearer support
queue.py scan data/episodes/, tar+zstd via system zstd, ship,
retire to data/shipped/. Idempotent across crashes per
the state machine in docs/transport.md.
__main__.py CLI: --ping (smoke test), --once (one pass), or daemon
receiver/app.py: new POST /v1/ping that requires the same auth as PUT
/v1/episodes but writes nothing. Used by `cis490-shipper --ping`
during lab-host bring-up to verify the WG/Caddy/mTLS path before
shipping any real bytes.
etc/
cis490-shipper.service systemd unit for the lab-host shipper
cis490-orchestrator.service systemd unit for the lab-host queue
(kept disabled by default until queue
mode lands)
lab-host.toml.example config template
scripts/
install-lab-host.sh idempotent installer; verifies prereqs,
creates cis490 service user, syncs repo to
/opt/cis490, builds venv, drops systemd units
and config template
install-receiver.sh same, for the receiver role on the central WG
node (Pi5 in our setup)
tests/test_shipper.py 11 end-to-end tests against a real Uvicorn
server hosting the receiver app. Exercises
ping, tar+ship, idempotent re-ship, 409
conflict, transient (receiver down), tarball
round-trip via system zstd.
AGENTS.md guidance for AI agents working on this and sibling repos.
Headline: when you hit an issue you can't fully fix in
scope, file a Forgejo issue rather than leaving a TODO.
51/51 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements docs/transport.md as a small Starlette app. The receiver streams
episode tarballs to disk, verifies sha256 against an X-Content-SHA256 header,
atomically renames into the store on success, and appends one row to a flat
index.jsonl. No DB. Idempotent re-PUTs return 200; conflicting bodies return
409. Optional bearer-token auth (mTLS terminates at Caddy in prod).
receiver/
store.py EpisodeStore: sha-verifying streaming ingest, atomic rename,
append-only index. No HTTP.
app.py make_app(): Starlette routes + bearer guard.
config.py ReceiverConfig.load(): TOML parser.
__main__.py uvicorn entrypoint, reads --config TOML.
tests/test_receiver.py — 13 tests via httpx.ASGITransport. Covers: 201 new,
200 idempotent replay, 409 conflict, 400 sha mismatch + cleanup, 400 missing/
short header, 400 bad id, 400 bad suffix, 413 too large, 401 bearer enforcement,
schema-version pass-through.
etc/cis490-receiver.service — systemd unit with hardening flags.
etc/receiver.toml.example — config template matching docs/deploy.md.
End-to-end smoke-tested with curl: 201 → 200 → 409 path verified, file
on disk, single index row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>