Adds a pull-based cert distribution path so install-lab-host.sh can
fetch its own leaf cert without operator intervention. Removes the
ssh-from-Pi requirement that blocked elliott-lab.
How the chicken-and-egg gets solved: a freshly wg-enrolled lab host
already has WG access (gate kept by iptmonads at L4) and trusts the
Caddy local CA (bundled in this repo at etc/caddy-root.crt). It
makes a single TLS call to https://bootstrap.wg/v1/cert/<host_id>
— no mTLS — gets back a tar of {ca.crt, leaf.pem, leaf.key},
extracts to /etc/cis490/certs/, and the shipper unblocks. Trust
boundary is "reached :443 over WG"; no operator action needed.
bootstrap/
app.py Starlette: GET /v1/cert/{host_id}, GET /v1/health.
Validates host_id charset, rate-limits per source IP,
logs every mint with the X-Real-IP Caddy injects.
__main__.py uvicorn launcher; runs as root because the wg-pki CA
private key is root-only.
etc/cis490-bootstrap.service
systemd unit on 127.0.0.1:8446 with ProtectSystem=strict +
narrow ReadWritePaths=/var/lib/wg-pki. ProtectHome=no because
systemd's read-only mode hides /home contents (the issuer script
the wrapper exec's lives there).
scripts/issue-cis490-client-cert-wrapper.sh
Adapter the bootstrap service shells out to. Resolves the actual
wg-pki issuer script across the three plausible install layouts
(/opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki) so a single
copy of the unit file works on any operator's box. Forces
--out-dir to /var/lib/wg-pki/issued so writes stay inside the
service's narrow ReadWritePaths.
scripts/install-lab-host.sh
After scaffolding lab-host.toml, if /etc/cis490/certs/lab-host.pem
is absent, curls bootstrap.wg with --cacert etc/caddy-root.crt
(no chicken-and-egg), extracts, chowns/chmods. Skips silently if
bootstrap.wg is unreachable so manual hand-carry remains possible.
scripts/install-receiver.sh
Drops cis490-bootstrap.service alongside cis490-receiver and
prints both as "enable --now" candidates. cis490-bootstrap is the
thing that makes lab hosts self-provisioning.
etc/caddy-root.crt
Bundled copy of wg-pki's published Caddy local CA root, so the
bootstrap fetch can verify TLS without depending on a wg-pki
clone that may or may not be on the lab host yet.
Verified live on the Pi:
$ curl --cacert etc/caddy-root.crt https://bootstrap.wg/v1/cert/elliott-lab -o /tmp/x.tar
HTTP 200 size=10240
$ tar tf /tmp/x.tar
ca.crt
elliott-lab.key
elliott-lab.pem
$ openssl verify -CAfile … elliott-lab.pem
/tmp/.../elliott-lab.pem: OK
$ openssl x509 -subject … -noout
subject=CN=elliott-lab
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the missing diagnostic + onboarding tools so an agent (AI or
human) handed a fresh lab host can get to "shipping data" without
re-deriving every step from logs.
tools/cis490_doctor.py — one-shot health check that walks the full
stack from the bottom up. Each row is green/yellow/red with an
exact fix command for the red rows. Checks:
- repo: branch, tree-clean, distance from origin/main
- install: /opt/cis490, .venv python, /etc/cis490/{lab-host,receiver}.toml,
/etc/cis490/lab-host.env
- mTLS: /etc/cis490/certs/{wg-ca,lab-host}.{pem,key}, openssl chain verify
- systemd: cis490-{shipper,orchestrator,receiver} active state
- net: receiver.url DNS, TCP reach, mTLS handshake to collector.wg
- vm prereqs: /dev/kvm, qemu-system-x86_64, zstd, alpine-baseline.qcow2,
cidata.iso
- tier3 prereqs: msfrpcd, metasploitable2.qcow2 (warn-level)
- end-to-end: cis490-shipper --ping
Modes: --role {lab-host,receiver}, --json (machine-readable),
--no-tier3 (skip optional checks). Exits non-zero on any red row.
ANSI color (auto-disabled on non-tty / NO_COLOR).
AGENTS.md gains a "How a lab host gets to shipping data" canonical
flow at the top: cert delivery via wg-pki/deploy-cis490-cert.sh →
install-lab-host.sh → cis490-doctor → systemctl enable. Plus an
"on-demand episode" recipe + a "smallest E2E test" snippet for
agents that need to verify the pipe without waiting on the timer.
The strict "cloning the repo by itself does nothing" callout makes
the failure mode mu and elliott-lab hit explicit.
scripts/install-lab-host.sh prints a 5-step banner on first install
that points at cis490_doctor.py + the deploy-cis490-cert.sh flow,
plus an always-printed footer warning that "cloning + running
launchers manually is NOT enough." Same message the AGENTS.md
section reinforces.
Refs spectral/CIS490#8 (the "Tier-2 is shipping in the meantime"
claim that turned out to be untrue because no cis490-shipper
service was running on elliott-lab — exactly the case this
diagnostic tool targets).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the gaps surfaced in the "what is not implemented" audit so the
fleet really is shippable end-to-end. Verified live on the Pi:
- cis490-shipper --ping → HTTP 200 through Caddy + mTLS via the
new wg-pki client CA leaf
- real episode dir → tar+zstd → PUT → HTTP 201 stored
- re-ship same bytes → 200 (idempotent)
- re-ship different bytes under same id → 409 (conflict)
Changes:
orchestrator/episode.py
- EpisodeConfig.revert_at_start / revert_at_end (Tier 0+ snapshot/
revert per docs/architecture.md). When set + qmp_socket present,
EpisodeRunner issues loadvm <snapshot_name> and emits
snapshot_revert / snapshot_revert_failed events on the same
monotonic clock as everything else.
collectors/qmp.py
- savevm() / loadvm() helpers using human-monitor-command, plus a
test against the fake QMP server.
exploits/workloads.py
- chunked_real_binary_upload() returns a ChunkedUpload plan: 8 KiB
base64 chunks (~6 KiB binary each) so msfrpc never sees a buffer-
busting payload. Includes a finalize step that sha256-verifies on
the guest before exec.
- real_binary_workload() now wraps the chunked plan for backwards
compat with single-shot callers.
exploits/driver.py
- Tier-4 dispatch walks the chunked plan in MSFExploitDriver:
each chunk is a separate session_shell_write; finalize verifies;
exec only runs on sha-ok. New events: real_binary_upload_begin,
real_binary_verify, real_binary_aborted.
etc/cis490-orchestrator.service
- Reads /etc/cis490/lab-host.env (FLEET_HOST_ID + optional BRIDGE).
- Grants AmbientCapabilities CAP_NET_RAW (tcpdump for source 4) +
CAP_SYS_ADMIN + CAP_PERFMON (perf for source 3) so collectors
work under hardening.
scripts/install-lab-host.sh
- Writes /etc/cis490/lab-host.env on first install with FLEET_HOST_ID
defaulting to `hostname -s`.
- Best-effort: fetches the Alpine baseline qcow2 (sha512-pinned) and
builds cidata.iso with the in-guest agent embedded; symlinks both
into /opt/cis490/vm/images/ so launchers find them.
scripts/fetch-alpine-baseline.sh
- Idempotent fetcher for the Alpine 3.21 cloud-init nocloud qcow2
matching the sha512 in docs/sources.md.
tools/plot_envelope.py
- Rebuilt to render whatever telemetry the episode dir contains:
proc → QMP block ops → perf IPC/miss-rate → bridge pkts/SYNs →
guest agent load/mem. Missing sources are silently skipped.
tools/index_reader.py
- cis490-index CLI: filter receiver's index.jsonl by host / sample
/ time range, sort, count-by group. Closest thing to a query
interface until we stand up Postgres/Timescale.
samples/README.md
- Rewritten to match the new manifest schema, the kind=real vs mimic
split, the per-(host, slot, ep) selection mechanic, and the
chunked-upload safety story.
Tests: 106 pass (was 102). New cases:
- test_qmp.py — savevm + loadvm (HMP wrapper + error path)
- test_tier4.py — chunked plan splitting, sha-pinned finalize,
end-to-end driver walks all chunks + verify + exec via the fake
msfrpc client
Closes the "what is not implemented" punch list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the deployment loop end-to-end on the CIS490 side:
shipper/
config.py ShipperConfig (host_id, paths, receiver endpoint, mTLS)
transport.py httpx-based PUT + ping with mTLS + bearer support
queue.py scan data/episodes/, tar+zstd via system zstd, ship,
retire to data/shipped/. Idempotent across crashes per
the state machine in docs/transport.md.
__main__.py CLI: --ping (smoke test), --once (one pass), or daemon
receiver/app.py: new POST /v1/ping that requires the same auth as PUT
/v1/episodes but writes nothing. Used by `cis490-shipper --ping`
during lab-host bring-up to verify the WG/Caddy/mTLS path before
shipping any real bytes.
etc/
cis490-shipper.service systemd unit for the lab-host shipper
cis490-orchestrator.service systemd unit for the lab-host queue
(kept disabled by default until queue
mode lands)
lab-host.toml.example config template
scripts/
install-lab-host.sh idempotent installer; verifies prereqs,
creates cis490 service user, syncs repo to
/opt/cis490, builds venv, drops systemd units
and config template
install-receiver.sh same, for the receiver role on the central WG
node (Pi5 in our setup)
tests/test_shipper.py 11 end-to-end tests against a real Uvicorn
server hosting the receiver app. Exercises
ping, tar+ship, idempotent re-ship, 409
conflict, transient (receiver down), tarball
round-trip via system zstd.
AGENTS.md guidance for AI agents working on this and sibling repos.
Headline: when you hit an issue you can't fully fix in
scope, file a Forgejo issue rather than leaving a TODO.
51/51 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>