PIPELINE.md is the canonical plan for the data-collection / emulation / labelling pipeline. It supersedes any guidance in AGENTS.md, README.md, or other repo docs that contradicts it (§17). Future sessions read it before changing anything in the pipeline.
AGENTS.md is rewritten to point at PIPELINE.md as canonical and to strip the prescriptive symptom→fix table that absorbed producer-side defects instead of fixing them (§7.1 compensating-layer pattern).
FIXYOURSELF.md is deleted (§4.12, §7.10 recovery-layer pattern). The states it covered are made impossible by the §4.6 acceptance gate landing later in §5; recovering from a state that shouldn't exist is itself the bandaid we're removing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md — guidance for AI agents working on this repo
This project is part of the spectral lab (http://maxgit.wg/spectral/).
The conventions below also apply to sibling repos (wg-enroll,
wg-pki, caddy, iptmonads, matrix, forgejo, vault,
openclaw-deploy).
⚡ FIRST: read PIPELINE.md
PIPELINE.md is the canonical plan for this repo. Read it before changing anything in the data-collection / emulation / labelling pipeline. If anything in this file or any other doc contradicts PIPELINE.md, PIPELINE.md wins and the other doc is wrong.
This file is for general engineering conventions. The pipeline correctness story lives in PIPELINE.md.
What this project is
CIS490 trains a behavioral malware-detection model from labelled
episodes captured on lab-host VMs running real or mimic workloads,
optionally driven into infected states by Metasploit modules. The
producer is the orchestrator on each lab host; the consumer is the
receiver on the Pi (office-print, 10.100.0.1).
The producer must ship only ground-truth episodes. The receiver must reject anything that doesn't meet the bar. See PIPELINE.md.
Hard rules — do not break these
- Do not silently downgrade a host. If a collector is silent, an exploit doesn't land, or a dependency is missing, the host produces zero episodes and says so loudly. There is no "ship what we can" fallback.
- Do not write a label that an event didn't justify. Phase labels come from observed events, not from the schedule clock. See PIPELINE.md §4.5.
- Do not add a module to the catalog without verifying it lands a session against its declared target. See PIPELINE.md §4.3.
- Do not add per-host config overrides. One canonical manifest; hosts that can't run it produce nothing. See PIPELINE.md §4.1.
- Do not bypass the dirty-tree gate except via the CIS490_ALLOW_DIRTY=1 env var (logged, stamped, audited). No "skip preflight," no verify_tls=false, no other override knobs.
- Do not run openssl or step-cli, mint keys, or write CSRs. Cert delivery is automated. If you find yourself touching a private key on a lab host, stop.
- Do not file a Forgejo issue without first running cis490-doctor and pasting its output.
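The labelling rule above (phase labels come from observed events, not from the schedule clock) can be sketched as follows. This is a minimal illustration and not the orchestrator's code: the Event type, the event names (exploit_sent, session_opened, session_closed), and the phase names (baseline, infected) are hypothetical placeholders. PIPELINE.md §4.5 defines the real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t_ns: int  # monotonic timestamp of the observation
    kind: str  # hypothetical event name, e.g. "session_opened"


def label_phases(events, end_ns):
    """Return (start_ns, end_ns, phase) spans justified by observed events.

    Raises ValueError instead of guessing when the events contradict the
    current phase, so the episode can be rejected rather than shipped.
    """
    spans = []
    phase, start = "baseline", 0
    for ev in sorted(events, key=lambda e: e.t_ns):
        if ev.kind == "session_opened" and phase == "baseline":
            spans.append((start, ev.t_ns, phase))
            phase, start = "infected", ev.t_ns
        elif ev.kind == "session_closed" and phase == "infected":
            spans.append((start, ev.t_ns, phase))
            phase, start = "baseline", ev.t_ns
        elif ev.kind in ("session_opened", "session_closed"):
            raise ValueError(
                f"event {ev.kind!r} at {ev.t_ns} contradicts phase {phase!r}"
            )
        # Any other event (e.g. "exploit_sent") never changes the label:
        # sending an exploit is not evidence that it landed.
    spans.append((start, end_ns, phase))
    return spans
```

Note that an episode where the exploit was sent but no session was observed stays labelled baseline for its whole duration; the schedule alone never produces an infected span.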
How a lab host gets to "shipping data"
This will be rewritten as PIPELINE.md §4 lands. The current
scripts/install-lab-host.sh does most of the right things but does
not yet enforce the canonical manifest, target-VM build, catalog
verification, or preflight. Until those land, treat the install
script as in-flight and assume a fresh lab host will produce nothing
until the bar is met.
The bar (when in place) will be:
- Repo cloned to /opt/cis490, working tree clean, HEAD on origin/main.
- Every binary in the active collector + module catalog set on PATH.
- Every target VM image built from the in-repo spec, sha256-pinned.
- Every module in the catalog passes scripts/verify-catalog.sh against its target.
- Every collector in the active set passes its emit-test.
- orchestrator/preflight.py exits 0.
Once that's true, systemctl enable --now cis490-shipper cis490-orchestrator brings the host online. The orchestrator runs
the canonical experiment; the shipper PUTs sealed episodes to the
receiver. Episodes that don't pass the acceptance gate go to
data/rejected/<id>/ locally and are never shipped.
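Two items of the bar, binaries on PATH and sha256-pinned images, can be checked mechanically. The sketch below is an assumption about how such a check might look, not the contents of orchestrator/preflight.py; the function names and the shape of the inputs are hypothetical. It collects every failure rather than stopping at the first, so a host that produces nothing can say exactly why.

```python
import hashlib
import shutil
from pathlib import Path


def missing_binaries(names):
    """Return the binaries from `names` that are not on PATH."""
    return [n for n in names if shutil.which(n) is None]


def image_matches_pin(image: Path, pinned_sha256: str) -> bool:
    """True only if the image file hashes to the pinned sha256 digest."""
    digest = hashlib.sha256()
    with image.open("rb") as fh:
        # Read in 1 MiB chunks so large VM images are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == pinned_sha256.lower()


def bar_failures(binaries, pinned_images):
    """Collect failures; an empty list means these two checks pass.

    `pinned_images` maps an image path to its expected sha256 hex digest
    (a hypothetical input shape for this sketch).
    """
    failures = [f"missing binary: {b}" for b in missing_binaries(binaries)]
    for path, pin in pinned_images.items():
        p = Path(path)
        if not p.is_file():
            failures.append(f"missing image: {path}")
        elif not image_matches_pin(p, pin):
            failures.append(f"sha256 mismatch: {path}")
    return failures
```

A caller would exit non-zero whenever the returned list is non-empty; there is deliberately no flag to skip a check.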
Securing the connection (mTLS) — DO NOT mint your own certs
The lab-host ↔ Pi connection is mTLS over WireGuard. Cert delivery
is automated via bootstrap.wg/v1/cert/<host_id>. You should never
run openssl, write a CSR, edit a Caddyfile, or generate a private
key on the lab host. If you find yourself doing any of that, you're
off the runbook.
The most common reason cert fetch appears to fail is host_id still
being REPLACE_ME in /etc/cis490/lab-host.toml. Check that first.
The shipper's waiting on mTLS material log line is expected
during first-boot until the cert lands. It is not an error. The
transport rebuilds the SSL context on each request, so the moment
certs land in /etc/cis490/certs/, the next attempt succeeds — no
restart needed.
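The rebuild-per-request behaviour described above can be sketched like this. It is an illustration under stated assumptions, not the shipper's transport code: the file names client.crt, client.key, and ca.crt are hypothetical, as are the function names and the send callable.

```python
import ssl
from pathlib import Path


def build_context(cert_dir):
    """Build a fresh client-side mTLS context, or return None if material is absent.

    The three file names are assumptions made for this sketch.
    """
    d = Path(cert_dir)
    cert, key, ca = d / "client.crt", d / "client.key", d / "ca.crt"
    if not all(p.is_file() for p in (cert, key, ca)):
        return None  # caller logs "waiting on mTLS material" and retries later
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=str(ca))
    ctx.load_cert_chain(certfile=str(cert), keyfile=str(key))
    return ctx


def attempt_ship(cert_dir, send):
    """Make one shipping attempt; `send` is a hypothetical callable taking the context."""
    ctx = build_context(cert_dir)  # rebuilt on every attempt, never cached
    if ctx is None:
        return "waiting on mTLS material"
    return send(ctx)
```

Because nothing is cached between attempts, certs that appear in the directory are picked up by the next call without restarting the process.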
Filing issues
When you run into an issue you cannot fully resolve in the current turn, file it as a Forgejo issue on the relevant repo. Do not silently log a TODO comment, leave a partial workaround, or assume someone else will remember.
File issues for:
- A build / test / typecheck failure you can't fix in scope.
- A bug you discover but aren't tasked with fixing.
- A missing dep, missing config, or env-only failure that blocks E2E.
- A design gap you've worked around but want a follow-up to fix properly.
Don't file when:
- The user is in the conversation and you can just tell them.
- It's already filed (search first: GET /api/v1/repos/<owner>/<repo>/issues?state=open&q=<keyword>).
- It's truly a non-issue (a one-line edit you're about to make this same turn).
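The search-first step can be scripted so the keyword is always URL-encoded correctly. A minimal sketch, assuming the endpoint shape quoted above; issue_search_url is a hypothetical helper name, and the caller supplies the base URL and sends the request with its own token.

```python
from urllib.parse import quote, urlencode


def issue_search_url(base_url, owner, repo, keyword):
    """Build the open-issue search URL for a Forgejo repo."""
    path = f"/api/v1/repos/{quote(owner, safe='')}/{quote(repo, safe='')}/issues"
    query = urlencode({"state": "open", "q": keyword})
    return f"{base_url.rstrip('/')}{path}?{query}"
```

For example, issue_search_url("http://10.100.0.1:3000", "spectral", "wg-pki", "backoff cap") yields a URL ending in /api/v1/repos/spectral/wg-pki/issues?state=open&q=backoff+cap.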
How to file (Forgejo API)
curl -s -X POST \
  -H "Authorization: token <TOKEN>" \
  -H "Content-Type: application/json" \
  http://10.100.0.1:3000/api/v1/repos/spectral/<repo>/issues \
  -d '{
    "title": "<short, action-oriented title>",
    "body": "<context, repro, attempted fixes, suggested next step>"
  }'
The token comes from the user's session — never embed one in code or commits.
Good issue body
- Context — one sentence on what was being attempted.
- What happened — the actual error or unexpected behavior. Paste exact output.
- What was tried — every workaround you attempted and why it didn't stick.
- Suggested next step — the smallest change that would resolve it, if you have a guess. "Unknown" is fine.
- Related — link the commit / PR / file:line where the issue surfaced.
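The five-part body above can be assembled programmatically so that no section is silently dropped. This is a sketch under assumptions: the "Heading: text" layout and the issue_payload helper are hypothetical; only the title and body fields come from the request shown earlier.

```python
import json

SECTIONS = (
    "Context",
    "What happened",
    "What was tried",
    "Suggested next step",
    "Related",
)


def issue_payload(title, sections):
    """Return the JSON request body for filing an issue.

    `sections` maps each heading in SECTIONS to its text. Only
    "Suggested next step" may be omitted; it then becomes "Unknown".
    """
    parts, missing = [], []
    for name in SECTIONS:
        text = sections.get(name, "").strip()
        if not text and name == "Suggested next step":
            text = "Unknown"  # the guidance above says "Unknown" is fine
        if not text:
            missing.append(name)
        parts.append(f"{name}: {text}")
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    return json.dumps({"title": title, "body": "\n\n".join(parts)})
```

Serialising with json.dumps also takes care of escaping quotes and newlines in pasted error output, which is easy to get wrong when hand-writing the -d argument.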
Good titles
| Bad | Good |
|---|---|
| tests broken | tests/test_episode.py: race when t_mono_origin_ns is set in run() not __init__ |
| caddy thing | Caddy: client_auth requires absolute path; relative trusted_ca_cert_file silently fails |
| fix later | shipper: 5xx backoff cap is 5min, doc says 1min; pick one |
After filing, reference the issue in the next commit message:
Refs spectral/<repo>#<n> or Closes spectral/<repo>#<n>. Fully
qualify cross-repo: spectral/wg-pki#3.
Other conventions
- Don't put off the hard parts. "Deferred-with-reason" is only for genuine blockers (binary not present on this machine, external service unreachable). For anything you could do but find awkward — bridge setup, cross-arch quirks, fleet concurrency — do it.
- No architectural bandaids in the pipeline. Compensating layers (auto-update timers, fix-yourself decision trees, prescriptive symptom→command tables, trainer-side prune scripts that paper over silent collectors) are not allowed in the data-collection / emulation / labelling path. Fix the producer instead. See PIPELINE.md.
- Naming: never coin USB / device / service names on the user's behalf. Ask first. Reusing an old name is especially bad.
- /etc configs: Read first, copy second. Never overwrite a /etc/... file from a template without checking what's actually there.
- wg-enroll scope: creation-only. Don't add admin / service-activation features to it.
- Don't expand a project's binary name beyond its own boundary: openclaw is the queue/permissions binary in openclaw-deploy. This repo is wg-enroll (or its caller). Don't conflate.