AGENTS.md: prescriptive guidance for smaller models on lab hosts

Smaller (non-4.7) Claude models act as on-device agents on CIS490 lab
hosts and have hit the install gotchas that became issues #10–#12.
Their reports describe symptoms well but miss inferred context — so
this expands the runbook with explicit "do this, not that" notes:

- run tools from /opt/cis490 not a clone (CWD-on-sys.path trap)
- shipper "waiting on mTLS material" is expected and self-heals; do
  not try to fix it manually
- table of the three install bugs already closed in main, so a fresh
  agent can recognize the symptom and pull instead of re-filing
- "fix one red row at a time" rather than batching attempts

Closes nothing new; this is the followup to #10/#11/#12 promised
during their resolution.
max 2026-04-30 16:19:09 -05:00
parent 86a088c204
commit c80a36d3ae


@@ -66,6 +66,71 @@ common silent failures it catches:
`cis490-doctor --json` is machine-readable for use by other agents.
## Common bring-up gotchas (read this before debugging an install)
Smaller models acting as on-device agents have hit these traps. Each
one is now fixed in main, but if you're on an older clone you may
still see the symptom — pull `origin/main` first, then re-read.
### Run tools from `/opt/cis490`, not from a manual clone
When you run `cis490-doctor` from a clone like `~/.env/CIS490/`,
the clone ends up at the front of `sys.path`. Subprocesses spawned
by the doctor (e.g., `python -m shipper --ping`) inherit the calling
CWD, and `python -m` puts the CWD first on `sys.path`, so they import
the clone's `shipper/` package instead of the service install at
`/opt/cis490/`. Symptom: tracebacks reference the clone path, or
`No module named exploits` despite `package = false`.
**Fix already in main:** the doctor passes `cwd=/opt/cis490` to the
shipper subprocess and inserts `repo_root` into `sys.path` itself.
**Operator action:** always invoke either as
`/opt/cis490/.venv/bin/python /opt/cis490/tools/cis490_doctor.py`
or via `cd /opt/cis490 && ./tools/cis490_doctor.py`. Don't run from a
clone unless you know what you're doing.
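The trap can be reproduced away from a lab host with a minimal sketch.
It is not the doctor's code: the two temporary directories and the fake
`shipper` packages are hypothetical stand-ins for a clone and for
`/opt/cis490`. It shows that `python -m shipper` imports whichever copy
sits in the subprocess's working directory, which is why an explicit
`cwd=` matters.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def make_fake_package(root: Path, label: str) -> None:
    """Create root/shipper/ with a __main__.py that prints which copy ran."""
    pkg = root / "shipper"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "__main__.py").write_text(f"print({label!r})\n")


def run_shipper(cwd: Path) -> str:
    """Run `python -m shipper` with an explicit working directory."""
    # Drop PYTHONSAFEPATH so the default behaviour (CWD first on sys.path
    # under -m) is what gets demonstrated.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, "-m", "shipper"],
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> tuple:
    """Return what `python -m shipper` prints from each working directory."""
    with tempfile.TemporaryDirectory() as tmp:
        clone = Path(tmp) / "clone"      # stand-in for ~/.env/CIS490/
        service = Path(tmp) / "service"  # stand-in for /opt/cis490/
        make_fake_package(clone, "clone copy")
        make_fake_package(service, "service copy")
        return run_shipper(clone), run_shipper(service)
```

Calling `demo()` runs the same command twice and only the working
directory changes, yet a different package is imported each time.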
### Shipper logs "waiting on mTLS material" — this is expected, not a bug
The `cis490-shipper` unit is enabled by `install-lab-host.sh` *before*
the Pi has issued the host's mTLS leaf. The transport pre-flights the
configured `ca_bundle` / `client_cert` / `client_key` paths and, if
any are missing, defers building the SSL context. You'll see one
warning per process lifetime:
```
shipper waiting on mTLS material (client_cert path missing: …); will retry each request
```
The unit stays up. Each ping/ship attempt retries the build. Once
the Pi runs `deploy-cis490-cert.sh <host_id> <wg_ip>` and the leaf
lands at `/etc/cis490/certs/`, the next request succeeds and the
transport logs `mTLS material now on disk; shipper transport ready`.
**Do not** try to "fix" the warning by restarting the unit, deleting
the config, or hand-rolling certs — just confirm the Pi-side step
ran and wait one scan interval.
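The deferred build described above can be sketched as follows. This is
an illustration, not the shipper's transport: `DeferredMtls`,
`build_context`, the logger name, and the file names in `demo()` are all
hypothetical, and the log text only loosely mirrors the real messages.
It shows the pattern of pre-flighting the paths, warning once, and
retrying the build on every request until the files exist.

```python
import logging
import ssl
import tempfile
from pathlib import Path

log = logging.getLogger("shipper_sketch")  # hypothetical logger name


def build_context(ca_bundle: Path, client_cert: Path, client_key: Path) -> ssl.SSLContext:
    """Build a client-side mTLS context from files that already exist."""
    ctx = ssl.create_default_context(cafile=str(ca_bundle))
    ctx.load_cert_chain(certfile=str(client_cert), keyfile=str(client_key))
    return ctx


class DeferredMtls:
    """Hypothetical helper: pre-flight the paths, build lazily, warn once."""

    def __init__(self, ca_bundle, client_cert, client_key, builder=build_context):
        self.paths = {
            "ca_bundle": Path(ca_bundle),
            "client_cert": Path(client_cert),
            "client_key": Path(client_key),
        }
        self._builder = builder
        self._context = None
        self._warned = False

    def context(self):
        """Return the SSL context, or None while any file is still missing."""
        if self._context is not None:
            return self._context
        missing = [name for name, path in self.paths.items() if not path.is_file()]
        if missing:
            if not self._warned:  # one warning per process lifetime
                log.warning("waiting on mTLS material (%s path missing); will retry", missing[0])
                self._warned = True
            return None
        self._context = self._builder(**self.paths)
        log.info("mTLS material now on disk; transport ready")
        return self._context


def demo() -> list:
    """Simulate the leaf landing between two requests, using a fake builder."""
    with tempfile.TemporaryDirectory() as tmp:
        names = ("ca.pem", "leaf.pem", "leaf.key")  # hypothetical file names
        paths = [Path(tmp) / name for name in names]
        transport = DeferredMtls(*paths, builder=lambda **kw: "fake-context")
        before = transport.context()   # files absent: build is deferred
        for path in paths:
            path.write_text("placeholder")
        after = transport.context()    # files present: built on this request
        return [before, after]
```

Nothing in the sketch restarts or reconfigures anything: the second
request succeeds only because the files appeared on disk, which is the
same reason waiting is the right operator action.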
### `install-lab-host.sh` failures
Three install bugs were fixed in commit `95ac56a`. If you're on an
older clone:
| Symptom | Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: pycdlib` during cidata build | `pycdlib` was listed under `dev` deps, but the service venv installs only main deps | Pull main; `pycdlib` is in `dependencies` now |
| Episodes exit `rc=1` in 15 s; `launch_demo.sh` can't find image | `vm/images/` dir wasn't created before symlinking | Pull main; the install script now creates the directory with `install -d` |
| `cis490-doctor` reports "tier3: No module named exploits" | `sys.path` didn't include repo root | Pull main; doctor inserts `repo_root` into `sys.path` |
**If you hit any of these on a fresh install, pull main first** before
filing an issue — the issue is probably already closed.
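One way for an agent to check whether a clone already contains the fix
commit is `git merge-base --is-ancestor`. The helper below is a sketch
under assumptions: `clone_has_commit` is a hypothetical name, and the
`run` parameter exists only so the function can be exercised without a
real repository.

```python
import subprocess


def clone_has_commit(repo_dir: str, commit: str = "95ac56a", run=subprocess.run) -> bool:
    """Return True if `commit` is an ancestor of HEAD in the clone at repo_dir."""
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, "HEAD"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True    # the fix commit is already in this checkout
    if result.returncode == 1:
        return False   # commit is known but HEAD lacks it: pull main
    # Any other status (e.g. the object is unknown locally) is a git error;
    # fetching origin/main first is the likely remedy.
    raise RuntimeError(f"git could not answer: {result.stderr.strip()}")
```

A `False` result or a `RuntimeError` both point to the same next step:
fetch and pull `origin/main` before debugging further.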
### One red row at a time
When the doctor lights up multiple red rows, fix the topmost one and
re-run rather than batching attempts. Each red row prints the exact
operator command it expects you to run. Don't paraphrase or invent
adjacent commands; the doctor is the source of truth for what's
missing.
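An agent consuming `cis490-doctor --json` could apply the same rule
mechanically. The sketch below assumes a report shape that is NOT taken
from the real tool: a top-level `rows` list whose items carry `name`,
`status`, and `command` keys. Those field names and the sample data are
hypothetical, so inspect the actual output before adapting it.

```python
import json
from typing import Optional


def first_red_row(report_json: str) -> Optional[dict]:
    """Return the first failing row, or None if every row is green.

    ASSUMPTION: the report is a JSON object with a "rows" list whose items
    carry "name", "status" and "command" keys. These names are hypothetical;
    check the real `cis490-doctor --json` output before relying on them.
    """
    report = json.loads(report_json)
    for row in report.get("rows", []):
        if row.get("status") == "red":
            return row
    return None


# Made-up sample report, in the assumed shape, for illustration only.
SAMPLE = json.dumps({
    "rows": [
        {"name": "venv", "status": "green", "command": None},
        {"name": "certs", "status": "red", "command": "<command printed by the doctor>"},
        {"name": "tier3", "status": "red", "command": "<command printed by the doctor>"},
    ]
})
```

With `SAMPLE`, `first_red_row` returns the `certs` row and ignores
`tier3` until the next run, so the agent runs exactly one printed
command and then re-runs the doctor.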
## How an agent generates data on demand (without waiting for the timer)
```sh