AGENTS.md: how to recover from 400/412 commit-rejected loops
Smaller models running as on-device agents need a direct, prescriptive remediation block for the gate-failure modes: the receiver's response body is informative, but agents only see it if they read journalctl carefully. Document the exact sequence (git pull → install-lab-host.sh) and what the install script now does on its own (drain the pre-stamp queue, restart services). Also call out the two anti-patterns we don't want agents trying: silencing the shipper to stop log noise, or fabricating a code_version field to bypass the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent eda6164897
commit f294e97875
1 changed file with 38 additions and 0 deletions

AGENTS.md +38
@@ -78,6 +78,44 @@ common silent failures it catches:
`cis490-doctor --json` is machine-readable for use by other agents.

## Shipper says "400 missing" or "412 commit-rejected": pull and reinstall

If `journalctl -u cis490-shipper` shows a steady stream of
`-> fatal (400)` or `-> 412 commit-rejected` lines, the receiver is
rejecting episodes because their `meta.json::code_version.commit`
is either missing (400) or not in the receiver's allow-list (412).
This happens when this lab host is running code older than the
receiver will accept.
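A quick way to confirm this failure mode before pulling is to count the rejection lines. This is a minimal sketch, assuming the journal lines contain the literal `-> fatal (400)` and `-> 412 commit-rejected` substrings quoted above; `count_gate_rejections` is a hypothetical helper name, not something the repo ships.

```sh
# Hypothetical helper (not shipped in the repo): count gate-rejection lines read from stdin.
# ASSUMPTION: rejections appear as the literal substrings "-> fatal (400)" and "-> 412 commit-rejected".
count_gate_rejections() {
  # -F: treat patterns as fixed strings (so "(" and ")" are literal)
  # -c: print the number of matching lines
  # -e: pass each pattern explicitly, which is safe even though they start with "-"
  # grep exits 1 when nothing matches; "|| true" keeps a "0" count from failing a `set -e` script.
  grep -cF -e '-> fatal (400)' -e '-> 412 commit-rejected' || true
}
```

For example, `journalctl -u cis490-shipper --since "10 min ago" --no-pager | count_gate_rejections`. A count that keeps growing between runs matches the "steady stream" described above.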

The fix is always the same. Pull main and re-run the installer:

```sh
cd /opt/cis490
sudo -u cis490 git pull origin main
sudo /opt/cis490/scripts/install-lab-host.sh
```

`install-lab-host.sh` does the rest:

1. Re-stamps `/opt/cis490/VERSION` to the new HEAD.
2. Drains pre-stamp episodes via
   `tools/quarantine_unstamped.py` so the queue stops looping on
   them. Drained episodes go to `/var/lib/cis490/data/quarantine/`,
   with one `quarantine_reason.json` per episode for triage.
3. Restarts `cis490-shipper` and `cis490-orchestrator` so the new code
   takes effect.
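After the installer finishes, you can sanity-check its steps by hand. The sketch below is hypothetical (neither helper exists in the repo) and assumes two things not stated here: that `/opt/cis490/VERSION` contains only the full commit SHA as printed by `git rev-parse HEAD`, and that each drained episode leaves exactly one `quarantine_reason.json` somewhere under the quarantine directory.

```sh
# Hypothetical post-install checks; not part of install-lab-host.sh.

# Step 1: succeed if the stamped VERSION file equals the expected commit SHA.
# ASSUMPTION: VERSION holds only the full SHA (surrounding whitespace is ignored).
#   $1 = path to the VERSION file, $2 = expected SHA
stamped_commit_matches() {
  [ "$(tr -d '[:space:]' < "$1")" = "$2" ]
}

# Step 2: print how many episodes were quarantined.
# ASSUMPTION: one quarantine_reason.json per drained episode, anywhere under $1.
count_quarantined() {
  find "$1" -type f -name quarantine_reason.json | wc -l | tr -d '[:space:]'
}

# Usage (run the git step as cis490, matching the `git pull` above):
#   stamped_commit_matches /opt/cis490/VERSION \
#     "$(sudo -u cis490 git -C /opt/cis490 rev-parse HEAD)" && echo "VERSION matches HEAD"
#   count_quarantined /var/lib/cis490/data/quarantine
#   systemctl is-active cis490-shipper cis490-orchestrator   # step 3: each unit should print "active"
```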

Do **not** disable the shipper to silence the log noise: once a host
has the new code, traffic resumes immediately. Do **not** mint a fake
`code_version` field in old episodes to bypass the gate; that field
exists specifically to keep buggy pre-fix data out of the training
index.

If the receiver is rejecting *new* episodes too (you've pulled and
restarted, but still see 412), the receiver's allow-list window may
not yet include your commit. Wait 5s for its Forgejo refresh, or,
if you're testing unmerged work, push your commit to `origin/main`
first.
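Before waiting on the allow-list refresh, it can help to confirm that your commit is actually on `origin/main`. A minimal sketch, assuming the checkout has an `origin` remote with the default fetch refspec; `head_is_on_origin_main` is a hypothetical helper, and it says nothing about the receiver's refresh timing.

```sh
# Hypothetical helper: exit 0 only if the repo's HEAD is already reachable from origin/main.
#   $1 = path to the git checkout
head_is_on_origin_main() {
  repo=$1
  # Refresh the remote-tracking ref; with the default refspec this updates origin/main.
  git -C "$repo" fetch -q origin main || return 2
  # --is-ancestor exits 0 if HEAD is an ancestor of (or equal to) origin/main, 1 if not.
  git -C "$repo" merge-base --is-ancestor HEAD origin/main
}
```

Run it as the `cis490` user (for example from a `sudo -u cis490 -s` shell) against `/opt/cis490`. Exit 1 means the commit is not on `origin/main` yet, so pushing comes before waiting; exit 2 means the fetch itself failed.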

## Tier 3 + Tier 4 deploy (zero-touch via install-lab-host.sh)

`install-lab-host.sh` runs Tier-3 deploy automatically on its second
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue