The auto-update timer (98dcd4f) covers the routine case of a host falling behind origin/main. It deliberately refuses to fast-forward when local HEAD isn't an ancestor of origin/main. That is the right call for safety, but it leaves on-device agents with no automatic path out when they (or an operator) made a local commit. That's exactly the elliott-thinkpad incident: ~31,738 episodes shipped over 19 hours, all stamped with local commit 5568d77 that isn't on origin/main, all 412'd. Auto-update can't fix it; the on-device agent had no doc telling it what to do.

FIXYOURSELF.md is that doc. It is a pure decision tree with six branches (behind / diverged / no-network / no-git / dirty-tree / clean), each with verbatim commands and the order to try them. The diverged-HEAD branch (§B) is the elliott-thinkpad case and offers three resolutions (push, reset, file-issue-and-wait) so an agent that doesn't have push permission isn't backed into discarding work.

Linked from the AGENTS.md top-of-file symptom table so a smaller model finds it without having to know the filename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FIXYOURSELF.md — when the lab host can't unstick itself automatically
You are reading this because you (an on-device AI agent) are running on a CIS490 lab host and one of these is true:
- `cis490-autoupdate.timer` keeps failing.
- The shipper logs show steady `412 commit-rejected` or `400 missing`.
- `cis490-doctor --role lab-host` flagged you red.
The normal "pull and reinstall" path (AGENTS.md row 1) sometimes can't
recover on its own — most often because someone made a local commit
on this host, or /opt/cis490 got into a state that `git pull --ff-only`
refuses to fix. This doc is the explicit decision tree for those
cases. Run the commands literally. Do not paraphrase.
Step 1 — Identify your state
Run all of these. Save the output. Match the result against the table.
cd /opt/cis490
sudo -u cis490 git fetch origin main 2>&1 # → fresh fetch result (run first so origin/main is current)
sudo -u cis490 git rev-parse HEAD # → LOCAL
sudo -u cis490 git rev-parse origin/main 2>&1 # → REMOTE (or error)
sudo -u cis490 git merge-base --is-ancestor HEAD origin/main; echo $? # → 0 if HEAD is an ancestor of origin/main
sudo -u cis490 git log -1 --format='%H %ci %s' # → what HEAD is
sudo -u cis490 git status --porcelain # → uncommitted changes?
Then:
| If you see… | Your state | Go to |
|---|---|---|
| `LOCAL == REMOTE` and `git status` empty | Not stuck; run §Z to be safe | §Z |
| `LOCAL != REMOTE` and `git merge-base --is-ancestor HEAD origin/main` returns 0 | Behind main, no local commits | §A |
| `LOCAL != REMOTE` and the merge-base check returns 1 (NON-zero) | You have a local commit not on origin/main | §B |
| `git fetch` prints a network error | Connectivity broken | §C |
| `/opt/cis490/.git` is missing | No git checkout (populated via `cp -aT` originally) | §D |
| `git status` shows tracked files modified | Uncommitted edits on this host | §E |
If multiple match: §C blocks everything else (fix network first), then §D, then §E, then §B, then §A.
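The priority rule above can be sketched as a small shell function. This is only an illustration: `pick_section` and its argument names are hypothetical and do not exist in the repo, and the inputs are assumed to be collected by hand from the Step 1 commands.

```shell
# Hypothetical helper, not part of the repo: maps Step 1 observations to a
# section letter using the priority C > D > E, then Z / A / B.
# Arguments (names are this sketch's own):
#   $1 network_ok   "yes" unless `git fetch` printed a network error
#   $2 has_git      "yes" if /opt/cis490/.git exists
#   $3 dirty        "yes" if tracked files are modified
#   $4 local_sha    output of `git rev-parse HEAD`
#   $5 remote_sha   output of `git rev-parse origin/main`
#   $6 is_ancestor  "yes" if `git merge-base --is-ancestor HEAD origin/main` exited 0
pick_section() {
    local network_ok=$1 has_git=$2 dirty=$3
    local local_sha=$4 remote_sha=$5 is_ancestor=$6

    if [ "$network_ok" != "yes" ]; then echo "C"; return; fi
    if [ "$has_git" != "yes" ]; then echo "D"; return; fi
    if [ "$dirty" = "yes" ]; then echo "E"; return; fi
    if [ "$local_sha" = "$remote_sha" ]; then echo "Z"; return; fi
    if [ "$is_ancestor" = "yes" ]; then echo "A"; else echo "B"; fi
}

pick_section yes yes no 1111111 2222222 no    # prints B
```

The function only encodes the ordering; it does not replace running the commands and reading their output.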
§A — Behind main, clean tree
cd /opt/cis490
sudo -u cis490 git pull --ff-only origin main
sudo /opt/cis490/scripts/install-lab-host.sh
install-lab-host.sh re-stamps VERSION, drains the pre-stamp queue,
and restarts the daemons. Verify with §Z.
§B — You have a local commit not on origin/main
This is the elliott-thinkpad case (2026-05-01..02). You committed
something locally, the maintainer's origin/main doesn't have it,
and the receiver's allow-list rejects every episode you ship. Pick
ONE of B.1, B.2, B.3 — read all three first.
B.1 — Push your commit to origin/main (RECOMMENDED if your change is real)
cd /opt/cis490
sudo -u cis490 git log -3 --stat HEAD # what you're about to push — read it
sudo -u cis490 git push origin HEAD:main
If git push succeeds: the receiver allow-list picks it up within 5
seconds and new episodes start landing immediately. Verify with §Z.
If git push fails with "permission denied" or "auth required": you
don't have push credentials. Skip to B.2 or B.3.
If git push fails with "non-fast-forward": your local commit sits on
an older base than the current origin/main (main moved on after you
committed). Try:
sudo -u cis490 git pull --rebase origin main
# If the rebase has conflicts you can't resolve in 60 seconds, abort:
# sudo -u cis490 git rebase --abort
# Then go to B.2 or B.3.
sudo -u cis490 git push origin HEAD:main
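As a rough aid for sorting a failed push into the three outcomes above, here is a minimal sketch. `classify_push_failure` is a hypothetical helper, and the phrases it matches are assumptions about what the push output contains; the exact wording depends on the transport and on the server.

```shell
# Hypothetical helper: reads captured `git push` output on stdin and prints
# which B.1 follow-up applies. The matched phrases are assumptions.
classify_push_failure() {
    local out
    out=$(cat)
    if printf '%s\n' "$out" | grep -qiE 'non-fast-forward|fetch first'; then
        echo "rebase"            # try the pull --rebase path
    elif printf '%s\n' "$out" | grep -qiE 'permission denied|auth(entication)? (required|failed)'; then
        echo "no-credentials"    # skip to B.2 or B.3
    else
        echo "unknown"           # read the output yourself before acting
    fi
}
```

One way to use it: `sudo -u cis490 git push origin HEAD:main 2>&1 | classify_push_failure`. An `unknown` result means the output matched none of the assumed phrases, not that the push succeeded.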
B.2 — Discard your local commit, reset to origin
⚠ This deletes your local commit. Only do this if you know it wasn't a real fix or the maintainer told you to discard it.
cd /opt/cis490
sudo -u cis490 git fetch origin main
sudo -u cis490 git log --oneline HEAD ^origin/main # what you're about to lose
sudo -u cis490 git reset --hard origin/main
sudo /opt/cis490/scripts/install-lab-host.sh
B.3 — Stop, file an issue, wait
If you can't decide between B.1 and B.2 (e.g. you don't know if your commit is a real fix), do this:
cd /opt/cis490
export HOST_ID=$(grep '^host_id' /etc/cis490/lab-host.toml | cut -d'"' -f2)
export LOCAL_SHA=$(sudo -u cis490 git rev-parse HEAD)
export DETAIL=$(sudo -u cis490 git log -3 --stat HEAD | head -100)
# File the issue (replace <TOKEN> with the operator's Forgejo token;
# do NOT embed yours in commits).
# The payload is built from the exported variables so that newlines, quotes,
# and braces in the git log output cannot break the Python or the JSON.
curl -sS -X POST \
  -H "Authorization: token <TOKEN>" \
  -H "Content-Type: application/json" \
  http://10.100.0.1:3000/api/v1/repos/spectral/CIS490/issues \
  -d "$(python3 -c '
import json, os
env = os.environ
fence = chr(96) * 3  # three backticks, for a Markdown code fence
print(json.dumps({
    "title": env["HOST_ID"] + ": stuck on local commit " + env["LOCAL_SHA"][:12],
    "body": "### What is at HEAD\n\n" + fence + "\n" + env["DETAIL"] + "\n" + fence
            + "\n\nNeed maintainer to choose: push HEAD to main, or reset --hard origin/main here?",
}))')"
Then leave the daemons running. The shipper will keep auto-quarantining the 412s — backlog grows but doesn't crash anything. Wait for a maintainer comment.
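The `HOST_ID` extraction in the commands above assumes the config has a line that starts with `host_id` and carries a double-quoted value. The sketch below wraps the same `grep | cut` pipeline in a hypothetical `read_host_id` function so it can be tried against a throwaway file; the sample file contents are made up.

```shell
# Same pipeline as in B.3, wrapped so the config path can be swapped out.
# read_host_id is a hypothetical name used only for this illustration.
read_host_id() {
    grep '^host_id' "$1" | cut -d'"' -f2
}

# Made-up sample config; the real file is /etc/cis490/lab-host.toml.
sample=$(mktemp)
printf '%s\n' '# sample only' 'host_id = "example-host"' > "$sample"
read_host_id "$sample"    # prints example-host
rm -f "$sample"
```

If the output is empty, the issue title will start with a bare colon, so check the value before filing.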
§C — Network broken
ping -c 1 10.100.0.1 # the Pi
sudo wg show # is wg0 up?
sudo systemctl restart wg-quick@wg0 # bring it back up
sudo systemctl restart cis490-shipper cis490-orchestrator
If ping 10.100.0.1 still fails after a wg-quick restart, this is
a WireGuard / wg-enroll / iptmonads problem outside this repo. File
an issue at spectral/wg-enroll or spectral/iptmonads and stop.
§D — /opt/cis490/.git missing
The host was originally set up with cp -aT (no .git/). That makes
auto-update impossible. Re-clone:
# Stop services so we don't race with the orchestrator mid-episode
sudo systemctl stop cis490-shipper cis490-orchestrator
# Preserve config/data — only /opt/cis490 (the code) gets replaced.
# /etc/cis490/ and /var/lib/cis490/ are NOT touched.
sudo mv /opt/cis490 /opt/cis490.pre-fix
sudo git clone http://maxgit.wg:3000/spectral/CIS490.git /opt/cis490
sudo chown -R cis490:cis490 /opt/cis490
sudo /opt/cis490/scripts/install-lab-host.sh
# Once verified, you can drop the backup:
# sudo rm -rf /opt/cis490.pre-fix
§E — Uncommitted edits on tracked files
cd /opt/cis490
sudo -u cis490 git status --short # see what's modified
sudo -u cis490 git diff # see exactly what changed
If the changes are intentional (e.g. you fixed a bug), commit them first and then go to §B:
sudo -u cis490 git add <files>
sudo -u cis490 git commit -m "<short description>"
# Now go to §B.
If the changes are accidental / left over from debugging, discard them:
sudo -u cis490 git checkout -- .
# Now go to §A.
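§E is about modified tracked files, while `git status --porcelain` also lists untracked paths (prefixed `??`). A minimal sketch for telling the two apart follows; `has_tracked_changes` is a hypothetical helper, and treating untracked-only output as "not §E" is this sketch's reading of the state table rather than something the doc states.

```shell
# Hypothetical helper: reads `git status --porcelain` output on stdin.
# Exits 0 if at least one entry is something other than an untracked ("??")
# or ignored ("!!") path; exits 1 for empty or untracked-only output.
has_tracked_changes() {
    grep -qv -e '^[?][?]' -e '^!!' -e '^$'
}
```

For example: `sudo -u cis490 git status --porcelain | has_tracked_changes && echo "tracked files changed"`.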
§Z — Verify you're unstuck
# 1. Daemons up?
systemctl is-active cis490-shipper cis490-orchestrator
# Both should say "active".
# 2. VERSION present and matches HEAD?
cat /opt/cis490/VERSION
sudo -u cis490 git -C /opt/cis490 rev-parse HEAD
# The "commit" in VERSION should equal the rev-parse output.
# 3. Recent ship results — should NOT be 412/400-fatal-only:
sudo journalctl -u cis490-shipper --since '2 minutes ago' | tail -20
# 4. Maintainer-side confirmation (if you can reach the Pi):
# sudo tail -5 /var/lib/cis490/index.jsonl
# Should show recent received_at_wall timestamps with your host_id.
# 5. Run the doctor — every red row prints its own fix:
/opt/cis490/.venv/bin/python /opt/cis490/tools/cis490_doctor.py --role lab-host
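Check 2 above can be done by eye, or with a small comparison like the sketch below. The layout of `VERSION` is not described in this doc, so the hypothetical `version_has_commit` helper does not parse a "commit" field; it only checks that the given SHA appears somewhere in the file.

```shell
# Hypothetical helper: succeeds if the commit SHA ($2) appears anywhere in the
# VERSION file ($1). Uses a fixed-string search because the file's exact
# format is an unknown here.
version_has_commit() {
    local version_file=$1 sha=$2
    [ -n "$sha" ] && grep -qF -- "$sha" "$version_file"
}
```

A possible invocation: `version_has_commit /opt/cis490/VERSION "$(sudo -u cis490 git -C /opt/cis490 rev-parse HEAD)" && echo "VERSION matches HEAD"`. If `VERSION` stores only an abbreviated SHA, this check fails even when the stamp is correct, so fall back to comparing by eye.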
Hard rules (don't break these)
These are repeated from AGENTS.md because the situations covered in this doc are exactly when an agent is most tempted to break them.
- Do NOT edit `meta.json` files to add a fake `code_version`. The gate exists to keep buggy data out of the index.
- Do NOT disable / mask / `systemctl stop` `cis490-shipper` to silence log noise. Fix the cause.
- Do NOT set `verify_tls = false` in `lab-host.toml`. That's a bypass, not a fix.
- Do NOT `git commit` on `/opt/cis490` to "make the gate happy": committing without pushing is exactly how the elliott-thinkpad incident started. If you commit, push (§B.1) or don't commit at all.
- Do NOT delete `/var/lib/cis490/data/quarantine/`. Those are forensic records of what the gate rejected; they're capped at 30 days by the cleanup pass.
- Do NOT clobber `/etc/cis490/certs/`. Cert delivery is automated; rerun `install-lab-host.sh` if certs are missing.
If you find yourself wanting to do any of the above, stop and file an issue (§B.3 has the curl command). The maintainer would much rather resolve a stale lab host by reading an issue than by reverse-engineering what an agent did to escape a stuck state.