# FIXYOURSELF.md — when the lab host can't unstick itself automatically

**You are reading this because you (an on-device AI agent) are running on a CIS490 lab host and one of these is true:**

- `cis490-autoupdate.timer` keeps failing.
- The shipper logs show steady `412 commit-rejected` or `400 missing`.
- `cis490-doctor --role lab-host` flagged you red.

The normal "pull and reinstall" path (AGENTS.md row 1) sometimes can't recover on its own, most often because someone made a local commit on this host, or `/opt/cis490` got into a state `git pull --ff-only` refuses to fix. This doc is the explicit decision tree for those cases. Run the commands literally. Do not paraphrase.

---

## Step 1 — Identify your state

Run all of these, in this order. Save the output. Match the result against the table.

```sh
cd /opt/cis490
# Fetch first so that origin/main below reflects the current remote.
sudo -u cis490 git fetch origin main 2>&1         # → fresh fetch result
sudo -u cis490 git rev-parse HEAD                 # → LOCAL
sudo -u cis490 git rev-parse origin/main 2>&1     # → REMOTE (or error)
sudo -u cis490 git merge-base --is-ancestor HEAD origin/main; echo "merge-base exit=$?"
sudo -u cis490 git log -1 --format='%H %ci %s'    # → what HEAD is
sudo -u cis490 git status --porcelain             # → uncommitted changes?
```

Then:

| If you see… | Your state | Go to |
|---|---|---|
| LOCAL == REMOTE and `git status` empty | Not stuck; run §Z to be safe | §Z |
| LOCAL != REMOTE and `git merge-base --is-ancestor HEAD origin/main` returns 0 | Behind main, no local commits | §A |
| LOCAL != REMOTE and the merge-base check returns 1 (NON-zero) | **You have a local commit not on origin/main** | §B |
| `git fetch` prints a network error | Connectivity broken | §C |
| `/opt/cis490/.git` is missing | No git checkout (originally populated via `cp -aT`) | §D |
| `git status` shows tracked files modified | Uncommitted edits on this host | §E |

If multiple match: §C blocks everything else (fix network first), then §D, then §E, then §B, then §A.
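
The table and its priority order can be sketched as a small shell function. This is a hypothetical helper, not something shipped in the repo: the name `classify_state` and its argument order are assumptions made for illustration. It only maps observations you have already collected in Step 1 to a section letter, and it runs no git commands itself.

```sh
# Hypothetical helper: map Step 1 observations to a section letter.
# Priority follows the doc: C, then D, then E, then B, then A (Z if not stuck).
# Arguments (all assumed to be collected by hand from the Step 1 output):
#   $1 network_ok   1 if `git fetch` printed no network error, else 0
#   $2 has_git_dir  1 if /opt/cis490/.git exists, else 0
#   $3 dirty        1 if `git status --porcelain` shows modified tracked files
#   $4 local_sha    output of `git rev-parse HEAD`
#   $5 remote_sha   output of `git rev-parse origin/main`
#   $6 ancestor_rc  exit code of `git merge-base --is-ancestor HEAD origin/main`
classify_state() {
  cs_network_ok=$1; cs_has_git=$2; cs_dirty=$3
  cs_local=$4; cs_remote=$5; cs_ancestor_rc=$6

  if [ "$cs_network_ok" != 1 ]; then echo C; return 0; fi
  if [ "$cs_has_git" != 1 ]; then echo D; return 0; fi
  if [ "$cs_dirty" = 1 ]; then echo E; return 0; fi
  if [ "$cs_local" = "$cs_remote" ]; then echo Z; return 0; fi
  if [ "$cs_ancestor_rc" = 0 ]; then echo A; else echo B; fi
}

# Example: network fine, .git present, clean tree, HEAD behind origin/main.
classify_state 1 1 0 aaa111 bbb222 0
```

The function is deliberately pure so it can be checked without touching a real checkout. The commands in Step 1 remain the source of truth for the inputs.
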

---

## §A — Behind main, clean tree

```sh
cd /opt/cis490
sudo -u cis490 git pull --ff-only origin main
sudo /opt/cis490/scripts/install-lab-host.sh
```

`install-lab-host.sh` re-stamps VERSION, drains the pre-stamp queue, and restarts the daemons. Verify with §Z.

---

## §B — You have a local commit not on origin/main

This is the elliott-thinkpad case (2026-05-01..02). You committed something locally, the maintainer's `origin/main` doesn't have it, and the receiver's allow-list rejects every episode you ship.

Pick ONE of B.1, B.2, B.3. Read all three first.

### B.1 — Push your commit to origin/main (RECOMMENDED if your change is real)

```sh
cd /opt/cis490
sudo -u cis490 git log -3 --stat HEAD   # what you're about to push; read it
sudo -u cis490 git push origin HEAD:main
```

If `git push` succeeds: the receiver allow-list picks it up within 5 seconds, and new episodes start landing immediately. Verify with §Z.

If `git push` fails with "permission denied" or "auth required": you don't have push credentials. Skip to B.2 or B.3.

If `git push` fails with "non-fast-forward": your local commit has diverged from origin/main (you committed against an older base). Try:

```sh
sudo -u cis490 git pull --rebase origin main
# If the rebase has conflicts you can't resolve in 60 seconds, abort:
#   sudo -u cis490 git rebase --abort
# Then go to B.2 or B.3.
sudo -u cis490 git push origin HEAD:main
```

### B.2 — Discard your local commit, reset to origin

⚠ **This deletes your local commit.** Only do this if you know it wasn't a real fix or the maintainer told you to discard it.

```sh
cd /opt/cis490
sudo -u cis490 git fetch origin main
sudo -u cis490 git log --oneline HEAD ^origin/main   # what you're about to lose
sudo -u cis490 git reset --hard origin/main
sudo /opt/cis490/scripts/install-lab-host.sh
```

### B.3 — Stop, file an issue, wait

If you can't decide between B.1 and B.2 (e.g.
you don't know if your commit is a real fix), do this:

```sh
cd /opt/cis490
HOST_ID=$(grep '^host_id' /etc/cis490/lab-host.toml | cut -d'"' -f2)
LOCAL_SHA=$(sudo -u cis490 git rev-parse HEAD)
DETAIL=$(sudo -u cis490 git log -3 --stat HEAD | head -100)
export HOST_ID LOCAL_SHA DETAIL

# Build the JSON from environment variables so quotes, braces, or newlines
# in the git log output cannot break the Python source.
PAYLOAD=$(python3 - <<'PY'
import json, os
fence = chr(96) * 3  # three backticks, for a code block in the issue body
print(json.dumps({
    "title": f"{os.environ['HOST_ID']}: stuck on local commit {os.environ['LOCAL_SHA'][:12]}",
    "body": f"### What's at HEAD\n\n{fence}\n{os.environ['DETAIL']}\n{fence}\n\n"
            "Need maintainer to choose: push HEAD to main, or reset --hard origin/main here?",
}))
PY
)

# File the issue (replace <TOKEN> with the operator's Forgejo token;
# do NOT embed yours in commits)
curl -sS -X POST \
  -H "Authorization: token <TOKEN>" \
  -H "Content-Type: application/json" \
  http://10.100.0.1:3000/api/v1/repos/spectral/CIS490/issues \
  -d "$PAYLOAD"
```

Then leave the daemons running. The shipper will keep auto-quarantining the 412s; the backlog grows but nothing crashes. Wait for a maintainer comment.

---

## §C — Network broken

```sh
ping -c 1 10.100.0.1                  # the Pi
sudo wg show                          # is wg0 up?
sudo systemctl restart wg-quick@wg0   # bring it back up
sudo systemctl restart cis490-shipper cis490-orchestrator
```

If `ping 10.100.0.1` still fails after a `wg-quick` restart, this is a WireGuard / wg-enroll / iptmonads problem outside this repo. File an issue at `spectral/wg-enroll` or `spectral/iptmonads` and stop.

---

## §D — `/opt/cis490/.git` missing

The host was originally set up with `cp -aT` (no `.git/`). That makes auto-update impossible. Re-clone:

```sh
# Stop services so we don't race with the orchestrator mid-episode
sudo systemctl stop cis490-shipper cis490-orchestrator

# Preserve config/data: only /opt/cis490 (the code) gets replaced.
# /etc/cis490/ and /var/lib/cis490/ are NOT touched.
sudo mv /opt/cis490 /opt/cis490.pre-fix
sudo git clone http://maxgit.wg:3000/spectral/CIS490.git /opt/cis490
sudo chown -R cis490:cis490 /opt/cis490
sudo /opt/cis490/scripts/install-lab-host.sh

# Once verified, you can drop the backup:
#   sudo rm -rf /opt/cis490.pre-fix
```

---

## §E — Uncommitted edits on tracked files

```sh
cd /opt/cis490
sudo -u cis490 git status --short   # see what's modified
sudo -u cis490 git diff             # see exactly what changed
```

If the changes are intentional (e.g. you fixed a bug), commit them first and then go to §B:

```sh
sudo -u cis490 git add <files>
sudo -u cis490 git commit -m "<message>"
# Now go to §B.
```

If the changes are accidental or left over from debugging, discard them:

```sh
sudo -u cis490 git checkout -- .
# Now go to §A.
```

---

## §Z — Verify you're unstuck

```sh
# 1. Daemons up?
systemctl is-active cis490-shipper cis490-orchestrator
# Both should say "active".

# 2. VERSION present and matches HEAD?
cat /opt/cis490/VERSION
sudo -u cis490 git -C /opt/cis490 rev-parse HEAD
# The "commit" in VERSION should equal the rev-parse output.

# 3. Recent ship results; should NOT be 412/400-fatal-only:
sudo journalctl -u cis490-shipper --since '2 minutes ago' | tail -20

# 4. Maintainer-side confirmation (if you can reach the Pi):
#   sudo tail -5 /var/lib/cis490/index.jsonl
# Should show recent received_at_wall timestamps with your host_id.

# 5. Run the doctor; every red row prints its own fix:
/opt/cis490/.venv/bin/python /opt/cis490/tools/cis490_doctor.py --role lab-host
```

---

## Hard rules (don't break these)

These are repeated from AGENTS.md because the situations covered in this doc are exactly when an agent is most tempted to break them.

- **Do NOT** edit `meta.json` files to add a fake `code_version`. The gate exists to keep buggy data out of the index.
- **Do NOT** disable / mask / `systemctl stop` `cis490-shipper` to silence log noise. Fix the cause.
- **Do NOT** set `verify_tls = false` in `lab-host.toml`. That's a bypass, not a fix.
- **Do NOT** `git commit` on `/opt/cis490` to "make the gate happy". Committing without pushing is exactly how the elliott-thinkpad incident started. If you commit, push (§B.1) or don't commit at all.
- **Do NOT** delete `/var/lib/cis490/data/quarantine/`. Those are forensic records of what the gate rejected; they're capped at 30 days by the cleanup pass.
- **Do NOT** clobber `/etc/cis490/certs/`. Cert delivery is automated; rerun `install-lab-host.sh` if certs are missing.

If you find yourself wanting to do any of the above, stop and file an issue (§B.3 has the curl command). The maintainer would much rather resolve a stale lab host by reading an issue than by reverse-engineering what an agent did to escape a stuck state.
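
Before filing that issue, a read-only check can show whether any of the mechanically detectable rules above have already been broken on this host, which is useful detail for the maintainer. The sketch below is an illustration under stated assumptions, not part of the repo: the function names are hypothetical, it assumes the setting appears as a plain `verify_tls = false` line in the TOML file, and it covers only three of the rules (TLS bypass, a masked shipper, unpushed local commits).

```sh
# Hypothetical read-only audit; nothing here modifies the host.

# True (exit 0) if the given TOML file has an uncommented verify_tls = false.
tls_bypassed() {
  grep -Eq '^[[:space:]]*verify_tls[[:space:]]*=[[:space:]]*false' "$1" 2>/dev/null
}

# Prints one line per detected violation; returns 1 if any were found.
# $1: config file (default /etc/cis490/lab-host.toml)
# $2: checkout path (default /opt/cis490)
audit_hard_rules() {
  audit_conf=${1:-/etc/cis490/lab-host.toml}
  audit_repo=${2:-/opt/cis490}
  audit_rc=0

  if tls_bypassed "$audit_conf"; then
    echo "VIOLATION: verify_tls = false in $audit_conf"
    audit_rc=1
  fi

  # `systemctl is-enabled` prints "masked" for a masked unit.
  if [ "$(systemctl is-enabled cis490-shipper 2>/dev/null)" = "masked" ]; then
    echo "VIOLATION: cis490-shipper is masked"
    audit_rc=1
  fi

  # Count commits reachable from HEAD but not from origin/main.
  # Run as the cis490 user, as the rest of this doc does; any git
  # failure is treated as "nothing detected".
  audit_ahead=$(git -C "$audit_repo" rev-list --count origin/main..HEAD 2>/dev/null) || audit_ahead=0
  if [ "$audit_ahead" -gt 0 ]; then
    echo "VIOLATION: $audit_ahead local commit(s) not on origin/main"
    audit_rc=1
  fi

  return "$audit_rc"
}
```

A clean run prints nothing and returns 0. Any output is something to paste into the issue rather than something to fix on your own.
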