Today's incident: post-cutover, k-gamingcom went silent and
elliott-thinkpad kept shipping pre-stamp episodes that the receiver
gate 400'd in a 2300+ PUT loop. Both required
`git pull && install-lab-host.sh` *on the host*. Neither the on-device
AI agent nor the operator pulled in time, and from the receiver Pi I
cannot reach in (sshd is off on the lab hosts).

Fix the recurrence directly: a 30-min systemd timer that runs
git fetch, then (if behind) an ff-only pull, then re-runs
install-lab-host.sh. Hosts catch up on their own at the next tick;
no human or agent action required.

Mechanics:
- scripts/auto-update.sh runs as root, drops to cis490 for git ops
  to satisfy /opt/cis490 ownership ("dubious ownership" guard).
- Refuses ff if local HEAD isn't an ancestor of origin/main;
  protects operator hand-edits from silent overwrite.
- Network failures exit 0 (offline is normal, don't pin a unit
  failure); divergence + install failures exit non-zero so the
  journal records what broke.
- RandomizedDelaySec=10min on the timer prevents thundering-herd
  when several hosts boot together.
- Hands off to install-lab-host.sh via exec: exactly one path
  through bring-up; no special "auto" flow.
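
The decision flow in the list above could look roughly like the sketch
below. It is not the actual scripts/auto-update.sh: the function names and
arguments are hypothetical, and the drop to the cis490 user is left out so
the logic stays visible.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical, and the real
# scripts/auto-update.sh additionally drops to the cis490 user for git ops.

# Print how HEAD in repo $1 relates to origin/$2:
# "current", "behind" (fast-forward possible) or "diverged".
update_state() {
    repo=$1 branch=$2
    local_rev=$(git -C "$repo" rev-parse HEAD) || return 2
    remote_rev=$(git -C "$repo" rev-parse "origin/$branch") || return 2
    if [ "$local_rev" = "$remote_rev" ]; then
        echo current
    elif git -C "$repo" merge-base --is-ancestor HEAD "origin/$branch"; then
        echo behind
    else
        # Covers both true divergence and unpushed local commits.
        echo diverged
    fi
}

# Usage: auto_update REPO BRANCH INSTALLER
auto_update() {
    repo=$1 branch=$2 installer=$3
    # Offline is normal: a failed fetch is not a unit failure.
    git -C "$repo" fetch --quiet origin || return 0
    state=$(update_state "$repo" "$branch") || return 1
    case $state in
        current) return 0 ;;
        diverged)
            echo "refusing: HEAD is not an ancestor of origin/$branch" >&2
            return 1 ;;
    esac
    git -C "$repo" merge --ff-only --quiet "origin/$branch" || return 1
    # exec replaces this shell, so the installer's exit status becomes ours.
    exec "$installer"
}
```

Under these assumptions a caller would end with something like
`auto_update /opt/cis490 main /opt/cis490/scripts/install-lab-host.sh`;
the installer path is a guess from the repo layout, not confirmed here.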

The version-gate provides the quality boundary, so even if
origin/main moves forward unsafely, the receiver's allow-list still
controls what lands in the index.

install-lab-host.sh enables cis490-autoupdate.timer on every run
(idempotent), so existing hosts pick it up the next time they pull
manually.
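
As an illustration of why enabling on every run is safe, the step could be
as small as the helper below. The function name and the SYSTEMCTL override
are hypothetical (the override only exists so the sketch can be dry-run);
the real install-lab-host.sh may do this differently.

```shell
#!/bin/sh
# Hypothetical helper, not taken from install-lab-host.sh.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) for a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

enable_autoupdate_timer() {
    # Reload unit files first so a newly installed timer is visible,
    # then enable and start it; repeating this on an enabled timer is a no-op.
    "$SYSTEMCTL" daemon-reload &&
    "$SYSTEMCTL" enable --now cis490-autoupdate.timer
}
```

After a run, `systemctl list-timers cis490-autoupdate.timer` shows the next
scheduled tick.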

Filed Forgejo #18 with the canonical command for elliott-thinkpad
and k-gamingcom to bootstrap themselves out of the current incident
(auto-update doesn't help them retroactively; it has to be running
*before* the cutover to catch the next one).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 lines
802 B
Desktop File
[Unit]
Description=CIS490 lab-host auto-update from origin/main
Documentation=https://maxgit.wg/spectral/CIS490
After=network-online.target wg-quick@wg0.service
# We don't Want network-online so that a host that's offline just
# silently skips the update tick instead of pinning a unit failure.

[Service]
Type=oneshot
# Runs as root because install-lab-host.sh writes to /etc/, /opt/, and
# calls systemctl. The script drops to the cis490 user via `sudo -u`
# for the git fetch + pull.
ExecStart=/opt/cis490/scripts/auto-update.sh
StandardOutput=journal
StandardError=journal

[Install]
# The TIMER is what gets enabled, not the service itself. We still set
# WantedBy here so that an operator can `systemctl start
# cis490-autoupdate.service` manually for a one-shot pull.
WantedBy=multi-user.target