Today's incident: post-cutover, k-gamingcom went silent and
elliott-thinkpad kept shipping pre-stamp episodes, which the receiver
gate answered with 400s in a loop of 2300+ PUTs. Both needed
`git pull && install-lab-host.sh` run *on the host*. Neither the
on-device AI agent nor the operator pulled in time, and I cannot
reach in from the receiver Pi (sshd is off on the lab hosts).

Fix the recurrence directly: a 30-min systemd timer that runs
git fetch, then (if behind) an ff-only pull, then re-runs
install-lab-host.sh. Hosts catch up on the next tick on their own;
no human or agent action is required.

Mechanics:
- scripts/auto-update.sh runs as root and drops to cis490 for git ops
  to satisfy /opt/cis490 ownership (git's "dubious ownership" guard).
- Refuses to fast-forward if local HEAD isn't an ancestor of
  origin/main, which protects operator hand-edits from silent
  overwrite.
- Network failures exit 0 (offline is normal, don't pin a unit
  failure); divergence and install failures exit non-zero so the
  journal records what broke.
- RandomizedDelaySec=10min on the timer prevents a thundering herd
  when several hosts boot together.
- Hands off to install-lab-host.sh via exec: exactly one path
  through bring-up, no special "auto" flow.
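
The flow in the list above can be sketched roughly as follows. This is an illustration, not the contents of scripts/auto-update.sh: the function names, the `run` argument, the installer path under /opt/cis490, and the choice to derive the git user from the checkout's owner (rather than hard-coding cis490) are all assumptions. It uses fetch plus `merge --ff-only`, which is equivalent to an ff-only pull.

```shell
#!/usr/bin/env bash
# Sketch only; names and paths below are assumptions, not the real script.
set -u

REPO_DIR="${REPO_DIR:-/opt/cis490}"
INSTALLER="${INSTALLER:-$REPO_DIR/install-lab-host.sh}"  # assumed location

# Run a command as the owner of the checkout, so git's
# "dubious ownership" check passes when we were started as root.
as_owner() {
    local owner
    owner=$(stat -c %U "$REPO_DIR")
    if [ "$(id -un)" = "$owner" ]; then
        "$@"
    else
        runuser -u "$owner" -- "$@"
    fi
}

# Print one of: current, behind, diverged.
# "diverged" means local HEAD is not an ancestor of origin/main.
update_state() {
    local head upstream
    head=$(as_owner git -C "$REPO_DIR" rev-parse HEAD) || return 1
    upstream=$(as_owner git -C "$REPO_DIR" rev-parse origin/main) || return 1
    if [ "$head" = "$upstream" ]; then
        echo current
    elif as_owner git -C "$REPO_DIR" merge-base --is-ancestor "$head" "$upstream"; then
        echo behind
    else
        echo diverged
    fi
}

main() {
    local state
    # Offline is normal: a failed fetch must not mark the unit as failed.
    as_owner git -C "$REPO_DIR" fetch --quiet origin || exit 0
    state=$(update_state) || exit 1
    case "$state" in
        current)
            exit 0 ;;
        behind)
            as_owner git -C "$REPO_DIR" merge --ff-only origin/main || exit 1
            # Single bring-up path: hand the process over to the installer.
            exec "$INSTALLER" ;;
        *)
            echo "local HEAD is not an ancestor of origin/main; refusing" >&2
            exit 1 ;;
    esac
}

# Only act when invoked as "<script> run", so the file can be sourced safely.
case "${1:-}" in
    run) main ;;
esac
```

Keeping the state check in its own function makes the three outcomes easy to exercise against a throwaway clone before trusting the timer with them.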

The version-gate provides the quality boundary, so even if
origin/main moves forward unsafely, the receiver's allow-list still
controls what lands in the index.

install-lab-host.sh enables cis490-autoupdate.timer on every run
(idempotent), so existing hosts pick it up the next time they pull
manually.
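
A minimal sketch of what such an idempotent enable step could look like. The function name and the `SYSTEMCTL` override are hypothetical, and how install-lab-host.sh actually installs the unit files is not shown here; the sketch assumes they are already in place.

```shell
#!/usr/bin/env bash
# Sketch only; enable_autoupdate_timer and SYSTEMCTL are assumed names.

# Enable and start the timer. `systemctl enable --now` is safe to repeat:
# on an already-enabled, already-running timer it changes nothing.
enable_autoupdate_timer() {
    local systemctl="${SYSTEMCTL:-systemctl}"
    "$systemctl" daemon-reload &&
        "$systemctl" enable --now cis490-autoupdate.timer
}
```

Calling `enable_autoupdate_timer` near the end of the installer would mean every run, manual or timer-driven, leaves the timer enabled.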

Filed Forgejo #18 with the canonical command for elliott-thinkpad
and k-gamingcom to bootstrap themselves out of the current incident
(auto-update doesn't help them retroactively: it has to be running
*before* a cutover to catch it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[Unit]
Description=Run CIS490 lab-host auto-update every 30 minutes
Documentation=https://maxgit.wg/spectral/CIS490

[Timer]
# 5 min after boot so a freshly-flashed host catches up promptly.
OnBootSec=5min
# Then every 30 min. RandomizedDelaySec spreads across hosts so
# the receiver / forgejo aren't hammered all at once when several
# lab hosts come up together.
OnUnitActiveSec=30min
RandomizedDelaySec=10min
# Persistent= only affects OnCalendar= timers, so it is a no-op with
# the monotonic triggers above; OnBootSec= is what covers a host that
# was off when a tick was due.
Persistent=true
Unit=cis490-autoupdate.service

[Install]
WantedBy=timers.target