CIS490/etc/cis490-autoupdate.timer
max 98dcd4f9f8 lab-host: cis490-autoupdate.timer for self-healing on push
Today's incident: post-cutover, k-gamingcom went silent and
elliott-thinkpad kept shipping pre-stamp episodes, which the receiver
gate rejected with 400s in a loop of 2300+ PUTs. Both hosts needed
`git pull && install-lab-host.sh` run *on the host*, but neither the
on-device AI agent nor the operator pulled in time, and I cannot reach
the lab hosts from the receiver Pi (sshd is off on them).

Fix the recurrence directly: a 30-min systemd timer that does
git fetch + (if behind) ff-only pull + re-run install-lab-host.sh.
Hosts catch up on the next tick on their own — no human or agent
action required.

Mechanics:
- scripts/auto-update.sh runs as root, drops to cis490 for git ops
  to satisfy /opt/cis490 ownership ("dubious ownership" guard).
- Refuses ff if local HEAD isn't an ancestor of origin/main —
  protects operator hand-edits from silent overwrite.
- Network failures exit 0 (offline is normal, don't pin a unit
  failure); divergence + install failures exit non-zero so the
  journal records what broke.
- RandomizedDelaySec=10min on the timer prevents thundering-herd
  when several hosts boot together.
- Hands off to install-lab-host.sh via exec — exactly one path
  through bring-up; no special "auto" flow.
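
As a rough illustration of the mechanics above, the decision logic might
look like the sketch below. It is not the contents of
scripts/auto-update.sh: the function names `decide_update` and
`auto_update`, the state words they print, and the optional installer
argument are assumptions made for the example, and the drop to the
cis490 user is left out so the sketch runs unprivileged.

```shell
#!/bin/sh
# Hypothetical sketch only; names and messages are illustrative.

# decide_update REPO
# Prints one of: offline, current, behind, diverged.
decide_update() {
    repo=$1
    # A failed fetch is treated as "offline", not as an error.
    if ! git -C "$repo" fetch --quiet origin 2>/dev/null; then
        echo offline
        return 0
    fi
    local_rev=$(git -C "$repo" rev-parse HEAD) || return 1
    remote_rev=$(git -C "$repo" rev-parse origin/main) || return 1
    if [ "$local_rev" = "$remote_rev" ]; then
        echo current
    elif git -C "$repo" merge-base --is-ancestor "$local_rev" "$remote_rev"; then
        # HEAD is an ancestor of origin/main, so a fast-forward is safe.
        echo behind
    else
        # Local commits exist that origin/main lacks; refuse to touch them.
        echo diverged
    fi
}

# auto_update REPO [INSTALLER]
# Returns 0 when offline or already current, non-zero on divergence or a
# failed fast-forward. If INSTALLER is given, hands off to it via exec
# after a successful fast-forward.
auto_update() {
    repo=$1
    installer=${2:-}
    state=$(decide_update "$repo") || return 1
    case $state in
        offline|current)
            return 0
            ;;
        behind)
            git -C "$repo" merge --ff-only --quiet origin/main || return 1
            if [ -n "$installer" ]; then
                exec "$installer"
            fi
            ;;
        *)
            echo "auto-update: local HEAD has diverged from origin/main" >&2
            return 1
            ;;
    esac
}
```

Under these assumptions, the service would call something like
`auto_update /opt/cis490 <path to install-lab-host.sh>`. Keeping the
decision in its own function makes the exit-code policy easy to check
against throwaway repositories.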

The version-gate provides the quality boundary, so even if
origin/main moves forward unsafely, the receiver's allow-list still
controls what lands in the index.

install-lab-host.sh enables cis490-autoupdate.timer on every run (the
enable is idempotent), so existing hosts pick it up the next time they
pull manually.

Filed Forgejo #18 with the canonical command for elliott-thinkpad
+ k-gamingcom to bootstrap themselves out of the current incident
(auto-update doesn't help them retroactively — it has to be running
*before* the cutover to catch the next one).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:59:31 -05:00


[Unit]
Description=Run CIS490 lab-host auto-update every 30 minutes
Documentation=https://maxgit.wg/spectral/CIS490
[Timer]
# 5 min after boot so a freshly-flashed host catches up promptly.
OnBootSec=5min
# Then every 30 min. RandomizedDelaySec spreads across hosts so
# the receiver / forgejo aren't hammered all at once when several
# lab hosts come up together.
OnUnitActiveSec=30min
RandomizedDelaySec=10min
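# Persistent= only affects OnCalendar= timers, so it is a no-op with the
# monotonic triggers above; OnBootSec= is what makes a host that was off
# catch up after its next boot.
Persistent=true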
Unit=cis490-autoupdate.service
[Install]
WantedBy=timers.target