docs+doctor: surface VERSION-stamp + fallback wiring

receiver.toml.example: the local_repo_path comment was wrong about
when it kicks in. With the new fallback path, it's used both when
forgejo_url is unset (sole backend) AND when Forgejo is unreachable
(failover). Document that, plus the auto-detect of /opt/cis490/.git.
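Below is a minimal sketch of the lookup order described above. It assumes each backend can be modeled as a callable that answers "is this SHA a known commit?" and raises when its transport fails. `commit_known`, `git_log_lookup`, and `Lookup` are hypothetical names, not the receiver's actual API. The exact `git log` invocation is also an assumption, since the commit only says the fallback runs `git log` against the checkout.

```python
import re
import subprocess
from typing import Callable, Optional

# Hypothetical: a lookup answers "is this a known commit?" and raises on
# transport errors (e.g. Forgejo restarting on the same Pi).
Lookup = Callable[[str], bool]

SHA_RE = re.compile(r"[0-9a-f]{40}")


def git_log_lookup(repo: str) -> Lookup:
    """Hypothetical local backend: ask `git log` in a checkout about a SHA."""
    def lookup(sha: str) -> bool:
        if not SHA_RE.fullmatch(sha):
            return False  # never hand arbitrary strings to git
        r = subprocess.run(
            ["git", "-C", repo, "log", "-1", "--format=%H", sha, "--"],
            capture_output=True, text=True,
        )
        return r.returncode == 0  # non-zero: unknown revision
    return lookup


def commit_known(sha: str,
                 forgejo: Optional[Lookup],
                 local: Optional[Lookup]) -> bool:
    """Ask Forgejo first; if it raises, fall back to the local checkout.

    Assumption: a definitive "not found" from Forgejo is final; only
    exceptions (unreachable, timeout) trigger the fallback.
    """
    if forgejo is not None:
        try:
            return forgejo(sha)
        except Exception:
            if local is None:
                raise  # no fallback configured: surface the outage
    if local is None:
        raise RuntimeError("neither forgejo_url nor local_repo_path is set")
    return local(sha)  # sole backend, or failover after a Forgejo error
```

The sketch treats a definitive "not found" from Forgejo as final rather than retrying locally. This is a design choice: it keeps a stale local checkout from overruling a healthy Forgejo.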

cis490_doctor: add a VERSION-stamp check for the lab-host role. If
/opt/cis490/VERSION is missing or malformed, the orchestrator stamps
"unknown" → the receiver gate rejects every PUT → quarantine. Surface
this as a red row with the canonical fix (re-run install-lab-host.sh)
so an on-device agent doesn't have to grep the journal to figure it
out.
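The failure chain above can be sketched as follows. The sketch resolves the stamp from VERSION first, then tries `git rev-parse HEAD` when a `.git/` is present, and finally uses the literal "unknown". This is the order the doctor's code comment describes. `resolve_stamp` and `gate_accepts` are hypothetical helpers. Two behaviors are also assumptions: that a malformed VERSION is treated like a missing one, and that the gate accepts only a 40-character hex SHA.

```python
import json
import re
import subprocess
from pathlib import Path

SHA_RE = re.compile(r"[0-9a-f]{40}")


def resolve_stamp(install_root: Path) -> str:
    """Hypothetical orchestrator logic: VERSION, then git, then "unknown"."""
    try:
        commit = json.loads((install_root / "VERSION").read_text()).get("commit")
        if isinstance(commit, str) and len(commit) == 40:
            return commit
    except (OSError, ValueError, AttributeError):
        pass  # missing, undecodable, bad JSON, or not a JSON object
    if (install_root / ".git").exists():
        try:
            r = subprocess.run(
                ["git", "-C", str(install_root), "rev-parse", "HEAD"],
                capture_output=True, text=True,
            )
            if r.returncode == 0:
                return r.stdout.strip()
        except OSError:
            pass  # git binary not installed
    return "unknown"


def gate_accepts(stamp: str) -> bool:
    """Assumed gate format check: only a full 40-hex SHA passes."""
    return SHA_RE.fullmatch(stamp) is not None
```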

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
max 2026-05-01 11:54:36 -05:00
parent 5cebe7096a
commit ed5e6b0581
2 changed files with 52 additions and 2 deletions

@@ -33,5 +33,13 @@ branch = "main"
 # Optional Forgejo token for private repos; remove for public.
 # auth_token = "..."
 #
-# Dev-only fallback (used iff forgejo_url is unset):
-# local_repo_path = "/home/max/cis490"
+# Optional local-git fallback. When BOTH forgejo_url and
+# local_repo_path are set, the gate first asks Forgejo; if that fails
+# (e.g. simultaneous restart of receiver + Forgejo on the same Pi) it
+# falls back to `git log` against this checkout instead of locking
+# out every shipper. When forgejo_url is unset, this is the only
+# backend.
+#
+# Auto-detected: if you don't set this, the receiver checks for
+# /opt/cis490/.git at startup and uses that path when present.
+# local_repo_path = "/opt/cis490"

@@ -202,6 +202,48 @@ def check_install(report: Report, role: str) -> None:
             fix=f"sudo /opt/cis490/scripts/install-{role}.sh",
         ))
 
+    # VERSION file — written by install-lab-host.sh on every successful
+    # run. Its absence means the install never finished step 3, so the
+    # orchestrator falls back to git rev-parse (or "unknown" if no .git/
+    # is here either). Stamping "unknown" gets every episode rejected
+    # by the receiver gate as bad-format → drained to quarantine/. The
+    # fix is the same git-pull-and-reinstall as for stale code.
+    version_file = install_root / "VERSION"
+    if role == "lab-host" and _path_exists(version_file):
+        try:
+            v = json.loads(version_file.read_text())
+            commit = v.get("commit", "")
+            branch = v.get("branch", "?")
+            dirty = " [dirty]" if v.get("dirty") else ""
+            if isinstance(commit, str) and len(commit) == 40:
+                report.add(Check(
+                    "install: VERSION stamp",
+                    "ok",
+                    detail=f"{branch}@{commit[:8]}{dirty}",
+                ))
+            else:
+                report.add(Check(
+                    "install: VERSION stamp",
+                    "fail",
+                    detail=f"commit field malformed: {commit!r}",
+                    fix=f"sudo /opt/cis490/scripts/install-{role}.sh",
+                ))
+        except (OSError, json.JSONDecodeError) as e:
+            report.add(Check(
+                "install: VERSION stamp",
+                "fail",
+                detail=f"unreadable: {e}",
+                fix=f"sudo /opt/cis490/scripts/install-{role}.sh",
+            ))
+    elif role == "lab-host":
+        report.add(Check(
+            "install: VERSION stamp",
+            "fail",
+            detail="missing — orchestrator will stamp 'unknown' and the "
+                   "receiver gate will reject every PUT",
+            fix=f"sudo /opt/cis490/scripts/install-{role}.sh",
+        ))
+
     cfg_name = "lab-host.toml" if role == "lab-host" else "receiver.toml"
     cfg = Path("/etc/cis490") / cfg_name
     if _path_exists(cfg):
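
The doctor only reads VERSION. `install-lab-host.sh` writes it, and that script is not part of this commit. The sketch below is a hypothetical Python equivalent of the write side. It produces the three keys the check reads (`commit`, `branch`, `dirty`). Two details are assumptions rather than what the real installer does: computing the keys with `git rev-parse` / `git status --porcelain`, and writing to a temp file before renaming it into place.

```python
import json
import os
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    out = subprocess.run(["git", "-C", str(repo), *args],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()


def build_stamp(repo: Path) -> dict:
    """Collect the fields the doctor reads: commit, branch, dirty."""
    return {
        "commit": git(repo, "rev-parse", "HEAD"),  # full 40-char SHA
        "branch": git(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        "dirty": bool(git(repo, "status", "--porcelain")),  # any changes
    }


def write_stamp(install_root: Path, stamp: dict) -> Path:
    """Write VERSION via a temp file so readers never see a partial file."""
    target = install_root / "VERSION"
    tmp = target.with_name("VERSION.tmp")
    tmp.write_text(json.dumps(stamp, sort_keys=True) + "\n")
    os.replace(tmp, target)  # atomic rename within one filesystem on POSIX
    return target
```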