Two follow-ups from the post-cutover diagnosis:
1. version_gate: forgejo → local git fallback. If forgejo refresh
returns empty AND a local repo path is configured, retry against
`git log` from the local checkout. The receiver service runs on
the same Pi as forgejo, so a simultaneous restart used to leave
the gate's cache empty and reject every PUT with not-in-window.
Auto-detects /opt/cis490/.git when the operator hasn't set
local_repo_path explicitly — that path is always present on a
production receiver and ProtectSystem=strict still allows reads.
Logs `source=git-fallback` so this isn't silent.
2. shipper/queue: sweep orphaned outbox tarballs. The lifecycle
invariant is `outbox/<id>.tar.zst exists ⇒ episodes/<id>/ exists`
— broken historically by the now-fixed fatal-loop, by operator
`rm` of an episode dir, or by an OS crash between rename(2) and
the post-ship cleanup. Without sweeping, dead bytes pile up
forever. New _sweep_outbox runs at the start of every scan,
bounded by the file count in outbox/.
Tests cover: fallback fires when forgejo unreachable + repo_path set;
no fallback when repo_path None (opt-in); orphan tarball + partial
get swept on the next pass; live tarballs untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Initial git-log-based gate ran into a permission wall: the cis490
service user can't read /home/max/cis490/.git (ProtectHome=true +
home-dir mode). Switching the production source to the local Forgejo
HTTP API (already accessible to all WG peers, single source of truth
both lab hosts and the receiver pull from). When the maintainer
pushes new code to spectral/CIS490, the next 5-second cache refresh
sees the new commit and lab hosts can immediately ship under it.
VersionGate now takes either:
- forgejo_url + repo_owner + repo_name + branch (+ optional
auth_token for private repos): hits
/api/v1/repos/<owner>/<name>/commits?sha=<branch>&limit=<n>
- repo_path: dev-only fallback, runs `git log` locally
Local-git path retained for tests + the dev-only case.
receiver.toml.example gains forgejo_url/repo_owner/repo_name/branch
with auth_token commented; live-deployed receiver.toml on the Pi has
the spectral org + token.
Live state on the Pi: 41 valid hashes loaded, head=f8ad02b. Verified
end-to-end:
bogus commit → 412 + remediation
HEAD commit → clears gate (fails downstream at sha-mismatch as
expected for the empty-body verify probe)
Test added: test_forgejo_backend_accepts_returned_commits stands up
a tiny canned-response HTTPServer in-process, exercises the parser
without depending on a live Forgejo instance. Brings test_version_gate
to 10 cases; total 158/158.
Stops out-of-date lab hosts from polluting the dataset with episodes
generated by buggy code. The valid-commits set mirrors the maintainer's
working clone on the Pi automatically — when the maintainer pulls or
pushes a new commit, the receiver picks it up within the 5-second
cache TTL with no service restart.
Receiver changes:
- receiver/version_gate.py (new): VersionGate(repo_path, window).
Each check() consults a frozenset of the last `window` commit
hashes from `git -C <repo> log --format=%H -n <window>`, refreshed
every 5s under a lock. Resilient to transient git failure (keeps
prior cache so a flaky `git` doesn't lock out every shipper).
- receiver/app.py: PUT extracts X-Cis490-Code-Commit; gate.check()
before ingest. Rejects with:
400 + remediation if header missing or malformed
412 + remediation + your_commit + head_commit if not in window
Remediation block is verbatim copy-pasteable into the lab-host
shell:
cd /opt/cis490 && sudo -u cis490 git pull origin main
sudo /opt/cis490/scripts/install-lab-host.sh
sudo systemctl restart cis490-orchestrator
- receiver/store.py: ingest_stream takes commit kwarg, stamps it on
the index.jsonl row (new optional field). Backfilled rows from
index_backfill.py also pull commit out of meta.json.
- receiver/config.py + etc/receiver.toml.example: new [version_gate]
section. enabled=true, repo_path=/home/max/cis490, window=100 by
default. Enabled toggle exists for emergency disable-and-collect.
Shipper changes:
- shipper/transport.py: ship_tarball() takes commit kwarg, sends
X-Cis490-Code-Commit header. 412 maps to status='fatal' so the
queue doesn't infinite-retry — operator must pull and reinstall
before the next ship will succeed.
- shipper/queue.py: reads meta.json::code_version.commit per
episode, passes through. On 412, logs the receiver's full
remediation block at ERROR level so journalctl on the lab host
shows exactly what to run.
Tests: 9 in test_version_gate (including 2 end-to-end via
starlette.testclient), 2 cover the boundary where new commits land
mid-cache and where missing-repo gracefully keeps prior cache.
157/157 total.
Index schema: existing rows stay valid (commit field is optional
on read). New rows from receiver-direct AND from index_backfill.py
include commit.