CIS490/shipper/__main__.py
max 7c9f9582ca Lab-host shipper + receiver /v1/ping + install scripts
Implements the deployment loop end-to-end on the CIS490 side:

shipper/
  config.py      ShipperConfig (host_id, paths, receiver endpoint, mTLS)
  transport.py   httpx-based PUT + ping with mTLS + bearer support
  queue.py       scan data/episodes/, tar+zstd via system zstd, ship,
                 retire to data/shipped/. Idempotent across crashes per
                 the state machine in docs/transport.md.
  __main__.py    CLI: --ping (smoke test), --once (one pass), or daemon
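
A minimal sketch of the ship-then-retire pass described for queue.py, under stated assumptions: `ship_pending` and the `ship` callback are hypothetical names, the tar+zstd step and the "episode is done" check are left out, and the real state machine is the one in docs/transport.md, not this one.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def ship_pending(
    episodes_dir: Path,
    shipped_dir: Path,
    ship: Callable[[Path], bool],
) -> list[str]:
    """Ship each episode directory and retire it; return the names retired.

    Hypothetical helper: `ship` stands in for "tar, compress, PUT" and
    returns True only when the receiver has accepted the episode.
    """
    shipped_dir.mkdir(parents=True, exist_ok=True)
    retired: list[str] = []
    for episode in sorted(p for p in episodes_dir.iterdir() if p.is_dir()):
        target = shipped_dir / episode.name
        if target.exists():
            # Already retired by an earlier pass; never ship it twice.
            continue
        if not ship(episode):
            # Transient failure: leave the episode in place for the next pass.
            continue
        # Retire only after a successful ship, so a crash before this line
        # results in a re-ship on the next pass rather than data loss.
        episode.rename(target)
        retired.append(episode.name)
    return retired
```

Because the move into the shipped directory happens last, a crash between ship and retire re-sends the episode; that is only safe if the receiver treats a repeated upload as a no-op, which is what the "idempotent re-ship" test in this commit exercises.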

receiver/app.py: new POST /v1/ping that requires the same auth as PUT
  /v1/episodes but writes nothing. Used by `cis490-shipper --ping`
  during lab-host bring-up to verify the WG/Caddy/mTLS path before
  shipping any real bytes.
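
For bring-up scripts that wrap `cis490-shipper --ping`, a sketch of checking the one-line JSON it prints might look like the following. The key names come from the `--ping` branch of shipper/__main__.py; `ping_succeeded` is a hypothetical helper, and the sample values in the usage note are made up.

```python
import json


def ping_succeeded(line: str) -> bool:
    """Return True only if `line` is a --ping JSON object with ok == true.

    Hypothetical helper for wrapper scripts; it is not part of the shipper.
    """
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        # Not JSON at all (for example a traceback): treat as failure.
        return False
    return isinstance(payload, dict) and payload.get("ok") is True
```

For example, `ping_succeeded('{"ok": true, "status_code": 200, "host_id": "lab-01", "receiver": "https://receiver.example", "body": null, "error": null}')` returns True, while a line with `"ok": false` or non-JSON output returns False. The process exit code (0 on success, 1 on failure) carries the same information.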

etc/
  cis490-shipper.service       systemd unit for the lab-host shipper
  cis490-orchestrator.service  systemd unit for the lab-host queue
                               (kept disabled by default until queue
                               mode lands)
  lab-host.toml.example        config template

scripts/
  install-lab-host.sh   idempotent installer; verifies prereqs,
                        creates cis490 service user, syncs repo to
                        /opt/cis490, builds venv, drops systemd units
                        and config template
  install-receiver.sh   same, for the receiver role on the central WG
                        node (Pi5 in our setup)

tests/test_shipper.py  11 end-to-end tests against a real Uvicorn
                       server hosting the receiver app. Exercises
                       ping, tar+ship, idempotent re-ship, 409
                       conflict, transient (receiver down), tarball
                       round-trip via system zstd.

AGENTS.md  guidance for AI agents working on this and sibling repos.
           Headline: when you hit an issue you can't fully fix in
           scope, file a Forgejo issue rather than leaving a TODO.

51/51 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:41:32 -05:00

106 lines
3.1 KiB
Python

"""``cis490-shipper`` CLI entrypoint.

Modes:
  --ping      hit /v1/ping; exit 0 if 200/ok, non-zero otherwise.
              No tarball flow; index.jsonl on the receiver is untouched.
  --once      one scan pass over data/episodes/, ship anything done, exit.
  (default)   long-running daemon; rescans every scan_interval_s.
"""
from __future__ import annotations

import argparse
import json
import logging
import signal
import sys
from pathlib import Path

from .config import ShipperConfig
from .queue import ShipperQueue
from .transport import ShipperTransport


def _setup_logging(level: str) -> None:
    logging.basicConfig(
        level=getattr(logging, level.upper(), logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="cis490-shipper")
    parser.add_argument(
        "--config",
        default="/etc/cis490/lab-host.toml",
        help="Path to lab-host config (TOML)",
    )
    parser.add_argument(
        "--ping",
        action="store_true",
        help="Hit /v1/ping on the receiver and exit",
    )
    parser.add_argument(
        "--once",
        action="store_true",
        help="One scan pass, then exit (default is long-running daemon)",
    )
    parser.add_argument("--log-level", default="INFO")
    args = parser.parse_args(argv)

    _setup_logging(args.log_level)
    log = logging.getLogger("cis490.shipper")

    try:
        cfg = ShipperConfig.load(args.config)
    except (FileNotFoundError, ValueError) as e:
        log.error("config error: %s", e)
        return 2

    transport = ShipperTransport(cfg)

    if args.ping:
        result = transport.ping()
        # Print structured one-liner for CI / test pipelines.
        print(json.dumps({
            "ok": result.ok,
            "status_code": result.status_code,
            "host_id": cfg.host_id,
            "receiver": cfg.receiver.url,
            "body": result.body,
            "error": result.error,
        }))
        return 0 if result.ok else 1

    queue = ShipperQueue(cfg, transport)

    if args.once:
        result = queue.run_once()
        log.info(
            "scan complete: scanned=%d shipped=%d transient=%d conflicts=%d fatal=%d",
            result.scanned, result.shipped, result.transient_failures,
            result.conflicts, result.fatal,
        )
        # Exit code reflects fatal-only; transient failures aren't an error
        # because the next pass / pod restart will retry.
        return 1 if result.fatal else 0

    # Daemon mode
    stopping = False

    def _stop(signum, frame):  # noqa: ARG001
        nonlocal stopping
        log.info("received signal %s; finishing pass and exiting", signum)
        stopping = True

    signal.signal(signal.SIGTERM, _stop)
    signal.signal(signal.SIGINT, _stop)

    log.info(
        "shipper starting: host_id=%s data_root=%s receiver=%s",
        cfg.host_id, cfg.data_root, cfg.receiver.url,
    )
    queue.run_forever(stop_check=lambda: stopping)
    return 0


if __name__ == "__main__":
    sys.exit(main())
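
The daemon branch hands `run_forever` a `stop_check` callable and lets the signal handler flip a flag, so the current pass finishes before the process exits. The body of `ShipperQueue.run_forever` is not shown in this file; the following is only a sketch of a loop that would honor that contract, with `run_forever_sketch` and `do_pass` as hypothetical names.

```python
from __future__ import annotations

import threading
from typing import Callable


def run_forever_sketch(
    do_pass: Callable[[], None],
    stop_check: Callable[[], bool],
    scan_interval_s: float = 0.0,
) -> int:
    """Run `do_pass` until `stop_check()` is true; return the pass count.

    Hypothetical stand-in for ShipperQueue.run_forever, which may differ.
    """
    passes = 0
    waiter = threading.Event()  # never set; used only as a timed wait
    while not stop_check():
        # A pass is never interrupted: the flag is read between passes only.
        do_pass()
        passes += 1
        if scan_interval_s > 0:
            waiter.wait(scan_interval_s)
    return passes
```

In this sketch a stop request that arrives during the wait is noticed only once the interval has elapsed. A real implementation could sleep in shorter slices to shorten shutdown.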