shipper: crashes on startup when ca_bundle path does not exist — should wait/retry #11
Context
First bring-up of the elliott-thinkpad lab host via install-lab-host.sh. Certs have not yet been provisioned by the Pi (requires deploy-cis490-cert.sh on the Pi side). The shipper service starts immediately on systemctl enable --now but crashes because the CA bundle configured in lab-host.toml does not exist on disk yet.
What happened
cis490-shipper enters a fast crash/restart loop (every 5 s via RestartSec=5) until certs land.
What was tried
No workaround applied. The service will self-heal once /etc/cis490/certs/wg-ca.pem is deployed by the Pi.
Suggested next step
shipper/transport.py _build_ssl_context: if cafile is configured but the path does not exist, log a clear warning and raise a retriable error (or better: sleep-and-retry in __main__) rather than crashing immediately on startup. This makes first-boot bring-up order-independent: the shipper can be enabled before cert delivery.
Related:
scripts/install-lab-host.sh step 7 already auto-fetches the cert via bootstrap.wg if host_id is set; the race only happens if host_id = REPLACE_ME at install time (as it was here). Fixing that is a separate issue.
Refs spectral/CIS490 Dev_REL1_043026 bring-up.
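For illustration, the sleep-and-retry option mentioned under "Suggested next step" might look roughly like the sketch below. This is not code from the shipper repo: wait_for_paths and its parameters are hypothetical names, and the 5-second default only mirrors the RestartSec=5 interval mentioned above.

```python
import logging
import os
import time

log = logging.getLogger("shipper")


def wait_for_paths(paths, interval=5.0, max_wait=None, sleep=time.sleep):
    """Block until every path in `paths` exists.

    Returns True once all paths exist, or False if `max_wait` seconds
    elapse first (None means wait indefinitely). Hypothetical helper,
    not part of the actual shipper codebase.
    """
    waited = 0.0
    warned = False
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing:
            return True
        if not warned:
            # Warn once rather than on every poll, to keep the journal quiet.
            log.warning("cert material not ready, waiting for: %s",
                        ", ".join(missing))
            warned = True
        if max_wait is not None and waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Called from __main__ before the transport is constructed, e.g. wait_for_paths(["/etc/cis490/certs/wg-ca.pem"]), this would keep the process alive instead of relying on systemd restarts.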
Fixed in 86a088c. Transport now pre-flights ca_bundle / client_cert / client_key paths and raises a recoverable _CertNotReadyError instead of letting ssl.create_default_context throw FileNotFoundError out of init. Each ping / ship_tarball retries the build, so the systemd unit no longer crash-loops during first-boot bring-up; it logs a warning once and self-heals when the cert lands.
Regression test: tests/test_shipper.py::test_transport_defers_when_ca_bundle_missing.