CIS490

History

max 49eba2fd60 fleet-health: proactive alerts on the Pi + per-host doctor reports Two pieces of self-monitoring so the maintainer isn't the alarm: (2) Receiver-side fleet health monitor cis490-fleet-health.timer runs check_fleet_health.py every 5 min. Detects three symptoms and writes them to /var/lib/cis490/alerts.jsonl + a syslog WARNING (greppable / easy to forward to a notifier): silent — host shipped in last 24h but has been quiet >30 min fatal-only — actively shipping but every PUT 4xx unstamped — shipping without X-Cis490-Code-Commit header Dedup is keyed on (host, symptom, hour-bucket) so a sustained fault fires once per hour, not every 5 min. 15 unit tests cover the index parser, three detectors, and dedup. (3) Per-host doctor snapshots Lab hosts run cis490-doctor-check.timer once a day (10 min after boot, then daily with 30-min jitter). The timer runs cis490_doctor.py --json and PUTs the result to a new endpoint: PUT /v1/host-health/<host> → /var/lib/cis490/host-health/<host>.json GET /v1/host-health → aggregate across all hosts Endpoint is NOT gated by version_gate — sick hosts running stale code MUST still be able to report sickness. 11 unit tests cover PUT/GET, atomic-write semantics, bearer auth, and the not-gated-by-version-gate property. ship_health_check.py reuses the existing shipper transport (mTLS + bearer + receiver URL from lab-host.toml) so we don't reimplement auth. Both timers wired into install-lab-host.sh — the loop also enables the previously-added autoupdate + cert-fetch timers, so a single install run gives a host all four self-healing mechanisms. Tests: 293 pass (26 new — 15 fleet-health, 11 host-health). 2 pre-existing test_fleet.py failures from the elliott-ThinkPad merge (`667f042`) are unrelated to this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-02 13:48:31 -05:00
..
auto-update.sh	lab-host: cis490-autoupdate.timer for self-healing on push	2026-05-01 16:59:31 -05:00
fetch-alpine-baseline.sh	Close out the deployment-readiness gaps	2026-04-30 00:31:55 -05:00
fetch-lab-host-cert.sh	lab-host: cis490-cert-fetch.timer for automatic mTLS bootstrap retry	2026-05-02 13:30:16 -05:00
fetch-metasploitable2.sh	Tier 3 + Tier 4 auto-deploy: zero operator interaction	2026-04-30 23:12:08 -05:00
install-lab-host.sh	fleet-health: proactive alerts on the Pi + per-host doctor reports	2026-05-02 13:48:31 -05:00
install-msfrpcd.sh	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)	2026-05-02 12:26:19 -06:00
install-receiver.sh	bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9 , refs #3 )	2026-04-30 01:30:29 -05:00
install-tier-3-4.sh	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)	2026-05-02 12:26:19 -06:00
issue-cis490-client-cert-wrapper.sh	bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9 , refs #3 )	2026-04-30 01:30:29 -05:00