CIS490/tools
max 05bf785f0a fleet-health: exit 0 when alerts found (don't mark unit failed)
The detector previously exited with status 1 whenever it found alerts,
which made systemd mark cis490-fleet-health.service as 'failed' on
every tick that found a sick host. That's the wrong UX: a detector
that finds a fault is working correctly, not crashing. The alert
itself is the signal (via a WARNING log line + alerts.jsonl); the
unit's success state should mean "the detector itself ran cleanly."
Test added.
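The exit-status policy described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual check_fleet_health.py: `run_detector`, the `check` callable, and the alert-dict shape are hypothetical. Only the split between "alerts found" (exit 0, WARNING log + alerts.jsonl) and "detector failed" (non-zero exit) comes from the commit message.

```python
import json
import logging
from pathlib import Path
from typing import Callable, Iterable

log = logging.getLogger("fleet-health")

EXIT_OK = 0              # detector ran cleanly, whether or not it found alerts
EXIT_DETECTOR_ERROR = 1  # the detector itself failed; systemd should mark the unit failed


def run_detector(check: Callable[[], Iterable[dict]], alerts_path: Path) -> int:
    """Run one detection tick and return the process exit status.

    `check` is a hypothetical probe returning one dict per sick host.
    Alerts are reported through the log and alerts.jsonl, never through
    the exit status.
    """
    try:
        alerts = list(check())
        if alerts:
            alerts_path.parent.mkdir(parents=True, exist_ok=True)
            with alerts_path.open("a", encoding="utf-8") as fh:
                for alert in alerts:
                    log.warning("fleet-health alert: %s", alert)
                    fh.write(json.dumps(alert, sort_keys=True) + "\n")
    except Exception:
        # A crashing probe or an unwritable alerts file is a real detector failure.
        log.exception("fleet-health detector failed")
        return EXIT_DETECTOR_ERROR
    return EXIT_OK
```

A `main()` would then end with something like `sys.exit(run_detector(probe_fleet, Path("alerts.jsonl")))`, where `probe_fleet` is again hypothetical. With this split, a red unit in `systemctl status` points at the detector, while a sick host shows up only in the WARNING log and alerts.jsonl.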

Caught while live-deploying on the Pi: the first run found
elliott-thinkpad in a fatal-only state (943 × 4xx + 1425 × 5xx) and
correctly emitted the alert, but systemd showed the unit red, which
would have sent operators chasing the wrong problem (the detector
instead of the host).

Side note: the same first run also caught a real bug. The pycache
for receiver.store on /opt/cis490 was stale after I deployed the new
app.py + store.py from main, causing 1464 × 500 responses. After I
cleared the pycache, the index immediately resumed growing (4465 →
4515 in 30 seconds). The detector earned its keep on its very first
cycle.
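The "cleared the pycache" step can be scripted. This is a minimal sketch, assuming the deployed tree lives under /opt/cis490 as stated above; `clear_pycache` is a hypothetical helper, not part of tools/. It removes every `__pycache__` directory so Python recompiles from the freshly deployed sources on the next import.

```python
import shutil
from pathlib import Path


def clear_pycache(root: Path) -> int:
    """Delete every __pycache__ directory under root; return how many were removed."""
    removed = 0
    # Materialize the matches first so the tree isn't mutated while rglob walks it.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():  # may already be gone if nested under a removed dir
            shutil.rmtree(cache_dir)
            removed += 1
    return removed
```

For example, `clear_pycache(Path("/opt/cis490"))` after a deploy. A long-running process that has already imported a module keeps its in-memory copy, so a service restart may also be needed for the change to take effect.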

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:51:20 -05:00
File | Last commit | Date
auto_fetch_samples.py | auto_fetch_samples: pick Linux i386 ELF; manifest matches theZoo | 2026-05-01 03:28:26 -05:00
build_cidata.py | Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts | 2026-04-30 00:02:27 -05:00
check_fleet_health.py | fleet-health: exit 0 when alerts found (don't mark unit failed) | 2026-05-02 13:51:20 -05:00
cis490_doctor.py | shipper: systemd watchdog, quarantine cleanup; doctor surfaces ship errors | 2026-05-01 12:02:59 -05:00
fetch_sample.py | Close out the open issues: bridge pcap wiring, perf collector, Tier-4 | 2026-04-30 00:17:49 -05:00
index_backfill.py | Receiver enforces X-Cis490-Code-Commit allow-list (live, auto-refreshed) | 2026-05-01 01:38:50 -05:00
index_reader.py | Close out the deployment-readiness gaps | 2026-04-30 00:31:55 -05:00
load_mimic.py | Synthetic envelope demo: phase-driven load mimic + plotter | 2026-04-28 23:53:20 -06:00
plot_envelope.py | Close out the deployment-readiness gaps | 2026-04-30 00:31:55 -05:00
prune_episodes.py | Multi-signal prune classifier: rescue valid episodes /proc misses | 2026-04-30 19:10:01 -05:00
quarantine_unstamped.py | fix: lab-host install loop after commit-gate cutover | 2026-05-01 11:36:21 -05:00
run_campaign.py | Add automated campaign runner, shipper, and systemd units | 2026-04-30 14:53:40 -06:00
run_envelope_demo.py | Synthetic envelope demo: phase-driven load mimic + plotter | 2026-04-28 23:53:20 -06:00
run_fleet.py | Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01) | 2026-05-02 12:26:19 -06:00
run_real_vm_demo.py | runners: take savevm baseline-v1 after boot so revert_at_* actually works | 2026-04-30 02:37:05 -05:00
run_tier3_demo.py | run_tier3_demo: replace serial probe with min-wait + TCP probe | 2026-05-02 12:38:22 -06:00
ship_health_check.py | fleet-health: proactive alerts on the Pi + per-host doctor reports | 2026-05-02 13:48:31 -05:00
shipper.py | Add automated campaign runner, shipper, and systemd units | 2026-04-30 14:53:40 -06:00
show_envelope.sh | Interactive envelope plot via WebAgg (browser-based) | 2026-04-29 00:06:22 -06:00
verify_tier3_local.py | tools/verify_tier3_local.py: Pi-runnable Tier-3 verifier | 2026-05-01 03:41:21 -05:00
vm_load_controller.py | Fix workload-silent false-positive on Alpine busybox guests (closes #15) | 2026-04-30 17:28:48 -05:00
vm_serial.py | Tier 2: real Alpine VM, real workload, real envelope | 2026-04-29 08:38:53 -06:00