CIS490/tests
Max 1fabd4a246 training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers
The model layer of the project, built honestly:

  - tools/dataset_validate.py — full-sweep validator over the receiver
    store (sha256, schema, monotonic labels, telemetry-row gate). On the
    current corpus: 64,798 accepted + 8,154 degraded + 3,701 rejected +
    7 errored across 76,660 shipped episodes. data/processed/validation_v1.parquet
    is committed as the per-episode acceptance index.

  - training/_features.py — channel registry (46 channels across
    proc/guest/qmp/netflow), summary-stat windowing AND channel×time
    tensor extraction at 10s/5s windowing. Time alignment uses t_wall_ns
    (Unix ns) — tested fix for a real netflow-vs-host clock-base
    inconsistency that was silently dropping every netflow channel.

  - training/_split.py — three held-out recipes (host / sample / time)
    with profile-stratification assertions. held_out_host carries
    untested_profiles for cases like scan-and-dial absent from the test
    host (5 of 6 profiles tested cross-device, never silently averaged).

  - training/models/ — 6 architectures behind a common BaseModel
    interface: gbt (XGBoost), mlp, cnn, gru, lstm, transformer. Each
    trained twice (realistic / oracle) per the deployment threat model.
    Schema-hashed checkpoints refuse to load if _features.py changed
    since training (silent-input-drift protection, tested).

  - training/trainer/ — unified training loop: class-weighted CE, LR
    warmup + cosine, gradient clipping, mixed precision when CUDA,
    early stopping on val macro F1, best-on-val checkpoint. Same loop
    runs MLP/CNN/GRU/LSTM/Transformer; GBT uses XGBoost
    early_stopping_rounds on val mlogloss.

  - training/eval_/ — bootstrap 95% CIs on macro F1, per-class F1,
    per-profile and per-host breakdown, paired-bootstrap significance
    for model-vs-model gap. Confusion matrix uses union of seen labels.

  - training/dashboard/producers/ — replay/metrics/perf/profiles
    emitting the six event types the dashboard's awaiting scenes
    consume; on-demand tensor extraction so the Pi can run live
    inference without 65 GB of shards.

  - 17 unit tests (split coverage, features round-trip, schema mismatch,
    determinism, time-base alignment regression).

End-to-end smoke-trained all six on a 567-episode subset; held-out
test macro F1 reported with paired-bootstrap significance. The
methodology now reports honest cross-device generalization, not
in-distribution validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:19:00 -05:00
..
__init__.py Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency 2026-04-28 23:34:04 -06:00
test_auto_fetch_samples.py auto_fetch_samples: pick Linux i386 ELF; manifest matches theZoo 2026-05-01 03:28:26 -05:00
test_collectors_emit.py PIPELINE §5 step 5: collector admission emit tests (§4.4) 2026-05-04 01:37:40 -05:00
test_containment.py PIPELINE §5 step 3: target VM build infrastructure + containment posture 2026-05-04 01:31:40 -05:00
test_doctor_shipping.py shipper: systemd watchdog, quarantine cleanup; doctor surfaces ship errors 2026-05-01 12:02:59 -05:00
test_episode.py meta.json: stamp code_version (commit, branch, dirty) per episode 2026-05-01 01:29:01 -05:00
test_event_driven_labeller.py PIPELINE §5 step 6: event-driven labeller (§4.5) 2026-05-04 01:43:16 -05:00
test_exploits.py catalog: remove samba_usermap_script — never landed sessions in prod 2026-05-03 22:48:03 -05:00
test_fleet.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
test_fleet_health.py fleet-health: exit 0 when alerts found (don't mark unit failed) 2026-05-02 13:51:20 -05:00
test_guest_agent.py Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts 2026-04-30 00:02:27 -05:00
test_host_health.py fleet-health: proactive alerts on the Pi + per-host doctor reports 2026-05-02 13:48:31 -05:00
test_manifest.py PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml 2026-05-04 01:25:01 -05:00
test_pcap.py Collectors 2/4/5 + fleet runner + sample manifest + Tier-3 setup scripts 2026-04-30 00:02:27 -05:00
test_perf_qemu.py Close out the open issues: bridge pcap wiring, perf collector, Tier-4 2026-04-30 00:17:49 -05:00
test_proc_qemu.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
test_prune.py Multi-signal prune classifier: rescue valid episodes /proc misses 2026-04-30 19:10:01 -05:00
test_qmp.py Close out the deployment-readiness gaps 2026-04-30 00:31:55 -05:00
test_quarantine_unstamped.py fix: lab-host install loop after commit-gate cutover 2026-05-01 11:36:21 -05:00
test_receiver.py Add receiver: PUT /v1/episodes ingest with sha256 verify and idempotency 2026-04-28 23:34:04 -06:00
test_shipper.py shipper: systemd watchdog, quarantine cleanup; doctor surfaces ship errors 2026-05-01 12:02:59 -05:00
test_target_spec.py PIPELINE §5 step 3: target VM build infrastructure + containment posture 2026-05-04 01:31:40 -05:00
test_tier3_local_verify.py tools/verify_tier3_local.py: Pi-runnable Tier-3 verifier 2026-05-01 03:41:21 -05:00
test_tier4.py Close out the deployment-readiness gaps 2026-04-30 00:31:55 -05:00
test_training_checkpoint.py training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers 2026-05-08 01:19:00 -05:00
test_training_features.py training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers 2026-05-08 01:19:00 -05:00
test_training_split.py training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers 2026-05-08 01:19:00 -05:00
test_ulid.py Add v0 orchestrator + first oracle collector (host /proc) 2026-04-28 23:40:25 -06:00
test_verify_catalog.py PIPELINE §5 step 4: catalog admission verifier (§4.3) 2026-05-04 01:35:32 -05:00
test_version_gate.py robustness: gate falls back to local git, queue sweeps stale tarballs 2026-05-01 11:49:38 -05:00
test_vm_load_controller.py workload audit trail: meta.sample + per-phase events + pre-kill probe 2026-04-30 02:12:34 -05:00