Commit graph

160 commits

Author SHA1 Message Date
Max Gorog
4bf241f6ec code cards: presenter-friendly comments on every block
The four code snippets shown on stack and training-code scenes get
inline comments explaining the *why* of each line, not just *what*.
Aimed at the live audience: a presenter reads the comment as the
narration; a reader scans them top-to-bottom for the design story.

Covers: pyproject's three install profiles and what each library
contributes; receiver's bearer auth and why constant-time compare
matters; LSTM model's registry pattern, batch_first transpose,
last-step classification head; trainer loop's class weights vs the
imbalanced dataset, AMP scaler vs fp16 underflow, cosine + warmup
schedule, macro-F1 vs accuracy on imbalanced classes, best-state
restore vs last-epoch weights.
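
A minimal sketch of the constant-time compare point from the receiver card. The helper name `bearer_ok` and its header parsing are hypothetical and are not the project's `_bearer_check`; only `hmac.compare_digest` is a real standard-library call.

```python
import hmac


def bearer_ok(authorization_header: str, expected_token: str) -> bool:
    """Hypothetical helper: validate an 'Authorization: Bearer <token>' value."""
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest takes time that does not depend on where the first
    # mismatching byte sits, so response timing does not reveal how much
    # of a guessed token was correct. A plain == may return early.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```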

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:17:31 -05:00
Max Gorog
da0e9ce83c code cards: mirror the actual training stack and trainer loop
The stack scene's pyproject snippet was missing the `training`
group (torch, sklearn, xgboost, zstandard) — the libraries that
do the actual model work. Updated to match the real pyproject.toml.

The receiver snippet now ends at _bearer_check(...) instead of the
import block alone — gives the slide a non-trivial line of code to
read.

The training-code scene replaces the toy "PhaseLSTM" hand-rolled
loop with the real LSTM model class (registry-decorated _SeqBase
subclass + _LSTMClassifier wrapping nn.LSTM with last-step
classification head) and adds a second card showing the actual
train_nn loop: AMP autocast/scaler, cosine LR with linear warmup,
inverse-frequency class weights, gradient clipping, macro-F1
on val, early stop with best-state restore.
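
For readers unfamiliar with the schedule named above, cosine LR with linear warmup can be sketched as a pure function of the step. This is an illustration under assumed parameter names (`warmup_steps`, `min_lr`), not the schedule code in train_nn.

```python
import math


def lr_at(step: int, total_steps: int, warmup_steps: int,
          base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate for a given optimizer step (hypothetical signature)."""
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup keeps early updates small while optimizer statistics are still noisy; the cosine tail lowers the rate smoothly instead of in steps.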

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:15:01 -05:00
Max
c1c8e98180 scripts/train-pi-cpu-models.sh — sequential Pi-side trainer chain
Pi has 4 cores; only KNN and tree-based models are realistic to train
here without GPU. While Lambda runs the full 16-job manifest in
parallel (~1.7h), this chain trains the CPU-friendly subset on the
Pi (~30 min) so scenes 8 & 12 populate with multi-model numbers
within minutes instead of waiting on Lambda's full cycle.

Order: gbt-realistic, knn-realistic, knn-oracle, knn_semi-realistic,
knn_semi-oracle. Skips models whose .ckpt.json already exists
(idempotent restart). Each is a subprocess of training/trainer/run.py
so XGBoost/numpy/sklearn don't fight each other for cores.
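
The skip-and-run logic described above could look roughly like this in Python. The real script is shell; the `--model` / `--mode` flags and the `<model>_<mode>.ckpt.json` file naming are assumptions made for the sketch.

```python
import subprocess
import sys
from pathlib import Path

# Training order from the commit message.
ORDER = [
    ("gbt", "realistic"),
    ("knn", "realistic"),
    ("knn", "oracle"),
    ("knn_semi", "realistic"),
    ("knn_semi", "oracle"),
]


def pending_jobs(artifact_dir: Path, order=ORDER):
    """Jobs whose checkpoint marker is not on disk yet (assumed naming)."""
    return [
        (model, mode)
        for model, mode in order
        if not (artifact_dir / f"{model}_{mode}.ckpt.json").exists()
    ]


def run_chain(artifact_dir: Path, runner=subprocess.run):
    """Run the remaining jobs one at a time, each in its own subprocess."""
    for model, mode in pending_jobs(artifact_dir):
        # check=True stops the chain on the first failing job; the flags
        # passed to run.py here are hypothetical.
        runner(
            [sys.executable, "training/trainer/run.py", "--model", model, "--mode", mode],
            check=True,
        )
```

Because completed jobs are filtered out before anything runs, re-invoking the chain after an interruption resumes at the first job without a checkpoint.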

Caller is expected to start gbt-oracle separately (it's the longest
single training and we kicked it off before invoking this script).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:12:34 -05:00
Max
05bccac29f producers: phase-aware attack envelopes + tickable KNN metric/perf
profiles.py — non-shortcut fit:

  Old: pick one accepted episode per profile, emit its raw
       fraction-of-duration curve. Confounded by single-episode noise,
       phase-budget timing variance, and the cumulative-counter
       startup-spike artifact.

  New: aggregate up to N=100 accepted episodes per profile, slice each
       by labels.jsonl phase events, resample EACH PHASE to a fixed
       budget so the median across episodes captures the canonical
       per-phase shape rather than smearing peaks across the timeline.
       Save median + p25/p75 band to data/processed/attack_profiles_v1.parquet.

  Per-phase point budget (sums to 80):
       clean_lead 10, armed 5, infecting 10, infected_running 40,
       clean_tail 15. dormant (when present) folded into infected_running.
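
The per-phase resampling and the median band can be sketched with the standard library alone. The input shape (one dict of phase name to sample list per episode) and the zero-fill for a missing phase are assumptions of this sketch, not the behaviour of profiles.py.

```python
from statistics import median, quantiles

# Per-phase point budget from the commit message (sums to 80).
PHASE_BUDGET = {
    "clean_lead": 10, "armed": 5, "infecting": 10,
    "infected_running": 40, "clean_tail": 15,
}


def resample(values, n):
    """Linearly interpolate `values` onto n evenly spaced points."""
    if not values:
        return [0.0] * n  # assumption: a missing phase contributes zeros
    if len(values) == 1 or n == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def episode_curve(phase_slices):
    """Concatenate the resampled phases of one episode into 80 points."""
    curve = []
    for phase, budget in PHASE_BUDGET.items():
        curve.extend(resample(phase_slices.get(phase, []), budget))
    return curve


def envelope(episodes):
    """Point-wise median and p25/p75 band across aligned episode curves."""
    curves = [episode_curve(ep) for ep in episodes]
    med, p25, p75 = [], [], []
    for column in zip(*curves):
        med.append(median(column))
        if len(column) >= 2:
            q1, _, q3 = quantiles(column, n=4, method="inclusive")
        else:
            q1 = q3 = column[0]
        p25.append(q1)
        p75.append(q3)
    return med, p25, p75
```

Since every episode is stretched phase by phase onto the same 80 indices, index 25 always falls inside infected_running, which is what lets the median keep a plateau instead of averaging it against neighbouring phases.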

  Channel swap: io-walk uses proc.cpu_sys_jiffies, NOT
  proc.io_write_bytes. Host /proc on QEMU doesn't see virtio-blk
  writes via io.write_bytes (writes go through KVM's I/O path, not
  write() syscalls); cpu_sys_jiffies tracks kernel time which spikes
  during heavy I/O scheduling.

  Concrete result: cpu-saturate now shows the proper plateau-during-
  infected_running with peak at 100 j/s (was 30 j/s spike at idx 0
  then mostly zero); low-and-slow shows its distinctive low-amplitude
  profile (peak 21 vs cpu-saturate's 100); io-walk shows the
  rapid-rise-then-decay shape consistent with dd finishing mid-phase.

knn.py — sticky model_metric / model_perf:

  Stream subcommand gains --also-metric / --also-perf-latency-us
  flags. When set, each cycle publishes a model_metric event
  (tagged model=knn) for scene-8 (model bars) and a model_perf
  event for scene-12 (accuracy vs inference cost). Republishing on
  the cycle keeps reconnecting browsers populated without depending
  on the dashboard's not-yet-built sticky-event cache.

  Measured KNN inference latency on the 150k-trained classifier:
      single-window predict: 61.5 ms (sklearn brute-force at 230 D)
      per-window in batch=64: 3.4 ms (the production-realistic number)

  Streamer published: model_metric{knn, 0.762} +
                      model_perf{knn, latency_us=3410, accuracy=0.762}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:08:03 -05:00
Max Gorog
3783fabe86 live scene: per-host swim lanes + latest-detection callout
New scene 13 (between perf and references) for fleet-wide live
predictions. Each host gets a row of recent prediction cells
(capped at 60), painted by predicted phase; mismatch with ground
truth shows a hatched overlay. A callout below the lanes holds
the most recent detection with model, profile, confidence, and
latency.

Producer contract is the new LiveDetection dataclass in events.py.
The dashboard side is producer-agnostic — the inference loop can
run locally or offload to A100 (or any GPU/host); just POST events
back. No rate-limiting needed; the swim-lane DOM does the capping.

Demo synthesizes 5 hosts walking through phases at ~92% accuracy
so the scene reads as live the moment the deck loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:03:32 -05:00
Max
9d56bcc923 docs: request to dashboard side — persist KNN embeddings on refresh
Producer-side knn fit is saved at data/processed/knn_v1.parquet
(150k rows, 3.4 MB). Live streamer publishes 2000-point cycles every
~2 s, but per PRODUCERS.md §reconnect-gotcha live events aren't
replayed; refresh-to-data is currently bounded by cycle time.

Three options laid out for the dashboard chat to pick:
  A. Sticky cache (per-event-type ring buffer in the broadcaster)
  B. Feeder reading the parquet → broadcaster.state["embedding_cache"]
  C. Caddy fileserver + JS fetch on load

Whichever option lands, the producer side will adapt (e.g., dump a
JSON sidecar if Option C is picked). Path ownership preserved —
dashboard owns dashboard/, producer owns producers/.
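
To make Option A concrete, here is one possible shape for a per-event-type ring buffer. The class name, the capacity default, and the assumption that every event dict carries a "type" key are illustrative; the broadcaster's real interface may differ.

```python
from collections import defaultdict, deque


class StickyCache:
    """Hypothetical sticky cache: keeps the last N events of each type."""

    def __init__(self, per_type_capacity: int = 2000):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._buffers = defaultdict(lambda: deque(maxlen=per_type_capacity))

    def record(self, event: dict) -> None:
        self._buffers[event["type"]].append(event)

    def replay(self):
        """Events to send to a freshly connected client, oldest first per type."""
        return [event for buf in self._buffers.values() for event in buf]
```

A capacity equal to one streamer cycle (2000 points) would let a refreshed browser repaint the full scatter without waiting for the next publish.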

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:54:38 -05:00
Max
2aa7b865fb training/models: knn_semi — semi-supervised self-training KNN
Registered as `knn_semi`. Answers the research question:

  *If we had ground-truth labels for only a fraction of training
   episodes, could we use the structure of the unlabeled rest to
   recover most of supervised KNN's accuracy?*

Pipeline (Yarowsky-style self-training):

  1. Split train slice deterministically into labeled (label_frac=0.2
     default) and unlabeled (1 - label_frac) by row-index hash.
  2. Fit a "labeler" KNN on the labeled fraction.
  3. Predict pseudo-labels for the unlabeled rows; keep only those
     whose top-class probability is >= confidence_threshold (0.6).
  4. Fit the final KNN on (labeled rows + confident pseudo-labels).
     Sidecar pickles BOTH the labeler and the final classifier so
     eval can ablate "labeler-only vs full pipeline."
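
The four steps above can be sketched without sklearn, using a toy brute-force KNN. Everything here is illustrative: the SHA-256 choice for the row-index hash, the function names, and the default k are assumptions, and the registered model uses sklearn's classifier rather than this loop.

```python
import hashlib
from collections import Counter


def is_labeled(row_index: int, label_frac: float = 0.2) -> bool:
    """Step 1: deterministic split by hashing the row index into [0, 1)."""
    digest = hashlib.sha256(str(row_index).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < label_frac


def knn_proba(train, x, k=5):
    """train: list of (feature_vector, label). Returns {label: vote share}."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


def self_train(labeled, unlabeled, k=5, confidence_threshold=0.6):
    """Steps 2 to 4: return the augmented training set for the final KNN."""
    augmented = list(labeled)
    for x in unlabeled:
        proba = knn_proba(labeled, x, k)  # the "labeler" only sees true labels
        label, top = max(proba.items(), key=lambda kv: kv[1])
        if top >= confidence_threshold:   # drop ambiguous pseudo-labels
            augmented.append((x, label))
    return augmented
```

Points that sit between classes get split votes, fall under the threshold, and never enter the final training set, which is the filtering step 3 describes.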

Smoke run (567-episode subset, oracle mode, label_frac=0.2):

                       val_macro_f1   test_macro_f1
  knn       (100% labels)   0.737        0.133
  knn_semi  (20% labels)    0.654        0.173

Lower val (less data) but HIGHER cross-device test (0.173 vs 0.133).
A plausible reading: pseudo-labeling acts as a regularizer that limits
overfitting to elliott-thinkpad's specific neighborhood structure.
Honest research finding worth a slide in the writeup.

Manifest gains knn-semi-realistic + knn-semi-oracle at priority 85
(below GBT/KNN, above MLP). Storage cost = augmented set × n_features
× 4 bytes; same .knn.pkl sidecar format as plain KNN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:51:30 -05:00
Max
e46906b68c training/producers/knn: supervised LDA / UMAP projector + batched publish
Two changes that make scene-11 actually look like a clustering scene:

1. Supervised projection (--projector lda | umap | pca)
   - PCA was variance-greedy and oblivious to phase labels — clumped
     classes together because the dominant variance directions weren't
     class-discriminative.
   - LDA (default): Fisher Linear Discriminant. Linear, fast (~seconds),
     reproducible. On 150k windows: between-class variance 0.462 / 0.331
     / 0.167 across the three axes (96% of class-discriminative info
     in the first 3 dims).
   - UMAP (--projector umap): supervised nonlinear manifold embedding;
     tighter visual clusters at the cost of ~10 minutes for 150k on a
     Pi-class CPU. Reproducible via random_state. Subsamples to 20k for
     fit then transforms remaining points.
   - PCA still available for reference / debugging.

2. Batched concurrent publish (--burst-size N)
   - Sequential publish was ~6.5 ms/event over loopback HTTP → 13 s
     per 2000-point cycle.
   - asyncio.gather with burst_size=50 turns each batch into ~5 ms,
     so the same cycle is ~0.5 s. Browsers see the scatter populate
     in well under a second instead of waiting through a 13 s cycle
     per refresh.
   - Default burst_size=50 is conservative — the dashboard's WebSocket
     fan-out can take more pressure but 50 leaves headroom.
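
The burst pattern described in item 2 can be sketched as below. `publish_one` stands in for whatever coroutine POSTs a single event; its name and signature are assumptions, and error handling is left out.

```python
import asyncio


async def publish_in_bursts(events, publish_one, burst_size=50):
    """Publish events in concurrent bursts; returns the number sent."""
    sent = 0
    for start in range(0, len(events), burst_size):
        burst = events[start:start + burst_size]
        # Every request in the burst is in flight at once; the next burst
        # starts only after all requests in this one have finished, which
        # caps the number of simultaneous connections at burst_size.
        await asyncio.gather(*(publish_one(event) for event in burst))
        sent += len(burst)
    return sent
```

Bounding concurrency per burst, rather than gathering all 2000 requests at once, is what keeps the pressure on the dashboard predictable.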

Saved fit format unchanged (data/processed/knn_v1.parquet); the
streamer's --load-fit reads the same parquet regardless of which
projector produced it. The LDA / UMAP choice is captured in the
producer's log + saved parquet metadata, not in the file shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:45:16 -05:00
Max Gorog
2abc55a59b knn scatter: auto-fit projection to running data spread
Project around mean ± k·σ instead of the raw [0,1]³ producer-unit
cube. PCA-3 outputs are Gaussian-ish so even after the producer's
min/max rescale, the bulk of points clusters near the centroid;
without auto-fit the scatter looks dead-centre and tiny.

Implementation: incremental Welford-ish stats (running sum / sum²)
per axis, recomputed lazily on the first frame after new data
arrives. project() centers and σ-scales each point to ~[-0.5, 0.5];
outliers clamp to ±0.7 so they're visible just outside the cube.
The bounding cube now traces mean ± k·σ instead of [0,1]³, which is
also the natural visual unit for the "data spread" the user reads
off the screen.

resetStats() runs on demo toggle and is implicit when points are
cleared. SPREAD_K=2.5 covers ~99% of normally distributed data along
each axis (about 96% inside the full cube if the axes are independent);
MIN_STD=0.02 keeps degenerate (all-equal) data from producing a
near-zero divisor.
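
The auto-fit arithmetic, transcribed to Python for illustration (the dashboard code itself is JavaScript). The class and function names are hypothetical; the constants and the ±0.7 clamp come from the message above, and mapping mean ± k·σ onto ±0.5 is this sketch's reading of "~[-0.5, 0.5]".

```python
import math

SPREAD_K = 2.5   # cube edge spans mean ± SPREAD_K * std
MIN_STD = 0.02   # floor so all-equal data cannot yield a zero divisor


class AxisStats:
    """Running sum / sum-of-squares statistics for one axis."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, value: float) -> None:
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def mean_std(self):
        if self.n == 0:
            return 0.5, MIN_STD  # assumption: centre of the producer cube
        mean = self.total / self.n
        variance = max(0.0, self.total_sq / self.n - mean * mean)
        return mean, max(math.sqrt(variance), MIN_STD)


def project(value: float, stats: AxisStats) -> float:
    """Centre and scale so mean ± k·σ lands on ±0.5; clamp outliers to ±0.7."""
    mean, std = stats.mean_std()
    scaled = (value - mean) / (2 * SPREAD_K * std)
    return max(-0.7, min(0.7, scaled))
```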

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:33:19 -05:00
Max
aa6187042b .gitignore: exclude data/processed/knn_*.parquet
KNN fit output (PCA-3 + KMeans + KNN-classifier predictions per
window) is a derived artifact regenerable from features_window_v1.
Like features_window itself it stays out of git; the streamer
reads it from disk on the producing host.
2026-05-08 13:20:17 -05:00
Max Gorog
f537ab8686 models scene: paint the knn bar (CSS color + demo entry)
The model-bar widget rendered .model-fill.knn with no gradient when
a model_metric{model:"knn"} arrived, leaving an empty track. Add a
green gradient and include knn in the demo-mode set so the row is
visible without waiting on the producer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:16:38 -05:00
Max
ba5ff70c14 training/producers/knn: add stream subcommand for disk-loaded loop
The fit pipeline (PCA-3 + KMeans + KNN classifier) can be expensive
to recompute every time a producer starts. `produce --fit-out` already
dumps the per-window (x, y, z, phase_int, predicted_int, cluster) to a
parquet; this commit adds a `stream` subcommand that loads that
parquet and publishes Embedding events on a loop.

Why a separate streamer:
  - The dashboard's live event stream is not replayed on browser
    reconnect (PRODUCERS.md §reconnect-gotcha). A browser that
    connects 30 s after the last cycle of the producer sees an empty
    scatter unless we re-publish.
  - The fit is deterministic given (features, seed) — no need to
    repeat it just to re-publish points. The streamer is small and
    stateless; it can run as a long-lived service.

Usage:
  python -m training.producers.knn produce \
      --window data/processed/features_window_v1.parquet \
      --schema data/processed/feature_schema_v1.json \
      --fit-out data/processed/knn_v1.parquet \
      --no-publish

  python -m training.producers.knn stream \
      --load-fit data/processed/knn_v1.parquet \
      --loop --max-points 2000

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:13:09 -05:00
Max Gorog
97eb34f7f6 baseline prose: reflect the dataset-derived phase mix
The widget no longer rolls the last 5 minutes; it aggregates
time-weighted phase durations across a sampled slice of the
on-disk dataset. The prose now matches the bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:07:36 -05:00
Max
2187a5d752 training/models: KNN as a registered supervised model
Non-parametric baseline alongside GBT/MLP/CNN/GRU/LSTM/Transformer.
Same BaseModel + schema-hashed checkpoint contract; sidecar is a
pickled sklearn KNeighborsClassifier (.knn.pkl) handled by the
existing checkpoint machinery alongside .xgb.json / .pt.

KNN's storage cost = n_train_rows × n_kept_features × 4 bytes.
At 660k windows × 145 kept (realistic mode) features = ~380 MB
sidecar; at 230 features (oracle) = ~600 MB. Heavy but ships through
the same artifact-upload path.
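
The storage estimate above is plain arithmetic; a short check, assuming 4-byte (float32) values as the formula states. The function name is made up for the example.

```python
def knn_sidecar_bytes(n_train_rows: int, n_kept_features: int,
                      bytes_per_value: int = 4) -> int:
    """Raw size of the memorized training matrix, ignoring pickle overhead."""
    return n_train_rows * n_kept_features * bytes_per_value


realistic = knn_sidecar_bytes(660_000, 145)  # 382_800_000 bytes, i.e. ~380 MB
oracle = knn_sidecar_bytes(660_000, 230)     # 607_200_000 bytes, i.e. ~600 MB
```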

trainer/run.py learns a third fit branch:
  - GBT — XGBoost early stopping on val mlogloss
  - KNN — fit() memorizes; "training time" is val/test predict cost
  - NN  — train_nn loop (the rest)

Manifest gains knn-realistic + knn-oracle at priority 95 (just
below GBT). KNN's k=10 default lives in the model class — overriding
via hyper.k requires adding --k to run.py first to avoid the
unknown-arg exit-2 issue.

Smoke verified on the 567-episode subset:
  knn   oracle    val=0.7365  test=0.1333  (held-out k-gamingcom)

That val/test gap (0.74 → 0.13) is the cross-device generalization
story: KNN memorizes elliott-thinkpad's local feature space and
falls apart on the other host. Honest baseline for the comparison
report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:06:56 -05:00
Max Gorog
51f2437b71 baseline: phase mix from sampled dataset, not 5-min window
The widget was waiting on live `phase` events that don't flow when no
orchestrator is running, so it sat empty. Replace the rolling
5-minute window with a periodic feeder that samples 500 random
episode tarballs from /var/lib/cis490/episodes, extracts each
labels.jsonl, and aggregates phase durations using consecutive
t_mono_ns deltas. Result lands in broadcaster.state["phase_mix"]
(survives snapshot cycles via dict.update) and re-broadcasts every
~10 min.
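
A sketch of the duration aggregation, assuming each labels.jsonl line is a JSON object with at least `t_mono_ns` and `phase` keys (the exact record shape is not shown in this log). The canonical-phase denominator mirrors what the frontend paragraph below describes.

```python
import json
from collections import defaultdict


def phase_durations_ns(labels_jsonl_lines):
    """Time spent in each phase for one episode, from consecutive deltas."""
    records = [json.loads(line) for line in labels_jsonl_lines if line.strip()]
    records.sort(key=lambda r: r["t_mono_ns"])
    totals = defaultdict(int)
    # A phase lasts from its own record until the next record; the final
    # record has no successor and therefore contributes no duration.
    for current, following in zip(records, records[1:]):
        totals[current["phase"]] += following["t_mono_ns"] - current["t_mono_ns"]
    return dict(totals)


def phase_mix(per_episode_totals, canonical):
    """Time-weighted proportions over canonical phases only."""
    combined = defaultdict(int)
    for totals in per_episode_totals:
        for phase, ns in totals.items():
            if phase in canonical:
                combined[phase] += ns
    denominator = sum(combined.values())
    if not denominator:
        return {}
    return {phase: ns / denominator for phase, ns in combined.items()}
```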

Frontend reads phase_mix from snapshot on connect and from live
phase_mix events on refresh; the bar uses time-weighted proportions
when available (falls back to label counts), and only sums canonical
phases for the denominator so non-displayed `failed` records don't
shrink the visible bars. Eyebrow and sub-line update with live
sample/population/label counts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:04:36 -05:00
Max
ac9b5b6f07 training/producers: knn producer for scene-11 + ModelMetric{knn}
KNN-driven embedding events for the dashboard's KNN scatter scene
(scene 11). One forward pass populates all three of the scatter's
mode-toggle fields:

  x, y, z    — PCA-3 projection of the standardized window features
  phase      — ground-truth phase from labels.jsonl
  predicted  — KNN classifier's prediction (k=10, distance-weighted)
  cluster    — MiniBatchKMeans cluster id (k=8 default)

Two subcommands:

  python -m training.producers.knn produce  ...  emit Embedding events
  python -m training.producers.knn metric    ...  publish ModelMetric{knn}
                                                  on a tick (re-publish
                                                  for reconnect-warmth)

KNN classifier uses the held-out-by-host split aligned with the
supervised pipeline (train ∪ val on elliott-thinkpad, predict on
k-gamingcom) so the predictions reflect cross-device generalization,
not in-distribution self-prediction.

Smoke-verified end-to-end against the live dashboard (3 clients):
800 embedding events delivered in 12 s; ModelMetric{knn} with
test_macro_f1 = 0.4297 on the 567-episode smoke subset, sitting
between the trained GBT (0.557) and the under-trained NN models
(0.09–0.18) — sensible for a non-parametric baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:03:19 -05:00
Max Gorog
12ac409ab2 knn scene: drag-to-rotate 3-D scatter + KNN/cluster color modes
Replace the SVG 2-D scatter with a canvas-based 3-D one. Three color
modes (phase / predicted / cluster) with a toggle; drag the surface
to rotate; reset button. Bounding cube draws faintly so the rotation
reads as 3-D rather than re-shuffled 2-D.

Embedding event gains optional z / predicted / cluster fields. 2-D
producers still work (z defaults to 0.5, no other behavior changes).

CSS adds .scatter3d-* rules; --theme-h-num exposed for cluster-color
hue arithmetic. Synthetic demo data is now 3-D Gaussian clusters with
~7% mislabeled "predictions" so the predicted-mode view differs from
ground truth at a glance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:55:31 -05:00
Max Gorog
9e38f78379 training/dashboard(references): description sidebar + better space use
Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (<1100px) it falls back to a vertical stack with the
  description capped to 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
2026-05-08 12:40:32 -05:00
Max
69c563275a training: parallelize lambda bootstrap (2 jobs at a time on the A100)
At our model sizes (max ~250 K params, max batch 512), each training
process uses ~1 GiB VRAM. A 40 GiB A100 is far from contention with
two concurrent jobs. Bounded-concurrency rolling launcher cuts
sequential ~3.5 h → parallel ~1.7 h for the full 14-job manifest.

  PARALLEL=2 (default) — override via env var if running on a smaller GPU
  or testing the queue logic.

Per-job logs still land at logs/<model>_<mode>.log; failure reporting
is the same. Idempotency is unchanged: jobs whose checkpoint already
exists are still skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:37:03 -05:00
Max Gorog
bee40a6ae9 training/dashboard: references scene with PDF viewer + tab strip
New scene 13 (after perf, the last in the deck) renders a tabbed
PDF viewer. Each tab is one .pdf in /opt/cis490/references/; the
active tab swaps the iframe's src to /refs/<encoded-filename>.

Backend
- /api/references — lists pdfs in REFS_DIR, returning
  {"name": stem (newlines stripped), "path": "/refs/<urlencoded>"}.
- /refs static mount — serves the PDFs directly. check_dir=False
  so the dashboard still boots if the directory is missing.
- REFS_DIR resolves relative to the install root so it works on
  /opt/cis490 in production and any dev tree.

Frontend
- Stage view uses metric-stack-wide for the broader card; the
  references scene also overrides .stage-view padding-right down
  to a small gutter so the iframe takes most of the screen
  horizontally — the prose card still sits on the right but the
  PDF area is roughly 70% wide on standard viewports.
- Tabs are styled like .db-tab (palette-aware pills) and stop
  propagation so they don't trigger the click-to-advance gesture.
- Iframe is lazy-loaded: src isn't set until the user actually
  scrolls into the references scene OR clicks a tab, so the
  browser doesn't fetch a big PDF the user may never view.
2026-05-08 12:34:52 -05:00
Max
308140c6ce training: lambda-cloud one-shot training integration
External-GPU path for the time-pressured first round, before the
Windows desktop joins the WG fleet. Lambda is treated as an "external
worker" whose output lands in the same /var/lib/cis490/models/ tree
the receiver-coordinated fleet uses, so cis490-jobs status reflects
Lambda runs identically to fleet runs.

Three scripts + one ingest tool:

  scripts/build-lambda-bundle.sh
    Tarball at /tmp/cis490-lambda/lambda-bundle-<short>.tar.zst with:
      - the repo (sans .git, sans data/, sans artifacts*)
      - data/processed/{validation_v1,features_window_v1}.parquet
      - data/processed/feature_schema_v1.json
      - data/processed/tensor_window_v1/   (npz shards)
      - bootstrap.sh (entrypoint)
      - training_manifest.toml (the canonical job list)
      - BUNDLE_MANIFEST.json (commit hash + counts + build stamp)
    Verifies all four data inputs exist BEFORE compressing 5+ GB.

  scripts/run-on-lambda.sh ubuntu@<ip>
    rsync bundle up → ssh + run bootstrap → rsync artifacts +
    reports/eval back to artifacts-lambda/ + reports/lambda/.
    Resumable rsync; sha256-verified.

  scripts/lambda-bootstrap.sh   (runs ON the Lambda instance)
    Creates .venv with cu121 torch + xgboost + the [training] deps,
    iterates the manifest's job list in priority order (highest first),
    runs trainer/run.py (or run_ssl.py for transformer_ssl) per job,
    skips jobs whose .ckpt.json already exists (idempotent on re-run),
    writes per-job logs/<model>_<mode>.log, runs eval suite at the end,
    stamps artifacts/RUN_SUMMARY.json with counts + failed-job list.

  tools/ingest_lambda_artifacts.py
    Bundles each (ckpt.json + sidecar + train.json) trio into a
    .tar.zst, sha256, PUTs to the local trainer-receiver's
    /v1/model/{job_id}, marks the job complete. Maps (model, mode) →
    job_id by re-reading the canonical manifest. Handles the queue
    state churn (requeue if completed, claim if pending, fail-back
    on race losses).

End-to-end smoke verified on the A100 instance just provisioned:
  - SSH from Pi via ed25519 keypair (cis490-trainer-pi)
  - GPU: A100-SXM4-40GB, driver 580.105.08
  - venv warmed: torch 2.5.1+cu121, xgboost 3.2.0
  - 464 GB ephemeral disk available

Pi-side feature build (build_features.py + build_tensors.py against
all 72,952 accepted+degraded episodes) is in progress; bundle build
gates on its completion. Estimated wall-clock for the full Lambda
training run on A100: ~2.5 hours for 12 supervised + 2 SSL models +
eval suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:32:04 -05:00
Max
697e36a315 training/producers: move out of dashboard/ per ownership boundary
Producers are event *sources* — the renderer is everything inside
training/dashboard/. Sibling layout makes the dependency direction
one-way (producers import from training.dashboard.events; dashboard
never reaches into producers).

  training/dashboard/producers/   →   training/producers/

Internal imports rewritten via sed; eval_/run.py and training/README.md
cross-references updated. CLI entry stays via `python -m training.producers.<sub>`
(replay / metrics / perf / profiles).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:06:56 -05:00
Max Gorog
f303337a1e training/dashboard: events.py — typed producer interface
Single import point for the model session to wire interactive
scenes. One @dataclass per event type, with docstrings naming the
scene each one drives and the shape of every field:

    PhaseEvent       — scene 6  (baseline phase mix)
    AttackProfile    — scene 7  (per-profile envelope thumbnails)
    Prediction       — scene 8  (10-second window timeline)
    ModelMetric      — scene 9  (model accuracy bars)
    Embedding        — scene 11 (KNN scatter)
    ModelPerf        — scene 12 (accuracy-vs-latency scatter)

Phase + Model Literal types narrow the inputs so static checkers
+ IDEs autocomplete the canonical strings.

Publisher.publish now accepts either a dataclass instance from
events.py or a plain dict, so the existing
``pub.publish({"type": "...", ...})`` pattern keeps working
untouched.

Module-level publish() / try_publish() helpers wrap a default
Publisher for one-liner usage. The PRODUCERS.md guide now leads
with a pointer to events.py so the typed interface is the first
thing producers read.
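
One way a publisher can accept both a dataclass and a plain dict is to normalise at the boundary. This is a sketch, not the code in events.py: the `to_payload` helper and the `type` field value are assumptions, and the optional z / predicted / cluster fields are taken from the later scatter commit.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Optional


@dataclass
class Embedding:
    """Illustrative event shape for the KNN scatter scene."""
    x: float
    y: float
    phase: str
    z: float = 0.5                   # 2-D producers fall back to mid-depth
    predicted: Optional[str] = None
    cluster: Optional[int] = None
    type: str = "embedding"          # assumed discriminator value


def to_payload(event) -> dict:
    """Normalise a dataclass instance or a dict into a JSON-ready dict."""
    if is_dataclass(event) and not isinstance(event, type):
        return asdict(event)
    if isinstance(event, dict):
        return dict(event)
    raise TypeError(f"unsupported event type: {type(event).__name__}")
```

Existing call sites that pass `{"type": "...", ...}` go through the dict branch unchanged, while new producers get field checking from the dataclass constructor.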
2026-05-08 11:59:03 -05:00
Max Gorog
058f2d75a9 training/dashboard(theme): code-card syntax colors follow theme H
Syntax-highlighting tokens (kw/str/com/fn/ty/num) were hardcoded
GitHub-dark hex values that ignored the theme. Each token now uses
oklch(75% 0.18 H+offset) where the offset is fixed per token type
and H is var(--theme-h). Result: turning the H slider rotates all
six syntax colors together while preserving their relative angular
separation, so they stay distinguishable from each other regardless
of theme. Comment color goes through --fg-mute to pick up the
L-step grey transitions like other body text.
2026-05-08 01:48:52 -05:00
Max Gorog
0a3feaae68 training/dashboard(theme): align wheel conic start to 12 o'clock to match marker math
Conic gradient was 'from -90deg' which puts the gradient origin at
9 o'clock; the JS positionMarker uses (H - 90)° so H=0 lands at 12
o'clock. The two were offset by exactly 90°, which is why the color
under the marker on the wheel didn't match the marker's own swatch.
2026-05-08 01:38:46 -05:00
Max Gorog
153860f1db training/dashboard: theme-aware text greys with discontinuous L step + fix hardcoded colors
Two changes:

1. Replaced the linear L→grey-L mapping with a STEP function at
   theme-L = 50%. clamp(0, (L - 50) × 1000, 1) gives 0 below 50%
   and 1 above 50% with a vanishingly small transition zone —
   effectively a step in pure CSS. Both landing values are inside
   each grey's safe-contrast band, so text stays readable at every
   slider position, with a clear visible "click" as the slider
   crosses 50%. The chroma tint stays linear (it doesn't threaten
   contrast).

2. Fixed text that wasn't responding to the theme because it had
   hardcoded color values:
   - .intro-title gradient (#fff → #8b949e) → var(--fg) → var(--fg-dim)
   - .chunk-cell text (rgba(255,255,255,0.85)) → var(--fg)
   - .scene .prose strong (#fff) → var(--fg)

JS now publishes --theme-l-num (unitless) alongside --theme-l (with
%), since calc() can't multiply a percentage by a unitless number
to produce a unitless step value.
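
The step trick is easy to check numerically. Below is the same clamp expression in Python, with the unitless lightness as input; the function name is made up for the example.

```python
def theme_step(theme_l: float, gain: float = 1000.0) -> float:
    """Python equivalent of CSS clamp(0, (L - 50) * 1000, 1)."""
    return min(1.0, max(0.0, (theme_l - 50.0) * gain))
```

With a gain of 1000 the ramp from 0 to 1 spans only 0.001 lightness points (50.000 to 50.001), which is narrower than any slider increment a user can reach, so the output behaves as a step.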
2026-05-08 01:37:26 -05:00
Max Gorog
058970de76 training/dashboard: text & line greys driven by theme L/C/H
The static fg/fg-dim/fg-mute/line/line-soft hex values are now
oklch() expressions that read the theme's L, C, and H. Each keeps
its prior base lightness (93/63/38/18/22%, matching the previous
greys at default settings) plus a small bias from how far the L
slider has moved from 70% — so the text greys track the theme's
brightness without ever leaving the readable contrast range — and
a fractional chroma so they pick up a hint of the theme hue when
the C slider is cranked up.

Falls back to the same numerics in static form when --theme-l
etc. aren't yet set (briefly during initial page load before JS
runs).
2026-05-08 01:30:19 -05:00
Max
8643192a71 training/fleet: distributed multi-host trainer with capability gating
Symmetric companion to the collection fleet (orchestrator/fleet.py)
but for *training*. Collection is embarrassingly parallel; training
is not (a model is trained at most once across the fleet), so the
receiver coordinates which worker gets which job.

Operator-control surface is etc/training_manifest.toml.example —
single canonical file declaring (a) per-host capability + per-model
allow/deny policy, (b) one [[jobs]] entry per (model, mode, hyper)
with capability constraints (require_cuda, prefer_cuda, min_vram_gib,
min_ram_gib, allowed_hosts).

Components:

  capability.py — self-detection: hostname, cores, RAM, CUDA presence,
    VRAM, torch version, git commit. Used by workers to filter
    eligible jobs before claiming.

  manifest.py — TOML loader + JobSpec/HostSpec. Job IDs are stable
    sha256 of (model, mode, hyper, split_recipe, train_hosts, seed)
    so manifest reload is idempotent: existing rows keep their status,
    new jobs become claimable, removed jobs stay until cancelled.

  queue.py — SQLite job queue (training_jobs.db) with statuses
    pending|claimed|running|completed|failed|cancelled. Atomic
    claim_next via single UPDATE WHERE status='pending'. Heartbeat,
    complete, fail. Stale-claim sweep (stale_after_s=600s) with
    max_attempts cutoff to failed.

  store.py — model artifact store mirroring receiver/store.py.
    Artifact ID is the sha256 of the uploaded tarball; bit-identical
    re-runs deduplicate.

  receiver.py — Starlette app exposing 11 endpoints:
    POST /v1/job/claim          (worker)
    POST /v1/job/{id}/heartbeat (worker)
    POST /v1/job/{id}/complete  (worker)
    POST /v1/job/{id}/fail      (worker)
    PUT  /v1/model/{id}         (worker — uploads tarball)
    GET  /v1/jobs               (anyone)
    GET  /v1/workers            (anyone)
    POST /v1/job/{id}/cancel    (operator: X-Operator-Token)
    POST /v1/job/{id}/requeue   (operator)
    POST /v1/manifest/reload    (operator)
    GET  /v1/health             (anyone)
    Runs as cis490-trainer-receiver.service on the Pi alongside the
    existing receiver, on a separate port.

  client.py — stdlib HTTP client (urllib only, no new deps).

  worker.py — long-running daemon. Loop: detect capability → claim →
    spawn training/trainer/run.py subprocess → heartbeat every 30s →
    tar artifact, sha256, PUT /v1/model → complete. SIGTERM-safe.
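
Two of the component ideas above, stable job IDs and a race-safe claim, can be sketched with the standard library. The table and column names, the JSON canonicalisation, and the two-statement claim (a SELECT followed by an UPDATE guarded with `status='pending'`) are assumptions of this sketch; queue.py is described as using a single UPDATE.

```python
import hashlib
import json
import sqlite3


def job_id(model, mode, hyper, split_recipe, train_hosts, seed) -> str:
    """Stable ID: SHA-256 over a canonical JSON encoding of the job spec."""
    canonical = json.dumps(
        {
            "model": model, "mode": mode, "hyper": hyper,
            "split_recipe": split_recipe,
            "train_hosts": sorted(train_hosts), "seed": seed,
        },
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def claim_next(conn: sqlite3.Connection, worker: str):
    """Claim the highest-priority pending job, or return None."""
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT id FROM jobs WHERE status='pending' "
            "ORDER BY priority DESC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes a lost race a no-op: if another worker
        # claimed the row first, rowcount is 0 and nothing is returned.
        cursor = conn.execute(
            "UPDATE jobs SET status='claimed', worker=? "
            "WHERE id=? AND status='pending'",
            (worker, row[0]),
        )
        return row[0] if cursor.rowcount == 1 else None
```

Because the ID depends only on the job's content, reloading an unchanged manifest maps every job back onto its existing row, which is what keeps reload idempotent.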

Operator CLI (tools/cis490_jobs.py): status / list / show / cancel /
requeue / reload / workers. Cancel and requeue require
$CIS490_OPERATOR_TOKEN matching the receiver's configured value.

Bootstrap: scripts/install-training-worker.sh (Linux systemd) and
scripts/install-training-worker-windows.ps1 (Windows Scheduled Task)
let the operator enroll a new host with one command after cloning
the repo and setting up the venv. Worker self-tests capability
before registering.

End-to-end smoke verified on the Pi: receiver up, manifest synced,
14 jobs queued, worker registered, claimed 4 CPU-eligible jobs
(allow_jobs=["gbt","mlp"]), completed 3 (gbt-realistic, gbt-oracle,
mlp-oracle), 1 failed with the actual error visible via
cis490-jobs status, 3 artifacts uploaded to
/var/lib/cis490/models/<model>_<mode>/<sha256>/bundle.tar.zst with
proper index.jsonl row.

21 unit tests (manifest validation: 8; queue lifecycle + eligibility:
13). All pass alongside the prior 17 training tests = 38 green.

Open limitations surfaced inline:
  - Hyperparameter-key drift between the manifest and run.py fails at
    training time, not at manifest reload (worth tightening to argparse
    introspection later).
  - mTLS not yet wired through Caddy for the trainer-receiver port —
    listens loopback-only until that lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:20:20 -05:00
Max
3ea6bca6f0 training: self-supervised pretrain + IG XAI + project brief / slide planner
LogBERT-style self-supervised Transformer pretrain on `clean`-only
windows, plus Integrated Gradients attribution for any tensor model.
Both directly answer the assignment's §8 'next steps in unsupervised
learning' requirement and Natsos & Symeonidis 2025's RQ3 on
explainability.

Pretrain (training/models/transformer_ssl.py +
trainer/run_ssl.py):
  - Masked Timestep Reconstruction (MTR) — random 15% of timesteps
    zeroed, encoder + per-channel head reconstructs from the rest.
    Loss: MSE over masked positions.
  - Volume of Hypersphere Minimization (VHM, Deep SVDD-style) — pull
    the learnable [DIST] token embedding toward a frozen center vector
    initialized as the mean embedding over the clean training windows.
    Loss: ||h_dist - c||^2.
  - Calibrated anomaly threshold at user-configurable target FPR
    (default 5%) on clean-val distance distribution.
  - Trained ONLY on `clean`-phase windows; the model never sees a
    labeled malware sample yet flags any window that doesn't look
    clean — including novel malware the supervised classifier never
    saw. Uses the same schema-hashed checkpoint format as the
    supervised models so loaders refuse mismatched feature schemas.
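
The threshold calibration above can be sketched as follows: score each
clean validation window by its squared distance to the center, then pick
the smallest threshold that lets at most the target fraction of clean
windows through as false positives. The function names and the strict
"distance > threshold" flagging rule are assumptions, not taken from
run_ssl.py.

```python
import math

def squared_distance(h, c):
    """Anomaly score ||h - c||^2 for one window embedding."""
    return sum((a - b) ** 2 for a, b in zip(h, c))

def calibrate_threshold(clean_val_distances, target_fpr=0.05):
    """Smallest t such that at most target_fpr of clean-val scores exceed t."""
    d = sorted(clean_val_distances)
    n = len(d)
    if n == 0:
        raise ValueError("need at least one clean validation distance")
    allowed = math.floor(target_fpr * n)  # clean windows allowed above t
    return d[max(n - allowed - 1, 0)]

def is_anomalous(h, c, threshold):
    return squared_distance(h, c) > threshold
```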

XAI (training/xai/integrated_gradients.py):
  - Per-(channel, timestep) attribution via path-integrated gradients,
    approximated with a midpoint Riemann sum. Works for cnn/gru/lstm/
    transformer/transformer_ssl.
  - Per-phase mean |IG| heatmaps under reports/xai/<model>/<phase>.png,
    top-k channel importance per phase as JSON. Smoke-verified on the
    trained CNN: top channel for `clean` is guest.cpu_iowait (sensible
    — clean = idle = high iowait).
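
For reference, a dependency-free sketch of the midpoint-rule
approximation. It works on a flat feature vector with a caller-supplied
gradient function; the real module operates on (channel, timestep)
tensors with autograd, so grad_fn and the flat layout are stand-ins.

```python
def integrated_gradients(grad_fn, x, baseline, steps=32):
    """Approximate IG_i = (x_i - b_i) * integral of df/dx_i along the
    straight path from baseline b to input x, using midpoint sampling."""
    n = len(x)
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * di for bi, di in zip(baseline, delta)]
        g = grad_fn(point)
        for i in range(n):
            grad_sum[i] += g[i]
    return [di * gi / steps for di, gi in zip(delta, grad_sum)]

# Toy model for a sanity check: f(x) = sum(w_i * x_i^2).
W = [1.0, 2.0, 3.0]

def f(x):
    return sum(w * xi * xi for w, xi in zip(W, x))

def grad_f(x):
    return [2.0 * w * xi for w, xi in zip(W, x)]
```

A useful check on any IG implementation is completeness: the
attributions should sum to f(x) - f(baseline), up to integration error.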

Project brief and slide planner:
  - docs/project_brief.md — full draft of the assignment's required
    sections 1–9 (problem, research question, ML task type with
    justification, six supervised algorithms with assumptions, dataset
    description with full validation breakdown, evaluation metrics with
    rationale, current progress, lit review with 11 APA citations,
    next steps for unsupervised, references).
  - docs/slide_planner.md — all 16 slides filled with content tied to
    specific files and metrics from this codebase, not generic
    placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:19:41 -05:00
Max
1fabd4a246 training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers
The model layer of the project, built honestly:

  - tools/dataset_validate.py — full-sweep validator over the receiver
    store (sha256, schema, monotonic labels, telemetry-row gate). On the
    current corpus: 64,798 accepted + 8,154 degraded + 3,701 rejected +
    7 errored across 76,660 shipped episodes. data/processed/validation_v1.parquet
    is committed as the per-episode acceptance index.

  - training/_features.py — channel registry (46 channels across
    proc/guest/qmp/netflow), summary-stat windowing AND channel×time
    tensor extraction at 10s/5s windowing. Time alignment uses t_wall_ns
    (Unix ns) — tested fix for a real netflow-vs-host clock-base
    inconsistency that was silently dropping every netflow channel.
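
A sketch of windowing keyed on t_wall_ns, reading "10s/5s" as a 10s
window with a 5s stride (an assumption). Because every source is placed
on the same Unix-nanosecond axis, netflow samples land in the same
windows as proc samples. Function names are hypothetical.

```python
NS = 1_000_000_000  # nanoseconds per second

def window_starts(t0_ns, t1_ns, window_s=10, stride_s=5):
    """Start times (Unix ns) of every full window inside [t0_ns, t1_ns]."""
    starts = []
    t = t0_ns
    while t + window_s * NS <= t1_ns:
        starts.append(t)
        t += stride_s * NS
    return starts

def slice_window(samples, start_ns, window_s=10):
    """Values of (t_wall_ns, value) samples falling in [start, start + window)."""
    end_ns = start_ns + window_s * NS
    return [v for t, v in samples if start_ns <= t < end_ns]
```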

  - training/_split.py — three held-out recipes (host / sample / time)
    with profile-stratification assertions. held_out_host carries
    untested_profiles for cases like scan-and-dial absent from the test
    host (5 of 6 profiles tested cross-device, never silently averaged).
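
The held-out-host recipe and its untested_profiles bookkeeping can be
sketched like this. The episode dict keys ("host", "profile") are
assumptions about the record shape, not the actual _split.py types.

```python
def held_out_host_split(episodes, test_host):
    """Split by host and report profiles the test host never exercised."""
    train = [e for e in episodes if e["host"] != test_host]
    test = [e for e in episodes if e["host"] == test_host]
    train_profiles = {e["profile"] for e in train}
    test_profiles = {e["profile"] for e in test}
    return {
        "train": train,
        "test": test,
        # Profiles seen in training but absent from the test host: report
        # them explicitly instead of letting them vanish from an average.
        "untested_profiles": sorted(train_profiles - test_profiles),
    }
```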

  - training/models/ — 6 architectures behind a common BaseModel
    interface: gbt (XGBoost), mlp, cnn, gru, lstm, transformer. Each
    trained twice (realistic / oracle) per the deployment threat model.
    Schema-hashed checkpoints refuse to load if _features.py changed
    since training (silent-input-drift protection, tested).
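
One way to get this refuse-on-drift behaviour is to hash the feature
schema and store the digest beside the weights. The sketch below hashes
the ordered channel list plus window settings; what the real checkpoints
hash, and the names used here, are assumptions.

```python
import hashlib
import json

class SchemaMismatch(RuntimeError):
    pass

def schema_hash(channels, window_s, stride_s):
    """Stable digest of the feature schema; channel order is significant."""
    payload = json.dumps(
        {"channels": list(channels), "window_s": window_s, "stride_s": stride_s},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def save_checkpoint(state, channels, window_s, stride_s):
    return {"schema_hash": schema_hash(channels, window_s, stride_s), "state": state}

def load_checkpoint(ckpt, channels, window_s, stride_s):
    """Return the stored state, or raise if the current schema differs."""
    current = schema_hash(channels, window_s, stride_s)
    if ckpt["schema_hash"] != current:
        raise SchemaMismatch(
            f"checkpoint schema {ckpt['schema_hash'][:12]} != current {current[:12]}"
        )
    return ckpt["state"]
```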

  - training/trainer/ — unified training loop: class-weighted CE, LR
    warmup + cosine, gradient clipping, mixed precision when CUDA,
    early stopping on val macro F1, best-on-val checkpoint. Same loop
    runs MLP/CNN/GRU/LSTM/Transformer; GBT uses XGBoost
    early_stopping_rounds on val mlogloss.
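
The LR warmup + cosine schedule can be written as a pure function of the
step index. This is a generic sketch; the step counts, min_lr, and the
name lr_at are illustrative and not taken from the trainer.

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    span = max(1, total_steps - warmup_steps)
    progress = min(max((step - warmup_steps) / span, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Warmup keeps early updates small while optimizer statistics settle; the
cosine tail then lowers the rate smoothly rather than in abrupt drops.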

  - training/eval_/ — bootstrap 95% CIs on macro F1, per-class F1,
    per-profile and per-host breakdown, paired-bootstrap significance
    for model-vs-model gap. Confusion matrix uses union of seen labels.
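
A stdlib-only sketch of the bootstrap pieces named above: macro F1, a
percentile 95% CI, and a paired bootstrap over shared resample indices.
Resample counts, the seed, and computing macro F1 over the union of
labels seen in each resample are assumptions, not the eval_ defaults.

```python
import random

def macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))  # union of seen labels
    pairs = list(zip(y_true, y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1s) / len(f1s)

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for macro F1 from resampling examples with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

def paired_bootstrap_gap(y_true, pred_a, pred_b, n_boot=1000, seed=0):
    """Fraction of resamples in which model A does not beat model B."""
    rng = random.Random(seed)
    n = len(y_true)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both models
        yt = [y_true[i] for i in idx]
        gap = macro_f1(yt, [pred_a[i] for i in idx]) - macro_f1(yt, [pred_b[i] for i in idx])
        not_better += gap <= 0
    return not_better / n_boot
```

Sharing the resample indices between the two models is what makes the
comparison paired: both are scored on the same drawn examples each time.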

  - training/dashboard/producers/ — replay/metrics/perf/profiles
    emitting the six event types the dashboard's awaiting scenes
    consume; on-demand tensor extraction so the Pi can run live
    inference without 65 GB of shards.

  - 17 unit tests (split coverage, features round-trip, schema mismatch,
    determinism, time-base alignment regression).

End-to-end smoke-trained all six on a 567-episode subset; held-out
test macro F1 reported with paired-bootstrap significance. The
methodology now reports honest cross-device generalization, not
in-distribution validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:19:00 -05:00
Max Gorog
a04bba6281 training/dashboard: click a db row → render the episode envelope
New endpoint GET /api/episode/<host_id>/<episode_id> in app.py.
Stream-decompresses the tarball (zstd -dc piped into tarfile),
extracts telemetry-proc.jsonl, labels.jsonl, and meta.json,
returns the parsed contents. Synchronous extract runs in
asyncio.to_thread so the event loop isn't blocked.
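
The extraction path described above could be sketched as below: pipe the
decompressor's stdout into tarfile's streaming mode, keep only the three
wanted members, and run the blocking work in a thread. The function
names, the decompress_cmd parameter, and matching members by basename
are assumptions; the real handler in app.py is not shown here.

```python
import asyncio
import json
import subprocess
import tarfile

WANTED = ("telemetry-proc.jsonl", "labels.jsonl", "meta.json")

def extract_episode(path, decompress_cmd=("zstd", "-dc")):
    """Blocking: stream-decompress a tarball and parse the wanted members."""
    out = {}
    with subprocess.Popen([*decompress_cmd, path], stdout=subprocess.PIPE) as proc:
        # "r|" reads the tar sequentially, so the pipe is never seeked.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            for member in tar:
                name = member.name.rsplit("/", 1)[-1]
                if name not in WANTED or not member.isfile():
                    continue
                text = tar.extractfile(member).read().decode("utf-8")
                if name.endswith(".jsonl"):
                    out[name] = [json.loads(ln) for ln in text.splitlines() if ln.strip()]
                else:
                    out[name] = json.loads(text)
        proc.stdout.read()  # drain any trailing bytes so the child exits cleanly
    if proc.returncode != 0:
        raise RuntimeError(f"decompressor exited with {proc.returncode}")
    return out

async def get_episode(path, decompress_cmd=("zstd", "-dc")):
    # Run the synchronous extract off the event loop.
    return await asyncio.to_thread(extract_episode, path, decompress_cmd)
```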

Frontend: clicking a row in the database explorer now fetches
the episode and draws an SVG chart matching the README's Real
Alpine VM envelope shape:
  - per-interval CPU jiffies delta (user + sys)
  - per-interval IO bytes delta (read + write)
  - colored phase bands (clean/armed/infecting/infected_running/
    dormant) overlaid by labels.jsonl
  - axis ticks for 0-peak on Y, 0-totalDuration in seconds on X
  - legend below the chart with palette-driven swatches

The detail panel that previously showed the row JSON now shows
metadata + the chart + the legend. Validated end-to-end against
a real episode (863 samples, 8 labels) extracted from
/var/lib/cis490/episodes/elliott-thinkpad/.
2026-05-08 01:16:54 -05:00
Max Gorog
2aa33d19c1 training/dashboard: reduce metric-stack left padding to shift interactables left 2026-05-08 01:02:27 -05:00
Max Gorog
1160244dfa training/dashboard: tighten stage-view padding-right to full prose-w (no overlap) 2026-05-08 01:00:46 -05:00
Max Gorog
0175882ed6 training/dashboard: shift prose right, tighten metric-stack reserved width
Prose's feathered left edge was overlapping with interactive
widgets in the metric stack and blocking clicks (the prose has
pointer-events: auto so its left half — even when visually
mostly transparent — still captured input).

Two changes:
- .scene right padding: 2rem → 0.5rem. Pushes the prose card
  ~1.5rem further right.
- .stage-view padding-right: was calc(prose-w - 1.5em), now just
  prose-w. The metric-stack now ends exactly where the prose
  column starts instead of being pushed 1.5em into prose
  territory.

Result: roughly 3em of overlap reduction. The prose's left
feather and the metric-stack's right feather now meet over bg
rather than over each other's content.
2026-05-08 01:00:28 -05:00
Max Gorog
698a3c96bc training/dashboard: bilateral fade on the metric-stack backdrop
Backdrop card was a sharp rectangle whose right edge butted hard
against the prose column's left feather, producing a visible
seam where the two layers met. Replaced the solid background
with a horizontal linear-gradient that fades at both edges:

  0%   → transparent       (left edge dissolves into bg)
  8%   → full backdrop     (card body begins)
  78%  → full backdrop     (card body ends)
  100% → transparent       (right edge dissolves into bg)

The right fade is wider than the left because the right edge
overlaps the prose column's feathered start; double-feathering
that interface gives a continuous metric-card → bg → prose-card
transition with no rectangles meeting.

Border-radius removed — was hidden by the feather anyway.
2026-05-08 00:30:49 -05:00
Max Gorog
b41bd75209 training/dashboard(theme): add the content-backdrop slider markup that the JS expects 2026-05-08 00:25:12 -05:00
Max Gorog
6d3f8f1ef8 training/dashboard(theme): fadeable content backdrop behind prose & metrics
New 'content backdrop' slider (0..1, default 0.30) in the
animation section of the theme panel. Drives a single CSS
variable --content-backdrop that controls a uniform dark layer
behind both:

- .metric-stack — solid background with that opacity, plus a
  rounded corner so the metric content reads as a card sitting
  over the bg.
- .scene .prose — added as a SECOND background layer underneath
  the existing left-feathering gradient. The gradient stays;
  where it's transparent (left edge), the new uniform layer
  shows through. At backdrop=0 the prose looks identical to
  before; at backdrop>0 the feathered edge reveals a partly-
  opaque dark instead of fully transparent bg.

So when the bg is busy (vaporwave, drift, lava) the user can
crank backdrop up for legibility; when it's the still black
theme they can drop it to 0 for the cleanest look.
2026-05-08 00:24:55 -05:00
Max Gorog
fd5a0fba09 training/dashboard(vaporwave): re-enable scanlines with all safety measures 2026-05-08 00:20:26 -05:00
Max Gorog
91a3aceb68 training/dashboard: stage-view opacity-transition removal is the fix
Confirmed by user that snapping scenes in/out instead of opacity-
transitioning fixed the grid-shape artifact that had been appearing
over metric content during scene changes.

Root cause: while stage-view's opacity animated between 0 and 1
(over 600ms), the compositor was rendering stage-view to its own
intermediate bitmap and sampling whatever was painted underneath
— including the bg-canvas's animated perspective grid. That
sampled grid leaked into the metric content area for the duration
of the transition. Removing the transition removes the compositor
work entirely; scenes change with a snap, no resampling.

Trade-off accepted: no fade between scenes. If a smoother
transition is wanted later, options that DON'T trigger the same
sampling are clip-path wipes, transform-based slides, or animating
opacity over <100ms (short enough that the sampled bitmap doesn't
have time to register visually).
2026-05-08 00:19:59 -05:00
Max Gorog
1fd2adf376 training/dashboard: diagnostic — remove stage-view opacity transition 2026-05-08 00:18:48 -05:00
Max Gorog
d99a8861f3 training/dashboard: diagnostic — hide intro .bg-grid unconditionally to test source 2026-05-08 00:16:49 -05:00
Max Gorog
09960812fa training/dashboard: will-change: opacity on stage-view to pre-promote layer
The grid-shaped artifact that appeared over metric content
during scene transitions was Chromium promoting the stage-view
to its own compositor layer mid-transition (when opacity left
exactly 0). At that promotion moment the new layer samples
whatever's painted underneath as its initial bitmap — which is
the moving perspective grid in the bg — and that snapshot stays
visible for the duration of the 600ms opacity transition,
reading as a phantom grid pattern over the metric content.

will-change: opacity tells the browser to promote the layer
before the transition starts. The transition is then a pure
compositor opacity interpolation: no resampling of bg, no stale
snapshots. The hint is on the actual transitioning element
(stage-view), not on canvas-wrapper, which avoids the
cutout-mask issue from the previous over-aggressive layer
isolation attempts.
2026-05-08 00:14:57 -05:00
Max Gorog
cdb8d46954 training/dashboard(vaporwave): hide scanlines entirely — repeated moiré source 2026-05-08 00:09:34 -05:00
Max Gorog
34e579587c training/dashboard(vaporwave): move scanlines below sun so blinds don't moiré 2026-05-08 00:08:15 -05:00
Max Gorog
dc340b6462 training/dashboard(vaporwave): revert diagnostic bisect, restore floor/horizon/scanlines 2026-05-08 00:06:08 -05:00
Max Gorog
c55931f30a training/dashboard(vaporwave): diagnostic bisect — hide floor/horizon/scanlines 2026-05-08 00:05:36 -05:00
Max Gorog
cba10e27a5 training/dashboard: strip every stacking-context-creating prop from bg-canvas
The 'rendering over presentation elements' artifacts were from
piling stacking-context-creating properties on bg-canvas:
  - filter: blur(0px) — even at 0px creates a stacking context
    AND a 3D-flattening grouping property
  - transform: translateZ(0) — stacking context + 3D context
  - earlier: isolation, contain — stacking context + flattening

Each of these on its own can cause neighboring element artifacts
in Chromium (cutout-shaped opacity transition leaks, or in extreme
cases content rendering at the wrong z-order).

Strip everything. The bg-canvas is now just position:fixed with
overflow:hidden — that's enough for the bg layers it contains,
and the browser will GPU-promote it automatically when there's
real animation inside. The filter is now conditional: JS sets
--bg-filter to blur(Npx) only when the slider is non-zero, and
removes the custom property at zero so the rule falls through to
filter: none and no stacking context is created.
2026-05-08 00:02:18 -05:00
Max Gorog
841dcead6a training/dashboard(themes): strip overzealous layer isolation
The 'cutout mask' artifacts on foreground stage views came from
piling contain:paint + will-change + transform:translateZ all
together on .canvas-wrapper. Paint containment plus a
GPU-promoted layer on the foreground container breaks compositor
ordering when children's opacity transitions fire (the
IntersectionObserver fades stage-views in/out as scenes activate),
producing rectangular cutout-shaped artifacts where the
transitioning element's layer hadn't fully composited yet.

Strip canvas-wrapper back to the bare minimum (just position,
overflow, z-index). Also drop isolation/contain from bg-canvas,
keeping only transform: translateZ(0) for layer promotion — that
alone is enough to give bg-canvas its own compositor layer
without fighting the foreground's painting.
2026-05-07 23:59:38 -05:00
Max Gorog
243c1d019c training/dashboard(vaporwave): restore scanlines, sky-only, off-resonance, no blend
The scanlines were genuinely the source of the orthogonal-in-3D
artifact, in two compounding ways:

1. mix-blend-mode: multiply on a fullscreen overlay forces an
   isolation group: the browser composites the 3D-rotated floor
   into a flat 2D bitmap underneath the blend, and that
   flattening interacts badly with how Chromium rasterizes
   perspective transforms.

2. The 4px stripe period beats against the perspective floor's
   per-row line spacing (which is dense near the horizon and
   sparse near the viewer). At the screen y where the floor's
   row-spacing crosses 4px, the patterns interfere — producing
   moiré bands that look like a phantom grid orthogonal to the
   floor plane.

Fix: confine scanlines to the sky region above the horizon
(they never touch the perspective grid), drop the multiply
blend (regular alpha compositing), and use a 5px period that
avoids resonance with anything else in the scene.
2026-05-07 23:57:08 -05:00
Max Gorog
775930d35d training/dashboard(vaporwave): diagnostic — hide .vw-scanlines to test moiré hypothesis 2026-05-07 23:55:12 -05:00