CIS490/training/models/gru.py
Max 1fabd4a246 training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers
The model layer of the project, built honestly:

  - tools/dataset_validate.py — full-sweep validator over the receiver
    store (sha256, schema, monotonic labels, telemetry-row gate). On the
    current corpus: 64,798 accepted + 8,154 degraded + 3,701 rejected +
    7 errored across 76,660 shipped episodes. data/processed/validation_v1.parquet
    is committed as the per-episode acceptance index.

  - training/_features.py — channel registry (46 channels across
    proc/guest/qmp/netflow), summary-stat windowing AND channel×time
    tensor extraction at 10s/5s windowing. Time alignment uses t_wall_ns
    (Unix ns) — tested fix for a real netflow-vs-host clock-base
    inconsistency that was silently dropping every netflow channel.

  - training/_split.py — three held-out recipes (host / sample / time)
    with profile-stratification assertions. held_out_host carries
    untested_profiles for cases like scan-and-dial absent from the test
    host (5 of 6 profiles tested cross-device, never silently averaged).

  - training/models/ — 6 architectures behind a common BaseModel
    interface: gbt (XGBoost), mlp, cnn, gru, lstm, transformer. Each
    trained twice (realistic / oracle) per the deployment threat model.
    Schema-hashed checkpoints refuse to load if _features.py changed
    since training (silent-input-drift protection, tested).

  - training/trainer/ — unified training loop: class-weighted CE, LR
    warmup + cosine, gradient clipping, mixed precision when CUDA,
    early stopping on val macro F1, best-on-val checkpoint. Same loop
    runs MLP/CNN/GRU/LSTM/Transformer; GBT uses XGBoost
    early_stopping_rounds on val mlogloss.

  - training/eval_/ — bootstrap 95% CIs on macro F1, per-class F1,
    per-profile and per-host breakdown, paired-bootstrap significance
    for model-vs-model gap. Confusion matrix uses union of seen labels.

  - training/dashboard/producers/ — replay/metrics/perf/profiles
    emitting the six event types the dashboard's awaiting scenes
    consume; on-demand tensor extraction so the Pi can run live
    inference without 65 GB of shards.

  - 17 unit tests (split coverage, features round-trip, schema mismatch,
    determinism, time-base alignment regression).

End-to-end smoke-trained all six on a 567-episode subset; held-out
test macro F1 reported with paired-bootstrap significance. The
methodology now reports honest cross-device generalization, not
in-distribution validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:19:00 -05:00

41 lines
1.7 KiB
Python

"""Gated Recurrent Unit over channel × time windows.
Sees the window one timestep at a time and accumulates state. Cheaper
than LSTM, often comparable on short sequences. Last-step output → linear.
"""
from __future__ import annotations

from training.models import register
from training.models._torch_seq import _SeqBase


@register("gru")
class GRU(_SeqBase):
    def _build_module(self, *, n_channels_in: int, n_timesteps: int,
                      n_classes: int, hidden: int = 128, n_layers: int = 2,
                      dropout: float = 0.1, bidirectional: bool = False):
        from torch import nn
        return _GRUClassifier(n_channels_in=n_channels_in, n_classes=n_classes,
                              hidden=hidden, n_layers=n_layers,
                              dropout=dropout, bidirectional=bidirectional)


from torch import nn  # noqa: E402


class _GRUClassifier(nn.Module):
    def __init__(self, *, n_channels_in: int, n_classes: int, hidden: int,
                 n_layers: int, dropout: float, bidirectional: bool):
        super().__init__()
        self.gru = nn.GRU(
            input_size=n_channels_in, hidden_size=hidden,
            num_layers=n_layers, dropout=dropout if n_layers > 1 else 0.0,
            batch_first=True, bidirectional=bidirectional,
        )
        d_out = hidden * (2 if bidirectional else 1)
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(d_out, n_classes))

    def forward(self, x):                    # x: (B, C, T)
        x = x.transpose(1, 2)                # → (B, T, C)
        out, _ = self.gru(x)                 # (B, T, hidden*dirs)
        return self.head(out[:, -1, :])      # last timestep
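
A minimal shape check for the classifier above, written as a standalone sketch so it runs without the training package. GRUClassifierSketch is a hypothetical stand-in that mirrors _GRUClassifier; the batch size, the 10-step window length, and the 6 output classes are assumptions chosen for illustration (46 channels matches the channel registry described in the commit message).

```python
import torch
from torch import nn


class GRUClassifierSketch(nn.Module):
    """Hypothetical stand-in mirroring _GRUClassifier, for illustration only."""

    def __init__(self, *, n_channels_in, n_classes, hidden=128, n_layers=2,
                 dropout=0.1, bidirectional=False):
        super().__init__()
        self.gru = nn.GRU(
            input_size=n_channels_in, hidden_size=hidden,
            num_layers=n_layers, dropout=dropout if n_layers > 1 else 0.0,
            batch_first=True, bidirectional=bidirectional,
        )
        d_out = hidden * (2 if bidirectional else 1)
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(d_out, n_classes))

    def forward(self, x):                  # x: (B, C, T)
        out, _ = self.gru(x.transpose(1, 2))   # (B, T, hidden*dirs)
        return self.head(out[:, -1, :])        # logits from the last timestep


torch.manual_seed(0)
B, C, T, K = 4, 46, 10, 6                  # assumed batch, channels, steps, classes
model = GRUClassifierSketch(n_channels_in=C, n_classes=K).eval()

with torch.no_grad():
    x = torch.randn(B, C, T)               # channel × time window, as the model expects
    logits = model(x)
    out, h_n = model.gru(x.transpose(1, 2))

logits_shape = tuple(logits.shape)         # one row of class logits per window
# For a unidirectional GRU, the last timestep of the top layer's output
# is the same tensor as the top layer's final hidden state.
last_step_matches_h_n = torch.allclose(out[:, -1, :], h_n[-1])
```

One thing worth knowing before enabling bidirectional=True: in PyTorch, the backward direction's output at the last timestep has processed only that single step, so out[:, -1, :] carries full-sequence context from the forward direction only.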