The model layer of the project, built honestly:
- tools/dataset_validate.py — full-sweep validator over the receiver
store (sha256, schema, monotonic labels, telemetry-row gate). On the
current corpus: 64,798 accepted + 8,154 degraded + 3,701 rejected +
7 errored across 76,660 shipped episodes. data/processed/validation_v1.parquet
is committed as the per-episode acceptance index.
- training/_features.py — channel registry (46 channels across
proc/guest/qmp/netflow), summary-stat windowing AND channel×time
tensor extraction at 10s/5s windowing. Time alignment uses t_wall_ns
(Unix ns) — tested fix for a real netflow-vs-host clock-base
inconsistency that was silently dropping every netflow channel.
- training/_split.py — three held-out recipes (host / sample / time)
with profile-stratification assertions. held_out_host records
untested_profiles for profiles missing from the test host, such as
scan-and-dial (5 of 6 profiles are tested cross-device; the missing
one is never silently averaged in).
- training/models/ — 6 architectures behind a common BaseModel
interface: gbt (XGBoost), mlp, cnn, gru, lstm, transformer. Each
trained twice (realistic / oracle) per the deployment threat model.
Schema-hashed checkpoints refuse to load if _features.py changed
since training (silent-input-drift protection, tested).
- training/trainer/ — unified training loop: class-weighted CE, LR
warmup + cosine, gradient clipping, mixed precision when CUDA,
early stopping on val macro F1, best-on-val checkpoint. Same loop
runs MLP/CNN/GRU/LSTM/Transformer; GBT uses XGBoost
early_stopping_rounds on val mlogloss.
- training/eval_/ — bootstrap 95% CIs on macro F1, per-class F1,
per-profile and per-host breakdown, paired-bootstrap significance
for model-vs-model gap. Confusion matrix uses union of seen labels.
- training/dashboard/producers/ — replay/metrics/perf/profiles
emitting the six event types the dashboard's awaiting scenes
consume; on-demand tensor extraction so the Pi can run live
inference without 65 GB of shards.
- 17 unit tests (split coverage, features round-trip, schema mismatch,
determinism, time-base alignment regression).
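
The schema-hash guard on checkpoints can be illustrated with a minimal
sketch. This is not the code in training/models/: the names schema_hash,
check_schema and SchemaMismatchError, the "schema_hash" header key, and
the example channel names are all assumptions made for illustration. It
assumes the feature schema can be expressed as a JSON-serializable dict.

```python
import hashlib
import json


class SchemaMismatchError(RuntimeError):
    """Raised when a checkpoint was trained against a different feature schema."""


def schema_hash(schema: dict) -> str:
    """Hash a feature schema via canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_schema(header: dict, current_schema: dict) -> None:
    """Refuse to continue if the hash recorded at training time differs from today's."""
    expected = header["schema_hash"]
    actual = schema_hash(current_schema)
    if expected != actual:
        raise SchemaMismatchError(
            f"feature schema changed since training: {expected[:12]} != {actual[:12]}"
        )


# Hypothetical schema: channel names and keys are illustrative, not the real registry.
trained_schema = {"channels": ["proc.cpu", "netflow.bytes"], "window_s": 10}
header = {"schema_hash": schema_hash(trained_schema)}

check_schema(header, trained_schema)  # unchanged schema: passes silently

drifted = {**trained_schema, "channels": ["proc.cpu"]}  # one channel dropped
try:
    check_schema(header, drifted)
except SchemaMismatchError as exc:
    print("refused:", exc)
```

Hashing canonical JSON rather than the raw file keeps the hash stable
under comment or formatting changes, at the cost of having to decide
explicitly which fields count as part of the schema.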
End-to-end smoke-trained all six on a 567-episode subset; held-out
test macro F1 reported with paired-bootstrap significance. The
methodology now reports honest cross-device generalization, not
in-distribution validation.
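
For reference, a paired bootstrap on the macro F1 gap between two models
might look like the sketch below. It is a standard-library illustration
under stated assumptions, not the implementation in training/eval_/: the
function names, the percentile interval, and the toy labels are all
hypothetical.

```python
import random


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no TP, FP or FN scores 0."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def paired_bootstrap(y_true, pred_a, pred_b, n_boot=1000, seed=0):
    """Resample episodes with replacement, scoring both models on the same draw."""
    rng = random.Random(seed)
    labels = sorted(set(y_true) | set(pred_a) | set(pred_b))  # union of seen labels
    n = len(y_true)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        a = macro_f1(yt, [pred_a[i] for i in idx], labels)
        b = macro_f1(yt, [pred_b[i] for i in idx], labels)
        deltas.append(a - b)
    deltas.sort()
    lo = deltas[int(0.025 * n_boot)]       # 2.5th percentile of the gap
    hi = deltas[int(0.975 * n_boot) - 1]   # 97.5th percentile of the gap
    p_not_better = sum(d <= 0 for d in deltas) / n_boot
    return lo, hi, p_not_better


# Toy data only: model A is always right, model B never predicts class 2.
y_true = [0, 1, 2] * 20
pred_a = list(y_true)
pred_b = [0, 1, 0] * 20
lo, hi, p = paired_bootstrap(y_true, pred_a, pred_b)
print(f"delta macro F1 95% CI: [{lo:.3f}, {hi:.3f}], P(A <= B) ~ {p:.3f}")
```

Scoring both models on the same resampled indices is what makes the test
paired: per-episode difficulty cancels out of the difference.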
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
100 lines
3 KiB
Python
"""MLP on per-window summary features.

Apples-to-apples NN baseline against GBT: same input, different
inductive bias. Intentionally small (250 → 256 → 256 → n_classes) so
the parameter count stays comparable to a tree ensemble of similar
expressiveness.
"""
from __future__ import annotations

from typing import Any

import numpy as np

from training.models import register
from training.models._base import BaseModel, StandardizeStats


@register("mlp")
class MLP(BaseModel):
    input_kind = "summary"

    def __init__(
        self,
        *,
        n_features_in: int,
        n_classes: int,
        keep_mask: np.ndarray,
        standardize: StandardizeStats,
        hidden: int = 256,
        n_layers: int = 2,
        dropout: float = 0.1,
        device: str = "cpu",
    ) -> None:
        import torch  # noqa: F401
        from torch import nn  # noqa: F401

        self._mod = self._build(
            n_features_in=n_features_in,
            n_classes=n_classes,
            hidden=hidden,
            n_layers=n_layers,
            dropout=dropout,
        ).to(device)
        self.n_classes = n_classes
        self.keep_mask = keep_mask.astype(bool)
        self.standardize = standardize
        self.config = {
            "hidden": hidden, "n_layers": n_layers, "dropout": dropout,
            "n_features_in": n_features_in,
        }
        self._device = device

    @staticmethod
    def _build(*, n_features_in: int, n_classes: int, hidden: int,
               n_layers: int, dropout: float):
        from torch import nn
        layers: list = [nn.Linear(n_features_in, hidden), nn.GELU(),
                        nn.Dropout(dropout)]
        for _ in range(n_layers - 1):
            layers += [nn.Linear(hidden, hidden), nn.GELU(),
                       nn.Dropout(dropout)]
        layers.append(nn.Linear(hidden, n_classes))
        return nn.Sequential(*layers)

    @property
    def module(self):
        return self._mod

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        import torch
        Xk = self.select(X)
        self._mod.eval()
        with torch.no_grad():
            t = torch.from_numpy(Xk).to(self._device)
            out = self._mod(t)
            probs = torch.softmax(out, dim=-1).cpu().numpy()
        return probs

    def state_for_checkpoint(self) -> dict[str, Any]:
        return {
            "state_dict": self._mod.state_dict(),
            "config": self.config,
        }

    @classmethod
    def from_checkpoint(cls, header: dict, payload: dict, *,
                        device: str = "cpu") -> "MLP":
        cfg = payload["config"]
        m = cls(
            n_features_in=cfg["n_features_in"],
            n_classes=int(header["n_classes"]),
            keep_mask=np.asarray(header["keep_mask"], dtype=bool),
            standardize=StandardizeStats.from_dict(header["standardize"]),
            hidden=cfg["hidden"], n_layers=cfg["n_layers"], dropout=cfg["dropout"],
            device=device,
        )
        m._mod.load_state_dict(payload["state_dict"])
        return m
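
To make the "intentionally small" claim in the module docstring concrete,
the sketch below counts the weights and biases of the Linear layers that
_build stacks (GELU and Dropout add no parameters). The helper name
mlp_param_count is hypothetical, and n_classes=6 is an assumed value
chosen only for the arithmetic; the real class count is not stated here.

```python
def mlp_param_count(n_features_in: int, n_classes: int,
                    hidden: int = 256, n_layers: int = 2) -> int:
    """Count weights plus biases for the Linear layers in the MLP layout."""
    total = n_features_in * hidden + hidden                # input -> hidden
    total += (n_layers - 1) * (hidden * hidden + hidden)   # hidden -> hidden
    total += hidden * n_classes + n_classes                # hidden -> logits
    return total


# 250 input features as in the docstring; n_classes=6 is an assumption.
print(mlp_param_count(250, 6))  # 131590
```

With the default hidden=256 and n_layers=2, almost all parameters sit in
the two 256-wide layers, so the total changes little with the number of
classes.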