CIS490/references/LogBERT: Log Anomaly Detection via BERT.md
Max Gorog 9e38f78379 training/dashboard(references): description sidebar + better space use
Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (<1100px) it falls back to a vertical stack with the
  description capped to 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
2026-05-08 12:40:32 -05:00

Transformer pretraining for log anomaly detection

LogBERT trains a BERT-style model with a masked-language-modeling objective on log sequences and uses the resulting representations for unsupervised anomaly scoring. It is the closest published example of "BERT, but for host telemetry."

What we borrowed

  • The transformer entry in our model comparison. LogBERT is the citation for why a transformer is even in the model lineup on scene 9 — it shows that attention over moderate-length log windows has enough signal to separate normal from anomalous without per-anomaly labels.
  • Pretrain + fine-tune split. Their two-stage setup (self-supervised pretraining on benign logs, then a downstream classifier on labeled anomalies) is the template we follow when describing the BERT model's training story on the training-code scene.

Where it differs

  • Logs are categorical (template tokens); our windows are dense float vectors (12 channels × 100 samples). The BERT we run is the same architecture but reads continuous-valued tokens, so the masking objective is regression-on-masked-channels rather than cross-entropy-on-masked-token.
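
To make the difference in objective concrete, here is a minimal sketch of a masked-regression loss on one 100 × 12 window. It is not our training code: the function names, the 15% mask ratio (BERT's default for tokens), the choice to mask whole time steps, and zero-filling the hidden values are all assumptions made for illustration. The only point it carries is that the loss is a mean squared error over the hidden positions instead of a cross-entropy over a token vocabulary.

```python
import random

N_SAMPLES, N_CHANNELS = 100, 12  # window shape from the note above


def make_mask(mask_ratio=0.15, seed=0):
    """Pick time steps to hide (assumption: all channels at a chosen step are masked)."""
    rng = random.Random(seed)
    n_masked = max(1, round(N_SAMPLES * mask_ratio))
    return set(rng.sample(range(N_SAMPLES), n_masked))


def apply_mask(window, masked_steps):
    """Return a copy of the window with masked time steps zero-filled (assumed fill value)."""
    return [
        [0.0] * N_CHANNELS if t in masked_steps else list(row)
        for t, row in enumerate(window)
    ]


def masked_mse(pred, target, masked_steps):
    """Mean squared error over masked positions only; visible positions do not count."""
    total = 0.0
    for t in masked_steps:
        for p, y in zip(pred[t], target[t]):
            total += (p - y) ** 2
    return total / (len(masked_steps) * N_CHANNELS)
```

In use, the model would read `apply_mask(window, steps)` and its output would be scored with `masked_mse(output, window, steps)`. A perfect reconstruction scores 0.0, and echoing the zero-filled input scores the mean square of the hidden values.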