CIS490/references/LogBERT: Log Anomaly Detection via BERT.md
Max Gorog 9e38f78379 training/dashboard(references): description sidebar + better space use
Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (<1100px) it falls back to a vertical stack with the
  description capped to 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
2026-05-08 12:40:32 -05:00

# Transformer pretraining for log anomaly detection
LogBERT trains **BERT-style masked-language-modeling on log
sequences** and uses the resulting representations for unsupervised
anomaly scoring. It is the closest published example of "BERT, but for
host telemetry."
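
The unsupervised scoring step can be sketched in plain Python. This is a minimal illustration of the top-g candidate check described in the LogBERT paper, not the authors' code: the function names, the dictionary-of-probabilities input, and the example probabilities are all assumptions made for the sketch.

```python
def top_g_misses(masked_predictions, g):
    """Count masked positions whose true log key is outside the top-g candidates.

    masked_predictions: list of (true_key, probs) pairs, where probs maps each
    candidate log key to the model's predicted probability at a masked position.
    """
    misses = 0
    for true_key, probs in masked_predictions:
        # Rank candidate keys by predicted probability, highest first.
        top_g = sorted(probs, key=probs.get, reverse=True)[:g]
        if true_key not in top_g:
            misses += 1
    return misses


def is_anomalous(masked_predictions, g, r):
    """Flag a sequence when more than r masked keys miss the top-g candidates."""
    return top_g_misses(masked_predictions, g) > r


# Hypothetical predictions for three masked positions (made-up numbers).
example = [
    ("E5", {"E5": 0.7, "E9": 0.2, "E2": 0.1}),
    ("E2", {"E5": 0.6, "E9": 0.3, "E2": 0.1}),
    ("E7", {"E5": 0.5, "E9": 0.4, "E2": 0.1}),
]
print(is_anomalous(example, g=2, r=1))
```

Here `g` and `r` are tuning knobs: a larger `g` tolerates more uncertainty per position, a larger `r` tolerates more surprising keys per sequence.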
## What we borrowed
- **The transformer entry in our model comparison.** LogBERT is the
citation for including a transformer in the model lineup on
scene 9: it shows that attention over moderate-length log windows
carries enough signal to separate normal from anomalous sequences
*without* per-anomaly labels.
- **Pretraining + fine-tune split.** LogBERT's self-supervised
pretraining on benign logs is the template for the first stage of
the BERT model's training story on the *training-code* scene. The
paper itself scores anomalies without labels; the downstream
classifier on labeled anomalies is the second stage of our setup,
not part of LogBERT.
## Where it differs
- Logs are categorical (template tokens); our windows are dense
float vectors (12 channels × 100 samples). The BERT we run is the
same architecture but reads continuous-valued tokens, so the
masking objective is regression-on-masked-channels rather than
cross-entropy-on-masked-token.
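
The difference in objective can be sketched as a loss computed only over masked values. This is a minimal, dependency-free illustration, assuming a window stored as 12 rows of 100 floats and a mask drawn independently per value; the helper names, the 15% mask ratio, and the masking granularity are assumptions for the example, not the deck's training code.

```python
import random


def make_mask(n_channels, n_samples, mask_ratio, rng):
    """Return an n_channels x n_samples grid of booleans; True marks a masked value."""
    return [[rng.random() < mask_ratio for _ in range(n_samples)]
            for _ in range(n_channels)]


def masked_mse(target, predicted, mask):
    """Mean squared error over masked positions only (0.0 if nothing is masked)."""
    total, count = 0.0, 0
    for t_row, p_row, m_row in zip(target, predicted, mask):
        for t, p, m in zip(t_row, p_row, m_row):
            if m:
                total += (t - p) ** 2
                count += 1
    return total / count if count else 0.0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
window = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(12)]
mask = make_mask(12, 100, 0.15, rng)
zeros = [[0.0] * 100 for _ in range(12)]

print(masked_mse(window, window, mask))  # perfect reconstruction gives 0.0
print(masked_mse(window, zeros, mask))   # predicting all zeros gives a positive loss
```

A categorical log model would instead compare a predicted distribution over template tokens with the true token at each masked position; here the comparison is a squared difference between floats.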