CIS490/references/DANTE: Predicting Insider Threat using LSTM on system logs.md
Max Gorog 9e38f78379 training/dashboard(references): description sidebar + better space use
Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (<1100px) it falls back to a vertical stack with the
  description capped to 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
2026-05-08 12:40:32 -05:00


LSTM on event-log sequences

DANTE applies a plain LSTM directly to system-log event sequences to flag insider-threat behavior. It predates the transformer wave in the literature and is useful here as a methodological baseline.
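
To make "a plain LSTM over an event sequence" concrete, here is a minimal sketch of the recurrence in pure Python. It is not DANTE's actual architecture or code: the vocabulary, embedding and hidden sizes, the random untrained weights, and the names `lstm_step` and `encode_events` are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W, b):
    """One standard LSTM step; W maps the concatenated [x, h] to four gate blocks."""
    z = x + h                      # list concatenation: input followed by previous hidden state
    n = len(h)
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:n]]            # input gate
    f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
    o = [sigmoid(p) for p in pre[2 * n:3 * n]]    # output gate
    g = [math.tanh(p) for p in pre[3 * n:4 * n]]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def encode_events(event_ids, embedding, W, b, hidden_size):
    """Run the LSTM over a sequence of categorical event ids; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for eid in event_ids:
        h, c = lstm_step(embedding[eid], h, c, W, b)
    return h

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
VOCAB, EMBED, HIDDEN = 5, 3, 4
embedding = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED)] for _ in range(VOCAB)]
W = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED + HIDDEN)] for _ in range(4 * HIDDEN)]
b = [0.0] * (4 * HIDDEN)

final_state = encode_events([0, 3, 3, 1, 4], embedding, W, b, HIDDEN)
```

Because the hidden state is carried from one event to the next, the same events in a different order generally produce a different final state, which is the temporal information a bag-of-events representation discards.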

What we borrowed

  • Evidence that simple recurrent models are enough. The paper shows that an LSTM on the sequence of events alone, with no per-task feature engineering, captures enough temporal structure to beat bag-of-events classifiers. That is the empirical ground for keeping the RNN/GRU/LSTM entries in our model comparison plain rather than bespoke.
  • Negative-evidence framing. DANTE is also explicit about the cases where the LSTM under-performs (low-volume users, novel event types). This informs the split-by-sample, rather than split-by-time, eval protocol on the perf scene: generalising to unseen actors is the bar.
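
The split-by-sample idea can be sketched as follows. This is a minimal illustration and not the project's actual eval code: it assumes each window carries a hypothetical `sample_id` naming the run or actor it came from, and the function name `split_by_sample` and the 25% test fraction are likewise assumptions.

```python
import random

def split_by_sample(windows, test_fraction=0.25, seed=0):
    """Hold out whole samples so no sample contributes windows to both train and test.

    windows: list of (sample_id, features, label) tuples.
    """
    sample_ids = sorted({w[0] for w in windows})
    rng = random.Random(seed)
    rng.shuffle(sample_ids)
    n_test = max(1, round(len(sample_ids) * test_fraction))
    test_ids = set(sample_ids[:n_test])
    train = [w for w in windows if w[0] not in test_ids]
    test = [w for w in windows if w[0] in test_ids]
    return train, test

# Toy data: 8 hypothetical samples, 5 windows each.
windows = [(f"run-{s}", [float(t)], s % 2) for s in range(8) for t in range(5)]
train, test = split_by_sample(windows)
```

A split-by-time protocol would instead cut every sample's timeline at some point, so each actor appears on both sides of the split. Holding out whole samples tests generalisation to actors the model has never seen.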

Where it differs

  • DANTE operates on log-event token sequences (categorical), not numeric resource metrics (continuous). Our channels are floats from /proc, so we rely on the temporal structure DANTE validates without inheriting its embedding setup.
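
As a rough illustration of the input-side difference, continuous channels can be fed to a recurrent model as per-step float vectors with no embedding table. The sketch below rests on several assumptions: the channel names are hypothetical (not the actual /proc fields used), z-score scaling is one possible choice rather than the project's confirmed preprocessing, and `standardise` and `to_windows` are illustrative names.

```python
from statistics import mean, pstdev

def standardise(channel):
    """Z-score one channel; a constant channel is left centred at zero."""
    mu = mean(channel)
    sigma = pstdev(channel) or 1.0  # avoid dividing by zero on constant channels
    return [(v - mu) / sigma for v in channel]

def to_windows(channels, window, stride):
    """Turn equal-length float channels into overlapping windows of per-step vectors.

    channels: dict mapping channel name -> list of floats.
    Returns a list of windows; each window is a list of `window` feature vectors.
    """
    names = sorted(channels)  # fixed channel order
    scaled = [standardise(channels[n]) for n in names]
    length = len(scaled[0])
    steps = [[col[t] for col in scaled] for t in range(length)]
    return [steps[s:s + window] for s in range(0, length - window + 1, stride)]

# Hypothetical channels; in practice the scaling statistics would be fit on training data only.
channels = {
    "cpu_user": [0.10, 0.12, 0.50, 0.55, 0.11, 0.09],
    "mem_used": [0.40, 0.41, 0.43, 0.80, 0.82, 0.42],
}
windows = to_windows(channels, window=4, stride=1)
```

Each window here is a sequence of 2-element float vectors that a recurrent model could consume directly, whereas a categorical event stream would first need each token id mapped to a learned embedding vector.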