CIS490/references/DANTE: Predicting Insider Threat using LSTM on system logs.md
Max Gorog 9e38f78379 training/dashboard(references): description sidebar + better space use
Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (<1100px) it falls back to a vertical stack with the
  description capped to 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
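
The sidecar lookup described above could be sketched as follows. This is a minimal illustration, not the actual backend: the function name `load_references` and the payload keys `file` and `description` are assumptions, and only the `<stem>.md` pairing comes from the commit message.

```python
from pathlib import Path


def load_references(ref_dir: Path) -> list[dict]:
    """Pair each PDF with the text of its <stem>.md sidecar, if one exists."""
    entries = []
    for pdf in sorted(ref_dir.glob("*.pdf")):
        # Same directory, same stem, .md extension instead of .pdf.
        sidecar = pdf.with_suffix(".md")
        description = (
            sidecar.read_text(encoding="utf-8") if sidecar.is_file() else None
        )
        # Hypothetical payload shape; the real /api/references keys may differ.
        entries.append({"file": pdf.name, "description": description})
    return entries
```

A PDF without a sidecar gets `description: None` here, so the frontend can decide whether to hide the panel or show a placeholder.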
2026-05-08 12:40:32 -05:00

# LSTM on event-log sequences
DANTE applies a **plain LSTM directly to system-log event sequences**
to flag insider-threat behavior. It predates the transformer wave
and serves here as a methodological baseline.
## What we borrowed
- **Evidence that simple recurrent models are enough.** The paper
shows that an LSTM on the event sequence alone, with no per-task
feature engineering, captures enough temporal structure to beat
bag-of-events classifiers. That is the empirical basis for keeping
the *RNN/GRU/LSTM* entries in our model comparison plain rather
than bespoke.
- **Negative-evidence framing.** DANTE is also explicit about cases
where the LSTM under-performs (low-volume users, novel event
types). This informs the *split-by-sample, not split-by-time* eval
protocol on the perf scene: generalising to unseen actors is
the bar.
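
A split-by-sample protocol can be sketched as a group-wise holdout. This is an illustration under assumptions rather than our actual eval code: the record key `sample_id` and the function name `split_by_sample` are hypothetical, and each sample is assumed to correspond to one actor or recorded run.

```python
import random


def split_by_sample(records, test_fraction=0.25, seed=0):
    """Hold out whole samples so no test sample ever appears in training."""
    sample_ids = sorted({r["sample_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(sample_ids)
    # Always hold out at least one sample.
    n_test = max(1, round(len(sample_ids) * test_fraction))
    held_out = set(sample_ids[:n_test])
    train = [r for r in records if r["sample_id"] not in held_out]
    test = [r for r in records if r["sample_id"] in held_out]
    return train, test


# Hypothetical data: 4 samples, 3 timesteps each.
records = [{"sample_id": f"s{i % 4}", "t": i} for i in range(12)]
train, test = split_by_sample(records)
```

A split-by-time protocol would instead cut every sample at a timestamp, so each test actor would already have been seen during training.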
## Where it differs
- DANTE operates on log-event token sequences (categorical), not
numeric resource metrics (continuous). Our channels are floats from
`/proc`, so we use the temporal structure DANTE validates without
inheriting the embedding setup.
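
To make the difference concrete, here is a minimal sketch of feeding continuous channels to a recurrent model: each timestep is already a float vector, so it can be windowed directly with no vocabulary or embedding table. The helper `sliding_windows` and the two example channels are hypothetical, not taken from our pipeline.

```python
def sliding_windows(series, width, stride=1):
    """Cut a per-timestep series into overlapping fixed-width windows."""
    return [
        series[i:i + width]
        for i in range(0, len(series) - width + 1, stride)
    ]


# Each timestep holds float channel readings (hypothetical: cpu, mem).
samples = [
    [0.10, 0.52],
    [0.12, 0.53],
    [0.80, 0.55],
    [0.85, 0.60],
    [0.20, 0.58],
]

# Each window is a (width x channels) block usable as one RNN input sequence.
windows = sliding_windows(samples, width=3)
```

With categorical log events, the same windows would hold integer token ids, which need an embedding lookup before they reach the recurrent layer.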