CIS490/references/A Deep Learning Model Leveraging Time‑Series System Call Data to Detect Malware Attacks in Virtual Machines.md
Max Gorog 9e38f78379 training/dashboard(references): description sidebar + better space use
Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (<1100px) it falls back to a vertical stack with the
  description capped to 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
2026-05-08 12:40:32 -05:00

Closest direct precedent

This paper applies deep learning to time-series system-call traces inside virtual machines for malware detection — almost exactly the framing of this project, just one layer deeper in the stack (syscall traces vs /proc samples).

What we borrowed

  • Windowing strategy. The paper's fixed-length sliding-window formulation over a sequential telemetry stream is the same shape we use for our 10-second /proc windows fed to LSTM/GRU/RNN.
  • Recurrent architecture as the simple-but-strong baseline. Their result that an LSTM on raw sequences beats hand-crafted feature classifiers on the same data is the cited justification for our LSTM/GRU/RNN row of the model comparison.
  • Per-VM containment posture. The paper's in-VM setup supports our choice to run each episode in its own throwaway Alpine guest rather than instrumenting the host process directly.
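
The windowing idea in the first bullet can be sketched in a few lines. This is only an illustration of fixed-length sliding windows over a sample stream, not the paper's code or this project's pipeline. The function name `sliding_windows`, the stride of 5, the one-sample-per-second rate, and the two-channel toy stream are all assumptions made for the example.

```python
from typing import Iterator, List, Sequence

Sample = Sequence[float]  # one telemetry reading, e.g. a row of /proc counters


def sliding_windows(
    samples: Sequence[Sample], window_len: int, stride: int = 1
) -> Iterator[List[Sample]]:
    """Yield fixed-length, possibly overlapping windows over a sample stream."""
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    # Stop at the last start index that still leaves a full window.
    for start in range(0, len(samples) - window_len + 1, stride):
        yield list(samples[start:start + window_len])


# Hypothetical stream: 25 samples, assumed one per second, 2 channels each.
stream = [(float(t), float(t) * 2.0) for t in range(25)]

# Assumed 10-sample windows (10 seconds at 1 Hz) with a stride of 5 samples.
windows = list(sliding_windows(stream, window_len=10, stride=5))
```

Each element of `windows` is a (window_len x channels) sequence, which is the shape a recurrent model such as an LSTM or GRU consumes one timestep at a time. A stream shorter than `window_len` yields no windows rather than a padded one.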

Where it differs

  • Their telemetry is full syscall traces, which are much richer than /proc resource counters, so their numbers don't transfer 1-to-1 to our setup. They establish that sequence models over in-VM telemetry can detect malware; we measure how well the same approach works on a thinner, more deployable signal.