training/dashboard(references): description sidebar + better space use

Two changes per the user's feedback that the slide had unused
horizontal space and needed per-PDF context.

Layout
- The reference scene is now a 2-column grid inside the
  metric-stack: PDF iframe at ~1.7fr on the left, description
  panel at ~0.55fr on the right (min 280px). On narrow viewports
  (max-width 1100px) it falls back to a vertical stack with the
  description capped at 240px.
- Added #zoom=page-width to the iframe URL so the PDF's page
  fits its column width instead of leaving margins beside an
  8.5x11 page rendered in a wider iframe.
- Hide the prose card on the references scene — the description
  panel inside the stack covers what the prose was saying, and
  freeing the right edge gives the description proper room.
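
The column split described above can be checked with a little arithmetic. The following is a minimal sketch, assuming the two tracks resolve the way CSS grid normally distributes `fr` units (a track whose `fr` share falls below its minimum is pinned at that minimum and the other track takes the rest). The function name `ref_grid_columns` is hypothetical, and the input is the width of the grid container, not the viewport.

```python
def ref_grid_columns(container_px: float, gap_px: float = 14.0) -> tuple[float, float]:
    """Approximate the widths of the two tracks in
    grid-template-columns: minmax(0, 1.7fr) minmax(280px, 0.55fr)."""
    viewer_fr, desc_fr, desc_min_px = 1.7, 0.55, 280.0
    # Space left for the two tracks once the column gap is taken out.
    free = max(container_px - gap_px, 0.0)
    fr = free / (viewer_fr + desc_fr)
    desc = desc_fr * fr
    if desc < desc_min_px:
        # The description track is pinned at its 280px floor and the
        # viewer track receives whatever is left.
        return max(free - desc_min_px, 0.0), desc_min_px
    return viewer_fr * fr, desc


if __name__ == "__main__":
    for width in (1014, 1400, 1814):
        viewer, desc = ref_grid_columns(width)
        print(f"{width}px container: viewer {viewer:.0f}px, description {desc:.0f}px")
```

Under these assumptions the description panel only grows past 280px once the container is wider than roughly 1160px; below that the PDF column absorbs all of the shrinkage.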

Description content
- Backend reads <stem>.md sidecar files alongside each PDF and
  returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset
  (headings, bold/italic, lists, inline code, paragraphs) — no
  third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs
  currently in references/. Each describes how the paper informs
  a specific scene of the deck (which model row, which eval
  protocol, which channel selection). Edit them in place and the
  panel updates on the next reload.
Max Gorog 2026-05-08 12:40:27 -05:00
parent 69c563275a
commit 9e38f78379
8 changed files with 232 additions and 19 deletions

@ -0,0 +1,26 @@
# Closest direct precedent

This paper applies deep learning to **time-series system-call traces
inside virtual machines** for malware detection — almost exactly the
framing of this project, just one layer deeper in the stack
(syscall traces vs `/proc` samples).

## What we borrowed

- **Windowing strategy.** The paper's fixed-length sliding-window
  formulation over a sequential telemetry stream is the same shape
  we use for our 10-second `/proc` windows fed to LSTM/GRU/RNN.
- **Recurrent architecture as the simple-but-strong baseline.**
  Their result that an LSTM on raw sequences beats hand-crafted
  feature classifiers on the same data is the cited justification
  for our LSTM/GRU/RNN row of the model comparison.
- **Per-VM containment posture.** Confirms our choice to run each
  episode in its own throwaway Alpine guest rather than instrumenting
  the host process directly.

## Where it differs

- Their telemetry is full **syscall traces** (much richer than
  `/proc` resource counters), which is why their numbers don't
  transfer 1-to-1 to our setup. They establish *that* this works;
  we measure how well it works on a thinner, more deployable signal.
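
The fixed-length sliding-window idea mentioned above can be sketched as follows. This is an illustration only, not the project's training code: the function name `sliding_windows` and the stride of 50 are hypothetical, and the default of 100 samples per window is taken from the "12 channels × 100 samples" figure in the LogBERT note (which would imply roughly 10 Hz sampling if that is the same 10-second window).

```python
from typing import Iterator, List, Sequence

# One row of channel readings taken at a single timestamp.
Sample = Sequence[float]


def sliding_windows(
    stream: Sequence[Sample],
    window_len: int = 100,  # samples per window (assumed, see lead-in)
    stride: int = 50,       # hypothetical overlap; not stated in the source
) -> Iterator[List[Sample]]:
    """Yield fixed-length windows over a sequential telemetry stream.

    Trailing samples that do not fill a whole window are dropped.
    """
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    for start in range(0, len(stream) - window_len + 1, stride):
        yield list(stream[start:start + window_len])


if __name__ == "__main__":
    # 250 synthetic samples of 12 channels each.
    stream = [[float(t)] * 12 for t in range(250)]
    windows = list(sliding_windows(stream))
    print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each yielded window is a `window_len × channels` block that can be handed to a sequence model as one training example.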

@ -0,0 +1,26 @@
# LSTM on event-log sequences

DANTE applies a **plain LSTM directly to system-log event sequences**
to flag insider-threat behavior. It predates the transformer wave
and is useful here as a methodological baseline.

## What we borrowed

- **Evidence that simple recurrent models are enough.** The paper
  shows an LSTM on sequence-of-events alone — no per-task feature
  engineering — captures enough temporal structure to beat
  bag-of-events classifiers. That's the empirical ground for the
  *RNN/GRU/LSTM* entries in our model comparison being plain, not
  bespoke.
- **Negative-evidence framing.** DANTE is also explicit about cases
  where the LSTM under-performs (low-volume users, novel event
  types). This informs the *split-by-sample, not split-by-time* eval
  protocol on the perf scene — generalising to unseen actors is
  the bar.

## Where it differs

- DANTE operates on log-event token sequences (categorical), not
  numeric resource metrics (continuous). Our channels are floats from
  `/proc`, so we use the temporal structure DANTE validates without
  inheriting the embedding setup.
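
A split-by-sample protocol of the kind named above can be sketched like this: every window that came from the same sample lands on the same side of the split, so the test set only contains samples the model never saw. The function name `split_by_sample`, the 25% test fraction, and the idea of tagging each window with a sample id are assumptions for illustration, not details taken from the project.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_by_sample(
    sample_ids: Sequence[Hashable],
    test_fraction: float = 0.25,  # hypothetical default
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) over windows.

    sample_ids[i] names the sample that window i was cut from. Whole
    samples are held out, so no sample id appears on both sides.
    """
    unique_ids = sorted(set(sample_ids), key=repr)
    random.Random(seed).shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    held_out = set(unique_ids[:n_test])
    train_idx = [i for i, s in enumerate(sample_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(sample_ids) if s in held_out]
    return train_idx, test_idx


if __name__ == "__main__":
    ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
    train, test = split_by_sample(ids)
    print("train windows:", train, "test windows:", test)
```

A split-by-time protocol would instead cut each sample's windows at a timestamp, which lets the same sample appear in both sets and measures a weaker kind of generalisation.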

@ -0,0 +1,26 @@
# Transformer pretraining for log anomaly detection

LogBERT trains **BERT-style masked-language-modeling on log
sequences** and uses the resulting representations for unsupervised
anomaly scoring. It is the closest published example of "BERT, but
for host telemetry."

## What we borrowed

- **The transformer entry in our model comparison.** LogBERT is the
  citation for why a transformer is even in the model lineup on
  scene 9 — it shows that attention over moderate-length log windows
  has enough signal to separate normal from anomalous *without*
  per-anomaly labels.
- **Pretraining + fine-tune split.** Their two-stage setup
  (self-supervised pretrain on benign logs, downstream classifier
  on labeled anomalies) is the template we follow when describing
  the BERT model's training story on the *training-code* scene.

## Where it differs

- Logs are categorical (template tokens); our windows are dense
  float vectors (12 channels × 100 samples). The BERT we run is the
  same architecture but reads continuous-valued tokens, so the
  masking objective is regression-on-masked-channels rather than
  cross-entropy-on-masked-token.
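
To make the "regression on masked channels" objective concrete, here is a minimal sketch in plain Python. It is not the project's model code. The masking granularity (individual timestep/channel entries), the 15% mask rate borrowed from the usual BERT convention, the zero fill value, and the function names are all assumptions.

```python
import random
from typing import List, Tuple

Window = List[List[float]]  # samples x channels, e.g. 100 x 12
Position = Tuple[int, int]  # (timestep, channel)


def mask_window(
    window: Window,
    mask_prob: float = 0.15,  # assumed; BERT's conventional rate
    mask_value: float = 0.0,  # assumed fill value
    seed: int = 0,
) -> Tuple[Window, List[Position]]:
    """Return a masked copy of the window and the masked positions."""
    rng = random.Random(seed)
    masked = [row[:] for row in window]  # leave the input untouched
    positions: List[Position] = []
    for t, row in enumerate(window):
        for c in range(len(row)):
            if rng.random() < mask_prob:
                masked[t][c] = mask_value
                positions.append((t, c))
    return masked, positions


def masked_mse(prediction: Window, target: Window, positions: List[Position]) -> float:
    """Mean squared error over the masked positions only."""
    if not positions:
        return 0.0
    total = sum((prediction[t][c] - target[t][c]) ** 2 for t, c in positions)
    return total / len(positions)


if __name__ == "__main__":
    window = [[2.0] * 12 for _ in range(100)]
    masked, positions = mask_window(window)
    # A "model" that just echoes the masked input is penalised only
    # where values were hidden from it.
    print(len(positions), masked_mse(masked, window, positions))
```

With categorical log tokens the same loop would score a softmax over the vocabulary with cross-entropy; with float channels the natural replacement is a squared-error term on the hidden values.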

@ -0,0 +1,30 @@
# Strongest published precedent for this exact setup

This paper applies **transformer architectures to per-process
resource-utilisation metrics** — the same shape of telemetry we
collect from `/proc`. It is the closest reference to "the project
we're doing, but already published."

## What we borrowed

- **Channel selection.** Their list of `/proc` channels overlaps
  heavily with ours (`cpu_user_jiffies`, `cpu_sys_jiffies`,
  `rss_bytes`, `io_*_bytes`, `voluntary_ctxsw`, `involuntary_ctxsw`,
  page-fault counters). Our 12-channel selection is essentially
  this set, validated.
- **Window-and-classify framing.** They confirm that a transformer
  reading short windows of these counters beats per-window
  hand-features fed to gradient-boosted trees. That mirrors the
  comparison we run: KNN-on-features vs sequence-models-on-windows.
- **Held-out-sample evaluation.** They emphasise generalising to
  *unseen* malware families, not unseen time-slices of the same
  family. We adopt the same eval protocol on the perf scene.

## Where it differs

- They use a much larger corpus and run on commercial endpoints;
  we run on three lab hosts and a Pi. Their numbers are an upper
  bound on what we can hope to reproduce — they're the target, not
  the floor.
- They don't publish their exact dataset, so the comparison is
  architectural, not reproductive.
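
The "features" side of the features-vs-windows comparison can be illustrated with a small per-window summary. This is a sketch under stated assumptions: the choice of mean, standard deviation, minimum and maximum per channel is hypothetical (the source does not list the hand features used), as is the function name `window_features`.

```python
import statistics
from typing import List, Sequence


def window_features(window: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a samples x channels window into a flat feature vector.

    For each channel, in order: mean, population std, min, max.
    """
    if not window:
        raise ValueError("window must contain at least one sample")
    features: List[float] = []
    # zip(*window) transposes rows of samples into per-channel series.
    for channel in zip(*window):
        features.extend([
            statistics.fmean(channel),
            statistics.pstdev(channel),
            min(channel),
            max(channel),
        ])
    return features


if __name__ == "__main__":
    # Two samples, two channels.
    print(window_features([[1.0, 10.0], [3.0, 10.0]]))
```

A feature vector like this discards the ordering of samples inside the window, which is the information a sequence model reading the raw window gets to keep.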

@ -249,9 +249,21 @@ def make_app(
        # for display and URL-encode for the path so the
        # iframe can fetch /refs/<encoded-name>.
        display_name = " ".join(p.stem.split())
        # Sidecar markdown: <stem>.md alongside the PDF
        # holds a free-form description of how the paper
        # was used in the project. Optional — the
        # frontend shows a placeholder if missing.
        description = None
        md_path = p.with_suffix(".md")
        if md_path.is_file():
            try:
                description = md_path.read_text(encoding="utf-8")
            except OSError:
                log.warning("could not read sidecar %s", md_path)
        items.append({
            "name": display_name,
            "path": "/refs/" + quote(p.name, safe=""),
            "description": description,
        })
except OSError:
    log.exception("could not list references in %s", REFS_DIR)
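
For readers who want to try the sidecar lookup outside the app, the hunk above can be condensed into a standalone sketch. The helper names `read_sidecar` and `reference_item` are hypothetical. One deliberate difference from the hunk is flagged in the code: this version also catches `UnicodeDecodeError`, since a sidecar that is not valid UTF-8 raises that rather than `OSError`.

```python
import logging
from pathlib import Path
from typing import Optional
from urllib.parse import quote

log = logging.getLogger(__name__)


def read_sidecar(pdf_path: Path) -> Optional[str]:
    """Return the text of <stem>.md next to the PDF, or None."""
    md_path = pdf_path.with_suffix(".md")
    if not md_path.is_file():
        return None
    try:
        return md_path.read_text(encoding="utf-8")
    # Assumption beyond the hunk: treat undecodable files like
    # unreadable ones instead of letting the error propagate.
    except (OSError, UnicodeDecodeError):
        log.warning("could not read sidecar %s", md_path)
        return None


def reference_item(pdf_path: Path) -> dict:
    """Build one entry shaped like the items in the hunk above."""
    return {
        # Collapse runs of whitespace in the stem for display.
        "name": " ".join(pdf_path.stem.split()),
        # Percent-encode the whole file name, including any slashes.
        "path": "/refs/" + quote(pdf_path.name, safe=""),
        "description": read_sidecar(pdf_path),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        pdf = Path(d) / "Some Paper.pdf"
        pdf.write_bytes(b"%PDF-1.4")
        pdf.with_suffix(".md").write_text("# Notes\n", encoding="utf-8")
        print(reference_item(pdf))
```

Because a missing sidecar yields `None` rather than an error, the frontend can decide on its own how to present a reference that has no description yet.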

@ -286,8 +286,8 @@ body[data-theme="laser"] .bg-laser { display: block; }
to { transform: rotate(360deg); }
}
/* ─── References scene (PDF viewer + tab strip) ─────────────────────── */
.ref-stack { /* metric-stack-wide variant; let viewer take the height */
/* ─── References scene (PDF viewer + tab strip + description) ──────── */
.ref-stack { /* metric-stack-wide variant; let content area fill height */
height: 100%;
justify-content: flex-start;
}
@ -315,26 +315,78 @@ body[data-theme="laser"] .bg-laser { display: block; }
color: var(--accent); border-color: var(--accent);
background: var(--accent-soft);
}
.ref-viewer-wrap {
/* Two-column layout: PDF viewer on the left taking the larger
share, description panel on the right. The viewer's column
uses minmax(0, 1.7fr) so the iframe won't blow out the grid when
the PDF reports a wide intrinsic size. */
.ref-content {
flex: 1 1 auto; min-height: 0;
display: grid;
grid-template-columns: minmax(0, 1.7fr) minmax(280px, 0.55fr);
gap: 14px;
}
.ref-viewer-wrap {
background: var(--bg-elev);
border: 1px solid var(--line); border-radius: 4px;
overflow: hidden;
min-height: 0;
}
.ref-viewer {
width: 100%; height: 100%;
min-height: clamp(360px, 70vh, 900px);
border: 0; display: block; background: var(--bg-elev);
}
.ref-description {
background: var(--bg-elev);
border: 1px solid var(--line); border-radius: 4px;
overflow-y: auto;
padding: 18px 22px;
font-size: 14px; line-height: 1.6;
color: var(--fg);
min-height: 0;
}
.ref-description h1, .ref-description h2 {
font-size: 15px; font-weight: 600; margin: 0 0 10px;
color: var(--fg);
}
.ref-description h3 { font-size: 13px; font-weight: 600; margin: 12px 0 4px; }
.ref-description p { margin: 0 0 10px; }
.ref-description ul,
.ref-description ol { margin: 0 0 10px; padding-left: 20px; }
.ref-description li { margin: 0 0 4px; }
.ref-description code {
font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
font-size: 0.9em; color: var(--accent);
background: var(--accent-soft); padding: 1px 5px; border-radius: 3px;
}
.ref-description strong { color: var(--fg); font-weight: 600; }
.ref-description em { color: var(--fg-dim); font-style: italic; }
.ref-description .awaiting {
color: var(--fg-mute); font-style: italic;
font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
font-size: 12px;
}
/* On narrow viewports stack vertically: PDF on top, description
below, capped to a sensible height so the PDF still gets room. */
@media (max-width: 1100px) {
.ref-content { grid-template-columns: 1fr; }
.ref-description { max-height: 240px; }
}
/* References scene wants more horizontal room than the default
metric scenes; the PDF is the point. Drop the right padding
that reserves space for the prose column down to a small gutter,
so the iframe can stretch most of the way across. The prose card
still overlays the right edge with its feathered backdrop. */
that reserves space for the prose column. The prose for this
scene is hidden anyway (see below) so we can use the full width
for the PDF + description grid. */
.stage-view[data-view="references"] {
padding-right: clamp(8px, 4vw, 96px);
padding-right: clamp(8px, 2vw, 48px);
}
/* Hide the prose card on the references scene: the description
panel inside the metric-stack already explains each PDF in
context, and freeing the right-side viewport gives the
description panel proper room. */
.scene[data-stage="references"] .prose { display: none; }
/* ─── Per-theme settings section ───────────────────────────────────── */
.theme-bg-section { display: none; }

@ -1641,7 +1641,8 @@ for epoch in range(20):
(function () {
const tabsEl = document.getElementById('ref-tabs');
const viewerEl = document.getElementById('ref-viewer');
if (!tabsEl || !viewerEl) return;
const descEl = document.getElementById('ref-description');
if (!tabsEl || !viewerEl || !descEl) return;
let refs = [];
let activeIdx = -1;
@ -1661,6 +1662,7 @@ for epoch in range(20):
empty.className = 'awaiting';
empty.textContent = 'no PDFs found in /opt/cis490/references/';
tabsEl.appendChild(empty);
renderDescription(null);
return;
}
refs.forEach((r, i) => {
@ -1680,10 +1682,46 @@ for epoch in range(20):
if (i < 0 || i >= refs.length) return;
activeIdx = i;
rebuildTabs();
// Append a hash so that hitting the same PDF twice in a row
// still triggers a reload (helps if the file was updated on
// disk; iframes cache aggressively otherwise).
viewerEl.src = refs[i].path;
// #zoom=page-width forces the browser's PDF viewer to fit the
// page horizontally to the iframe — without it, an 8.5×11
// page leaves whitespace on either side when the iframe is
// wider than the page's natural width.
viewerEl.src = refs[i].path + '#zoom=page-width';
renderDescription(refs[i].description);
}
// Tiny markdown-ish renderer: enough to display headings,
// paragraphs, bold/italic, lists, inline code. Keeps this widget
// dependency-free (no marked.js / showdown.js / etc).
function renderDescription(md) {
if (!md) {
descEl.innerHTML =
'<p class="awaiting">no description for this reference yet — drop a sidecar &lt;stem&gt;.md next to the PDF in /opt/cis490/references/</p>';
return;
}
// Escape HTML first so user content can't inject markup.
let s = md.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
// Inline: bold, italic, code.
s = s.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
.replace(/(?<!\*)\*([^*\n]+)\*(?!\*)/g, '<em>$1</em>')
.replace(/`([^`\n]+)`/g, '<code>$1</code>');
// Block-level: split on blank lines, then handle headings + lists.
const blocks = s.split(/\n{2,}/).map(block => {
const stripped = block.trim();
if (!stripped) return '';
if (stripped.startsWith('# ')) return `<h2>${stripped.slice(2)}</h2>`;
if (stripped.startsWith('## ')) return `<h2>${stripped.slice(3)}</h2>`;
if (stripped.startsWith('### ')) return `<h3>${stripped.slice(4)}</h3>`;
const lines = stripped.split('\n');
if (lines.every(l => /^[-*]\s/.test(l))) {
return '<ul>' + lines.map(l => `<li>${l.replace(/^[-*]\s+/, '')}</li>`).join('') + '</ul>';
}
if (lines.every(l => /^\d+\.\s/.test(l))) {
return '<ol>' + lines.map(l => `<li>${l.replace(/^\d+\.\s+/, '')}</li>`).join('') + '</ol>';
}
return `<p>${stripped.replace(/\n/g, '<br>')}</p>`;
});
descEl.innerHTML = blocks.join('');
}
fetch('/api/references')
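
One behaviour of the list branch in `renderDescription` is worth noting: it requires every line of a block to start with a bullet marker, so a bullet that wraps onto a second line appears to fall through to the paragraph branch. The sketch below is a Python port for illustration only, not the code in this commit. It mirrors the same escaping, inline rules and heading mapping, and adds one assumed change: indented continuation lines are folded into the preceding list item. Ordered lists are omitted for brevity, and the names `render_description`, `_inline` and `_fold_list` are hypothetical.

```python
import html
import re
from typing import List, Optional

_BULLET = re.compile(r"^[-*]\s+")


def _inline(text: str) -> str:
    """Bold, italic and inline code, in the same order as the JS."""
    text = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"(?<!\*)\*([^*\n]+)\*(?!\*)", r"<em>\1</em>", text)
    return re.sub(r"`([^`\n]+)`", r"<code>\1</code>", text)


def _fold_list(lines: List[str]) -> Optional[List[str]]:
    """Return list items with wrapped lines joined, or None if the
    block is not a bullet list."""
    items: List[str] = []
    for line in lines:
        if _BULLET.match(line):
            items.append(_BULLET.sub("", line))
        elif items and line[:1].isspace():
            # Indented continuation of the previous bullet.
            items[-1] += " " + line.strip()
        else:
            return None
    return items


def render_description(md: str) -> str:
    # Escape &, < and > first so sidecar content cannot inject markup.
    escaped = html.escape(md, quote=False)
    out: List[str] = []
    for block in re.split(r"\n{2,}", escaped):
        block = block.strip()
        if not block:
            continue
        for prefix, tag in (("### ", "h3"), ("## ", "h2"), ("# ", "h2")):
            if block.startswith(prefix):
                out.append(f"<{tag}>{_inline(block[len(prefix):])}</{tag}>")
                break
        else:
            items = _fold_list(block.split("\n"))
            if items:
                body = "".join(f"<li>{_inline(i)}</li>" for i in items)
                out.append(f"<ul>{body}</ul>")
            else:
                out.append("<p>" + _inline(block).replace("\n", "<br>") + "</p>")
    return "".join(out)


if __name__ == "__main__":
    print(render_description("# Title\n\n- **Bold.** first\n  wrapped\n- second"))
```

The same folding step could be applied in the JavaScript renderer before its `lines.every(...)` check if wrapped bullets in the sidecar files turn out to render as plain paragraphs.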

@ -4,7 +4,7 @@
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>CIS490 — live</title>
<link rel="stylesheet" href="/static/dashboard.css?v=a591789b">
<link rel="stylesheet" href="/static/dashboard.css?v=afecfcf3">
</head>
<body>
<!-- SVG filter defs for the lava-lamp goo effect. Width/height 0
@ -301,16 +301,19 @@
</div>
</div>
<!-- 13. references — PDF viewer with tabs -->
<!-- 13. references — PDF viewer with tabs + description -->
<div class="stage-view" data-view="references">
<div class="metric-stack metric-stack-wide ref-stack">
<div class="metric-eyebrow">references · papers, notes, prior work</div>
<div class="ref-tabs" id="ref-tabs"></div>
<div class="ref-content">
<div class="ref-viewer-wrap">
<iframe class="ref-viewer" id="ref-viewer"
title="reference viewer"
sandbox="allow-same-origin allow-scripts allow-popups allow-forms"></iframe>
</div>
<div class="ref-description" id="ref-description"></div>
</div>
</div>
</div>
@ -515,6 +518,6 @@
</article>
</div>
<script src="/static/dashboard.js?v=b1cb9f39"></script>
<script src="/static/dashboard.js?v=f2a8bda2"></script>
</body>
</html>