training/dashboard(references): description sidebar + better space use

Two changes, per the user's feedback that the slide had unused horizontal space and needed per-PDF context.

Layout

- The references scene is now a 2-column grid inside the metric-stack: PDF iframe at ~1.7fr on the left, description panel at ~0.55fr on the right (min 280px). On narrow viewports (≤1100px) it falls back to a vertical stack with the description capped at 240px.
- Added #zoom=page-width to the iframe URL so the PDF page fits its column width instead of leaving margins beside an 8.5x11 page rendered in a wider iframe.
- Hid the prose card on the references scene: the description panel inside the stack covers what the prose was saying, and freeing the right edge gives the description proper room.

Description content

- Backend reads <stem>.md sidecar files alongside each PDF and returns the contents in the /api/references payload.
- Frontend renders them with a tiny built-in markdown subset (headings, bold/italic, lists, inline code, paragraphs), with no third-party renderer dependency.
- Initial draft sidecar .md files committed for the four PDFs currently in references/. Each describes how the paper informs a specific scene of the deck (which model row, which eval protocol, which channel selection). Edit them in place and the panel updates on the next reload.
2026-05-08 12:40:27 -05:00 · 2026-05-08 12:40:27 -05:00 · 9e38f78379
commit 9e38f78379
parent 69c563275a
8 changed files with 232 additions and 19 deletions
--- a/references/A
+++ b/references/A
@@ -0,0 +1,26 @@
+# Closest direct precedent
+
+This paper applies deep learning to **time-series system-call traces
+inside virtual machines** for malware detection — almost exactly the
+framing of this project, just one layer deeper in the stack
+(syscall traces vs `/proc` samples).
+
+## What we borrowed
+
+- **Windowing strategy.** The paper's fixed-length sliding-window
+  formulation over a sequential telemetry stream is the same shape
+  we use for our 10-second `/proc` windows fed to LSTM/GRU/RNN.
+- **Recurrent architecture as the simple-but-strong baseline.**
+  Their result that an LSTM on raw sequences beats hand-crafted
+  feature classifiers on the same data is the cited justification
+  for our LSTM/GRU/RNN row of the model comparison.
+- **Per-VM containment posture.** Confirms our choice to run each
+  episode in its own throwaway Alpine guest rather than instrumenting
+  the host process directly.
+
+## Where it differs
+
+- Their telemetry is full **syscall traces** (much richer than
+  `/proc` resource counters), which is why their numbers don't
+  transfer 1-to-1 to our setup. They establish *that* this works;
+  we measure how well it works on a thinner, more deployable signal.
--- a/references/DANTE:
+++ b/references/DANTE:
@@ -0,0 +1,26 @@
+# LSTM on event-log sequences
+
+DANTE applies a **plain LSTM directly to system-log event sequences**
+to flag insider-threat behavior. Earlier in the literature than the
+transformer wave, and useful here as a methodological baseline.
+
+## What we borrowed
+
+- **Evidence that simple recurrent models are enough.** The paper
+  shows an LSTM on sequence-of-events alone — no per-task feature
+  engineering — captures enough temporal structure to beat
+  bag-of-events classifiers. That's the empirical ground for the
+  *RNN/GRU/LSTM* entries in our model comparison being plain, not
+  bespoke.
+- **Negative-evidence framing.** DANTE is also explicit about cases
+  where the LSTM under-performs (low-volume users, novel event
+  types). Informs the *split-by-sample, not split-by-time* eval
+  protocol on the perf scene — generalising to unseen actors is
+  the bar.
+
+## Where it differs
+
+- Operates on log-event token sequences (categorical), not numeric
+  resource metrics (continuous). Our channels are floats from
+  `/proc`, so we use the temporal structure DANTE validates without
+  inheriting the embedding setup.
--- a/references/LogBERT:
+++ b/references/LogBERT:
@@ -0,0 +1,26 @@
+# Transformer pretraining for log anomaly detection
+
+LogBERT trains **BERT-style masked-language-modeling on log
+sequences** and uses the resulting representations for unsupervised
+anomaly scoring. The closest published example of "BERT, but for
+host telemetry."
+
+## What we borrowed
+
+- **The transformer entry in our model comparison.** LogBERT is the
+  citation for why a transformer is even in the model lineup on
+  scene 9 — it shows that attention over moderate-length log windows
+  has enough signal to separate normal from anomalous *without*
+  per-anomaly labels.
+- **Pretraining + fine-tune split.** Their two-stage setup
+  (self-supervised pretrain on benign logs, downstream classifier
+  on labeled anomalies) is the template we follow when describing
+  the BERT model's training story on the *training-code* scene.
+
+## Where it differs
+
+- Logs are categorical (template tokens); our windows are dense
+  float vectors (12 channels × 100 samples). The BERT we run is the
+  same architecture but reads continuous-valued tokens, so the
+  masking objective is regression-on-masked-channels rather than
+  cross-entropy-on-masked-token.
--- a/references/Transformer-based
+++ b/references/Transformer-based
@@ -0,0 +1,30 @@
+# Strongest published precedent for this exact setup
+
+This paper applies **transformer architectures to per-process
+resource-utilisation metrics** — the same shape of telemetry we
+collect from `/proc`. Closest reference to "the project we're doing,
+but already published."
+
+## What we borrowed
+
+- **Channel selection.** Their list of `/proc` channels overlaps
+  heavily with ours (`cpu_user_jiffies`, `cpu_sys_jiffies`,
+  `rss_bytes`, `io_*_bytes`, `voluntary_ctxsw`, `involuntary_ctxsw`,
+  page-fault counters). Our 12-channel selection is essentially
+  this set, validated.
+- **Window-and-classify framing.** They confirm that a transformer
+  reading short windows of these counters beats per-window
+  hand-features fed to gradient-boosted trees. That is exactly the
+  comparison we run: KNN-on-features vs sequence-models-on-windows.
+- **Held-out-sample evaluation.** They emphasise generalising to
+  *unseen* malware families, not unseen time-slices of the same
+  family. We adopt the same eval protocol on the perf scene.
+
+## Where it differs
+
+- They use a much larger corpus and run on commercial endpoints;
+  we run on three lab hosts and a Pi. Their numbers are an upper
+  bound on what we can hope to reproduce — they're the target, not
+  the floor.
+- They don't publish their exact dataset, so the comparison is
+  architectural, not reproductive.
--- a/training/dashboard/app.py
+++ b/training/dashboard/app.py
@@ -249,9 +249,21 @@ def make_app(
                # for display and URL-encode for the path so the
                # iframe can fetch /refs/<encoded-name>.
                display_name = " ".join(p.stem.split())
+                # Sidecar markdown: <stem>.md alongside the PDF
+                # holds a free-form description of how the paper
+                # was used in the project. Optional — the
+                # frontend shows a placeholder if missing.
+                description = None
+                md_path = p.with_suffix(".md")
+                if md_path.is_file():
+                    try:
+                        description = md_path.read_text(encoding="utf-8")
+                    except OSError:
+                        log.warning("could not read sidecar %s", md_path)
                items.append({
                    "name": display_name,
                    "path": "/refs/" + quote(p.name, safe=""),
+                    "description": description,
                })
        except OSError:
            log.exception("could not list references in %s", REFS_DIR)
--- a/training/dashboard/static/dashboard.css
+++ b/training/dashboard/static/dashboard.css
@@ -286,8 +286,8 @@ body[data-theme="laser"] .bg-laser         { display: block; }
  to   { transform: rotate(360deg); }
 }

-/* ─── References scene (PDF viewer + tab strip) ─────────────────────── */
-.ref-stack { /* metric-stack-wide variant; let viewer take the height */
+/* ─── References scene (PDF viewer + tab strip + description) ──────── */
+.ref-stack { /* metric-stack-wide variant; let content area fill height */
  height: 100%;
  justify-content: flex-start;
 }
@@ -315,26 +315,78 @@ body[data-theme="laser"] .bg-laser         { display: block; }
  color: var(--accent); border-color: var(--accent);
  background: var(--accent-soft);
 }
-.ref-viewer-wrap {
+/* Two-column layout: PDF viewer on the left taking the larger
+   share, description panel on the right. The viewer's column
+   uses minmax(0, …) so the iframe won't blow out the grid when
+   the PDF reports a wide intrinsic size. */
+.ref-content {
  flex: 1 1 auto; min-height: 0;
+  display: grid;
+  grid-template-columns: minmax(0, 1.7fr) minmax(280px, 0.55fr);
+  gap: 14px;
+}
+.ref-viewer-wrap {
  background: var(--bg-elev);
  border: 1px solid var(--line); border-radius: 4px;
  overflow: hidden;
+  min-height: 0;
 }
 .ref-viewer {
  width: 100%; height: 100%;
  min-height: clamp(360px, 70vh, 900px);
  border: 0; display: block; background: var(--bg-elev);
 }
+.ref-description {
+  background: var(--bg-elev);
+  border: 1px solid var(--line); border-radius: 4px;
+  overflow-y: auto;
+  padding: 18px 22px;
+  font-size: 14px; line-height: 1.6;
+  color: var(--fg);
+  min-height: 0;
+}
+.ref-description h1, .ref-description h2 {
+  font-size: 15px; font-weight: 600; margin: 0 0 10px;
+  color: var(--fg);
+}
+.ref-description h3 { font-size: 13px; font-weight: 600; margin: 12px 0 4px; }
+.ref-description p  { margin: 0 0 10px; }
+.ref-description ul,
+.ref-description ol { margin: 0 0 10px; padding-left: 20px; }
+.ref-description li { margin: 0 0 4px; }
+.ref-description code {
+  font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
+  font-size: 0.9em; color: var(--accent);
+  background: var(--accent-soft); padding: 1px 5px; border-radius: 3px;
+}
+.ref-description strong { color: var(--fg); font-weight: 600; }
+.ref-description em     { color: var(--fg-dim); font-style: italic; }
+.ref-description .awaiting {
+  color: var(--fg-mute); font-style: italic;
+  font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
+  font-size: 12px;
+}
+
+/* On narrow viewports stack vertically: PDF on top, description
+   below, capped to a sensible height so the PDF still gets room. */
+@media (max-width: 1100px) {
+  .ref-content { grid-template-columns: 1fr; }
+  .ref-description { max-height: 240px; }
+}

 /* References scene wants more horizontal room than the default
   metric scenes — the PDF is the point. Drop the right padding
-   that reserves space for the prose column down to a small gutter,
-   so the iframe can stretch most of the way across. The prose card
-   still overlays the right edge with its feathered backdrop. */
+   that reserves space for the prose column. The prose for this
+   scene is hidden anyway (see below) so we can use the full width
+   for the PDF + description grid. */
 .stage-view[data-view="references"] {
-  padding-right: clamp(8px, 4vw, 96px);
+  padding-right: clamp(8px, 2vw, 48px);
 }
+/* Hide the prose card on the references scene — the description
+   panel inside the metric-stack already explains each PDF in
+   context, and freeing the right-side viewport gives the
+   description panel proper room. */
+.scene[data-stage="references"] .prose { display: none; }

 /* ─── Per-theme settings section ───────────────────────────────────── */
 .theme-bg-section { display: none; }
--- a/training/dashboard/static/dashboard.js
+++ b/training/dashboard/static/dashboard.js
@@ -1641,7 +1641,8 @@ for epoch in range(20):
  (function () {
    const tabsEl = document.getElementById('ref-tabs');
    const viewerEl = document.getElementById('ref-viewer');
-    if (!tabsEl || !viewerEl) return;
+    const descEl = document.getElementById('ref-description');
+    if (!tabsEl || !viewerEl || !descEl) return;

    let refs = [];
    let activeIdx = -1;
@@ -1661,6 +1662,7 @@ for epoch in range(20):
        empty.className = 'awaiting';
        empty.textContent = 'no PDFs found in /opt/cis490/references/';
        tabsEl.appendChild(empty);
+        renderDescription(null);
        return;
      }
      refs.forEach((r, i) => {
@@ -1680,10 +1682,46 @@ for epoch in range(20):
      if (i < 0 || i >= refs.length) return;
      activeIdx = i;
      rebuildTabs();
-      // Append a hash so that hitting the same PDF twice in a row
-      // still triggers a reload (helps if the file was updated on
-      // disk; iframes cache aggressively otherwise).
-      viewerEl.src = refs[i].path;
+      // #zoom=page-width forces the browser's PDF viewer to fit the
+      // page horizontally to the iframe — without it, an 8.5×11
+      // page leaves whitespace on either side when the iframe is
+      // wider than the page's natural width.
+      viewerEl.src = refs[i].path + '#zoom=page-width';
+      renderDescription(refs[i].description);
+    }
+
+    // Tiny markdown-ish renderer: enough to display headings,
+    // paragraphs, bold/italic, lists, inline code. Keeps this widget
+    // dependency-free (no marked.js / showdown.js / etc).
+    function renderDescription(md) {
+      if (!md) {
+        descEl.innerHTML =
+          '<p class="awaiting">no description for this reference yet — drop a sidecar &lt;stem&gt;.md next to the PDF in /opt/cis490/references/</p>';
+        return;
+      }
+      // Escape HTML first so user content can't inject markup.
+      let s = md.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
+      // Inline: bold, italic, code.
+      s = s.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
+           .replace(/(?<!\*)\*([^*\n]+)\*(?!\*)/g, '<em>$1</em>')
+           .replace(/`([^`\n]+)`/g, '<code>$1</code>');
+      // Block-level: split on blank lines, then handle headings + lists.
+      const blocks = s.split(/\n{2,}/).map(block => {
+        const stripped = block.trim();
+        if (!stripped) return '';
+        if (stripped.startsWith('# '))   return `<h2>${stripped.slice(2)}</h2>`;
+        if (stripped.startsWith('## '))  return `<h2>${stripped.slice(3)}</h2>`;
+        if (stripped.startsWith('### ')) return `<h3>${stripped.slice(4)}</h3>`;
+        const lines = stripped.split('\n');
+        if (lines.every(l => /^[-*]\s/.test(l))) {
+          return '<ul>' + lines.map(l => `<li>${l.replace(/^[-*]\s+/, '')}</li>`).join('') + '</ul>';
+        }
+        if (lines.every(l => /^\d+\.\s/.test(l))) {
+          return '<ol>' + lines.map(l => `<li>${l.replace(/^\d+\.\s+/, '')}</li>`).join('') + '</ol>';
+        }
+        return `<p>${stripped.replace(/\n/g, '<br>')}</p>`;
+      });
+      descEl.innerHTML = blocks.join('');
    }

    fetch('/api/references')
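
Review note on the `renderDescription` hunk above: the list branch only fires when every line of a block starts with a marker (`lines.every(...)`), but the sidecar files added in this commit wrap long bullets onto indented continuation lines, so those blocks would appear to fall through to the `<p>` branch and show literal `- ` prefixes. Below is a minimal sketch of one possible way to handle that, assuming wrapped lines are always indented. The names `foldListLines`, `renderBlock`, and `renderMarkdownSubset` are hypothetical and not part of the commit, and the function returns a string instead of assigning `descEl.innerHTML` so it can be exercised outside the browser.

```javascript
// Sketch only: a string-returning variant of the commit's block renderer.
// All function names here are hypothetical.

// Fold indented continuation lines into the list item they belong to.
// Returns null when the block does not look like a list.
function foldListLines(lines) {
  const items = [];
  for (const line of lines) {
    if (/^([-*]|\d+\.)\s/.test(line)) {
      items.push(line);                              // starts a new list item
    } else if (items.length > 0 && /^\s+\S/.test(line)) {
      items[items.length - 1] += ' ' + line.trim();  // wrapped text of the previous item
    } else {
      return null;                                   // ordinary paragraph line
    }
  }
  return items;
}

function renderBlock(block) {
  const stripped = block.trim();
  if (!stripped) return '';
  // Same heading mapping as the commit: '#' and '##' both become <h2>.
  if (stripped.startsWith('# '))   return `<h2>${stripped.slice(2)}</h2>`;
  if (stripped.startsWith('## '))  return `<h2>${stripped.slice(3)}</h2>`;
  if (stripped.startsWith('### ')) return `<h3>${stripped.slice(4)}</h3>`;
  const items = foldListLines(stripped.split('\n'));
  if (items && items.every(l => /^[-*]\s/.test(l))) {
    return '<ul>' + items.map(l => `<li>${l.replace(/^[-*]\s+/, '')}</li>`).join('') + '</ul>';
  }
  if (items && items.every(l => /^\d+\.\s/.test(l))) {
    return '<ol>' + items.map(l => `<li>${l.replace(/^\d+\.\s+/, '')}</li>`).join('') + '</ol>';
  }
  return `<p>${stripped.replace(/\n/g, '<br>')}</p>`;
}

function renderMarkdownSubset(md) {
  // Escape HTML first, then apply the same inline rules as the commit.
  let s = md.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
  s = s.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
       .replace(/(?<!\*)\*([^*\n]+)\*(?!\*)/g, '<em>$1</em>')
       .replace(/`([^`\n]+)`/g, '<code>$1</code>');
  return s.split(/\n{2,}/).map(renderBlock).join('');
}

// Abbreviated sample in the shape of the sidecar bullets (wrapped second line).
const sample = [
  '- **Windowing strategy.** Fixed-length sliding windows',
  '  over a sequential telemetry stream.',
  '- **Per-VM containment posture.** One throwaway guest per episode.',
].join('\n');
console.log(renderMarkdownSubset(sample));
```

With this variant the sample renders as a two-item `<ul>`, and an unindented multi-line block still renders as a `<p>` with `<br>` breaks as before. An alternative that needs no code change is to keep each bullet on a single line in the sidecar files.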
--- a/training/dashboard/static/index.html
+++ b/training/dashboard/static/index.html
@@ -4,7 +4,7 @@
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>CIS490 — live</title>
-  <link rel="stylesheet" href="/static/dashboard.css?v=a591789b">
+  <link rel="stylesheet" href="/static/dashboard.css?v=afecfcf3">
 </head>
 <body>
  <!-- SVG filter defs for the lava-lamp goo effect. Width/height 0
@@ -301,16 +301,19 @@
          </div>
        </div>

-        <!-- 13. references — PDF viewer with tabs -->
+        <!-- 13. references — PDF viewer with tabs + description -->
        <div class="stage-view" data-view="references">
          <div class="metric-stack metric-stack-wide ref-stack">
            <div class="metric-eyebrow">references · papers, notes, prior work</div>
            <div class="ref-tabs" id="ref-tabs"></div>
+            <div class="ref-content">
              <div class="ref-viewer-wrap">
                <iframe class="ref-viewer" id="ref-viewer"
                        title="reference viewer"
                        sandbox="allow-same-origin allow-scripts allow-popups allow-forms"></iframe>
              </div>
+              <div class="ref-description" id="ref-description"></div>
+            </div>
          </div>
        </div>

@@ -515,6 +518,6 @@
    </article>
  </div>

-  <script src="/static/dashboard.js?v=b1cb9f39"></script>
+  <script src="/static/dashboard.js?v=f2a8bda2"></script>
 </body>
 </html>