The CSS-rule-per-canonical-name approach was wrong: any name the producer publishes that wasn't in the hardcoded list (mlp_realistic, cnn_oracle, knn_semi, anything new tomorrow) rendered grey because no .model-fill.<name> rule matched. Replace with a deterministic FNV-1a hash of the model string → hue, applied inline as an OKLCH gradient when the row is created. Every model string gets a stable, distinct color regardless of suffix or case. Inline style beats any CSS rule, so this works whatever's in dashboard.css. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
653 lines
30 KiB
HTML
653 lines
30 KiB
HTML
<!DOCTYPE html>
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="utf-8">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||
<title>CIS490 — live</title>
|
||
<link rel="stylesheet" href="/static/dashboard.css?v=ee34d60f">
|
||
</head>
|
||
<body>
|
||
<!-- SVG filter defs for the lava-lamp goo effect. Width/height 0
|
||
so it doesn't take layout space; the filter is referenced by
|
||
CSS via filter: url(#goo). -->
|
||
<svg class="goo-defs" width="0" height="0" aria-hidden="true">
|
||
<defs>
|
||
<filter id="goo">
|
||
<feGaussianBlur in="SourceGraphic" stdDeviation="22" result="blur"/>
|
||
<feColorMatrix in="blur" mode="matrix" result="goo" values="
|
||
1 0 0 0 0
|
||
0 1 0 0 0
|
||
0 0 1 0 0
|
||
0 0 0 26 -12"/>
|
||
<feBlend in="SourceGraphic" in2="goo"/>
|
||
</filter>
|
||
</defs>
|
||
</svg>
|
||
|
||
<!-- Theme background layers — exactly one is visible at a time,
|
||
selected by body[data-theme]. The blobs / bubbles / beams
|
||
inside drift / lava / laser are generated by JS so the count
|
||
and statistical-distribution sliders actually take effect. -->
|
||
<div class="bg-canvas" id="bg-canvas" aria-hidden="true">
|
||
<div class="bg-tint"></div>
|
||
|
||
<div class="bg-drift" id="bg-drift"></div>
|
||
|
||
<div class="bg-lava">
|
||
<div class="goo-container" id="bg-lava-bubbles"></div>
|
||
</div>
|
||
|
||
<div class="bg-vaporwave">
|
||
<div class="vw-sky"></div>
|
||
<!-- Scanlines BEFORE sun: the sun's solid disc occludes
|
||
scanlines inside its area so they can't beat against the
|
||
sun's venetian-blind stripes (the same kind of moiré
|
||
that previously appeared between scanlines and the
|
||
perspective floor — same shape, smaller scale). -->
|
||
<div class="vw-scanlines"></div>
|
||
<div class="vw-sun"><div class="vw-sun-blinds"></div></div>
|
||
<div class="vw-horizon"></div>
|
||
<div class="vw-floor"><div class="vw-floor-grid"></div></div>
|
||
</div>
|
||
|
||
<div class="bg-laser" id="bg-laser-beams"></div>
|
||
</div>
|
||
|
||
<!-- Right-half sidebar theme panel. Slides in/out via the
|
||
`is-open` class — we don't use the `hidden` attribute because
|
||
the transform animation needs the panel to stay rendered. -->
|
||
<div class="theme-panel" id="theme-panel">
|
||
<div class="theme-panel-header">
|
||
<span class="theme-title">theme · OKLCH</span>
|
||
<button id="theme-close" class="ghost icon" title="Close (t)">×</button>
|
||
</div>
|
||
|
||
<label class="theme-row">
|
||
<span>background</span>
|
||
<select id="theme-bg">
|
||
<option value="black">black (still)</option>
|
||
<option value="drift">drift (soft blobs)</option>
|
||
<option value="lava">lava lamp (goo metaballs)</option>
|
||
<option value="vaporwave">vaporwave</option>
|
||
<option value="laser">laser show</option>
|
||
</select>
|
||
</label>
|
||
|
||
<div class="theme-wheel-block">
|
||
<div class="theme-wheel" id="theme-wheel">
|
||
<div class="wheel-disc"></div>
|
||
<div class="wheel-rim"></div>
|
||
<div class="wheel-markers" id="wheel-markers"></div>
|
||
</div>
|
||
<div class="theme-sliders">
|
||
<label>L · lightness · <span id="theme-l-val">70</span>%
|
||
<input type="range" id="theme-l" min="20" max="95" value="70" step="1"></label>
|
||
<label>C · chroma · <span id="theme-c-val">0.15</span>
|
||
<input type="range" id="theme-c" min="0" max="0.4" value="0.15" step="0.005"></label>
|
||
<label>H · hue · <span id="theme-h-val">250</span>°
|
||
<input type="range" id="theme-h" min="0" max="360" value="250" step="1"></label>
|
||
</div>
|
||
</div>
|
||
|
||
<div class="theme-sliders theme-harmony-block">
|
||
<label>colors · count · <span id="theme-count-val">3</span>
|
||
<input type="range" id="theme-count" min="1" max="6" value="3" step="1"></label>
|
||
<label>spread · angular range · <span id="theme-spread-val">60</span>°
|
||
<input type="range" id="theme-spread" min="0" max="300" value="60" step="1"></label>
|
||
<div class="theme-harmony-hint" id="theme-harmony-hint"></div>
|
||
</div>
|
||
|
||
<details class="theme-advanced">
|
||
<summary>advanced — palette ladder</summary>
|
||
<div class="theme-sliders">
|
||
<label>L variance · per-color lightness ladder · <span id="theme-lvar-val">0</span>
|
||
<input type="range" id="theme-lvar" min="0" max="40" value="0" step="1"></label>
|
||
<label>C variance · per-color chroma ladder · <span id="theme-cvar-val">0.00</span>
|
||
<input type="range" id="theme-cvar" min="0" max="0.15" value="0" step="0.005"></label>
|
||
</div>
|
||
</details>
|
||
|
||
<div class="theme-row">
|
||
<span>palette</span>
|
||
<div class="theme-swatches" id="theme-swatches"></div>
|
||
</div>
|
||
|
||
<details class="theme-advanced" open>
|
||
<summary>animation · global</summary>
|
||
<div class="theme-sliders">
|
||
<label>speed · <span id="theme-speed-val">1.00</span>×
|
||
<input type="range" id="theme-speed" min="0.1" max="4" value="1" step="0.05"></label>
|
||
<label>blur · <span id="theme-blur-val">0</span> px
|
||
<input type="range" id="theme-blur" min="0" max="40" value="0" step="1"></label>
|
||
<label>tint strength · <span id="theme-tint-val">0.10</span>
|
||
<input type="range" id="theme-tint" min="0" max="0.6" value="0.1" step="0.02"></label>
|
||
<label>content backdrop · <span id="theme-backdrop-val">0.30</span>
|
||
<input type="range" id="theme-backdrop" min="0" max="1" value="0.3" step="0.05"></label>
|
||
</div>
|
||
</details>
|
||
|
||
<!-- Per-theme settings — dynamically built by JS from the THEMES
|
||
spec; only the section matching state.background is shown. -->
|
||
<div id="theme-bg-settings"></div>
|
||
|
||
<div class="theme-meta-row">
|
||
<code id="theme-meta">oklch(70% 0.15 250)</code>
|
||
<button id="theme-reset" class="ghost">reset</button>
|
||
</div>
|
||
</div>
|
||
|
||
<header class="topbar">
|
||
<span class="brand">CIS490</span>
|
||
<span id="status" class="status">connecting…</span>
|
||
<span class="spacer"></span>
|
||
<span class="counter"><span id="scene-idx">1</span> / <span id="scene-total">1</span></span>
|
||
<button id="prev-btn" class="ghost icon" title="Previous (← / k)">◀</button>
|
||
<button id="next-btn" class="ghost icon" title="Next (→ / space / j)">▶</button>
|
||
<button id="click-nav-btn" class="ghost" title="Click on the stage to advance to the next slide (c)">click-nav: off</button>
|
||
<button id="demo-btn" class="ghost" title="Toggle local synthetic data (d)">demo: off</button>
|
||
<button id="theme-btn" class="ghost" title="Theme panel (t)">theme</button>
|
||
</header>
|
||
|
||
<div class="layout">
|
||
<div class="canvas-wrapper" id="stage-col">
|
||
<div class="stage">
|
||
|
||
<!-- 1. intro -->
|
||
<div class="stage-view" data-view="intro">
|
||
<div class="bg-grid"></div>
|
||
<div class="intro-block">
|
||
<div class="intro-eyebrow">cis490 · live fleet telemetry</div>
|
||
<div class="intro-title">behavioral<br>malware<br>detection</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 2. collect — big-number hook right after the title -->
|
||
<div class="stage-view" data-view="collect">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">episodes ingested</div>
|
||
<div class="metric-big" id="ingest-total">0</div>
|
||
<div class="metric-sub">
|
||
<span id="ingest-rate">0.0</span> / sec · last 60 s ·
|
||
total bytes on disk: <span id="ingest-bytes">0 B</span>
|
||
</div>
|
||
<svg class="sparkline" id="ingest-spark" viewBox="0 0 600 120" preserveAspectRatio="none">
|
||
<path id="ingest-spark-fill" d=""></path>
|
||
<path id="ingest-spark-path" d=""></path>
|
||
</svg>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 3. motivation — what detection unlocks -->
|
||
<div class="stage-view" data-view="motivation">
|
||
<div class="metric-stack metric-stack-wide motivation-stack">
|
||
<div class="metric-eyebrow">what detection unlocks</div>
|
||
<div class="motivation-cards">
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-trust"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">network-level trust scoring</div>
|
||
<div class="motivation-card-text">A noisy on-device classifier becomes
|
||
useful when its verdict feeds a fleet-wide trust score —
|
||
peers, gateways, and traffic patterns vote together. A
|
||
single host's signal is fragile; combined network
|
||
behaviour is much harder to spoof.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-contain"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">containment before pivot</div>
|
||
<div class="motivation-card-text">"Infected" is actionable: quarantine
|
||
the device's credentials, drop its traffic at the
|
||
gateway, stop lateral movement before the attacker
|
||
pivots to a neighbor. Detection latency directly
|
||
bounds blast radius.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-recover"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">fast post-attack reset</div>
|
||
<div class="motivation-card-text">With a known infection time you can
|
||
roll a device back to a snapshot taken before the
|
||
compromise — no forensic dwell time, no guessing how
|
||
far back to roll. Recovery becomes a one-button
|
||
operation instead of a week of cleanup.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- stack — Python stack & libraries used in the project -->
|
||
<div class="stage-view" data-view="stack">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">the stack behind the live data on the right</div>
|
||
<div class="code-grid">
|
||
<div class="code-card">
|
||
<div class="code-card-header">pyproject.toml</div>
|
||
<pre class="code" id="code-pyproject"></pre>
|
||
</div>
|
||
<div class="code-card">
|
||
<div class="code-card-header">receiver/app.py · file header</div>
|
||
<pre class="code" id="code-receiver"></pre>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 4. hosts -->
|
||
<div class="stage-view" data-view="hosts">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">per-host shipping</div>
|
||
<div class="bars" id="host-bars">
|
||
<div class="awaiting">awaiting snapshot…</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 5. db — episode database explorer -->
|
||
<div class="stage-view" data-view="db">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="db-header">
|
||
<div class="metric-eyebrow">episode database · last 200 records</div>
|
||
<div class="db-count" id="db-count">0 of 0</div>
|
||
</div>
|
||
<div class="db-controls">
|
||
<div class="db-tabs" id="db-tabs"></div>
|
||
<input class="db-search" id="db-search" type="text"
|
||
placeholder="filter by host / id / sha…" />
|
||
</div>
|
||
<div class="db-table-wrap">
|
||
<table class="db-table">
|
||
<thead>
|
||
<tr>
|
||
<th>host</th>
|
||
<th>episode_id</th>
|
||
<th>received</th>
|
||
<th>size</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody id="db-tbody"></tbody>
|
||
</table>
|
||
</div>
|
||
<div class="db-detail" id="db-detail" hidden>
|
||
<div class="db-detail-meta" id="db-detail-meta"></div>
|
||
<div class="db-detail-chart-wrap">
|
||
<svg class="db-detail-chart" id="db-detail-chart"
|
||
viewBox="0 0 1000 360" preserveAspectRatio="none"></svg>
|
||
</div>
|
||
<div class="db-detail-legend" id="db-detail-legend"></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 6. baseline -->
|
||
<div class="stage-view" data-view="baseline">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow" id="phase-mix-eyebrow">phase mix · sampling dataset…</div>
|
||
<div class="phase-stack" id="phase-stack"></div>
|
||
<div class="phase-legend" id="phase-legend"></div>
|
||
<div class="metric-sub" id="phase-mix-sub">computing the phase
|
||
distribution across a random sample of episodes on disk.
|
||
A clean fleet sits mostly in <code>clean</code>; skew toward
|
||
<code>infecting</code> / <code>infected_running</code>
|
||
reflects time spent under attack workloads.</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 7. attacks -->
|
||
<div class="stage-view" data-view="attacks">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">attack envelopes · /proc signature per profile</div>
|
||
<div class="profile-grid" id="profile-grid"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 8. chunking -->
|
||
<div class="stage-view" data-view="chunking">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">10-second windows · model input shape</div>
|
||
<div class="chunk-rule" id="chunk-rule"></div>
|
||
<div class="chunk-row" id="chunk-row"></div>
|
||
<div class="chunk-axis" id="chunk-axis"></div>
|
||
<div class="metric-sub">each window: 100 samples (10 Hz × 10 s),
|
||
labeled by the phase that occupies its center.</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- training-code — how we trained, before showing results -->
|
||
<div class="stage-view" data-view="training-code">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">how we trained the sequence models</div>
|
||
<div class="code-grid">
|
||
<div class="code-card">
|
||
<div class="code-card-header">training/models/lstm.py</div>
|
||
<pre class="code" id="code-train-lstm"></pre>
|
||
</div>
|
||
<div class="code-card">
|
||
<div class="code-card-header">training/trainer/_loop.py · train_nn</div>
|
||
<pre class="code" id="code-train-loop"></pre>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- models — accuracy bars (results after training-code) -->
|
||
<div class="stage-view" data-view="models">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">sequence models · accuracy on held-out samples</div>
|
||
<div class="model-bars" id="model-bars"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 11. knn — interactive 3-D scatter with mode toggle -->
|
||
<div class="stage-view" data-view="knn">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">window features · 3-D projection · drag to rotate</div>
|
||
<div class="scatter3d-controls">
|
||
<div class="scatter3d-modes">
|
||
<button class="scatter3d-mode active" data-mode="phase">phase (ground truth)</button>
|
||
<button class="scatter3d-mode" data-mode="predicted">KNN-predicted label</button>
|
||
<button class="scatter3d-mode" data-mode="cluster">cluster id</button>
|
||
</div>
|
||
<button class="scatter3d-reset">reset view</button>
|
||
</div>
|
||
<div class="scatter3d-wrap">
|
||
<canvas class="scatter3d" id="knn-scatter-canvas"></canvas>
|
||
</div>
|
||
<div class="phase-legend" id="knn-legend"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 13. references — PDF viewer with tabs + description -->
|
||
<div class="stage-view" data-view="references">
|
||
<div class="metric-stack metric-stack-wide ref-stack">
|
||
<div class="metric-eyebrow">references · papers, notes, prior work</div>
|
||
<div class="ref-tabs" id="ref-tabs"></div>
|
||
<div class="ref-content">
|
||
<div class="ref-viewer-wrap">
|
||
<iframe class="ref-viewer" id="ref-viewer"
|
||
title="reference viewer"
|
||
sandbox="allow-same-origin allow-scripts allow-popups allow-forms"></iframe>
|
||
</div>
|
||
<div class="ref-description" id="ref-description"></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 12. perf -->
|
||
<div class="stage-view" data-view="perf">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">accuracy vs inference cost</div>
|
||
<svg class="scatter" id="perf-scatter" viewBox="0 0 600 360" preserveAspectRatio="xMidYMid meet"></svg>
|
||
<div class="metric-sub">x: μs / window (lower is better) ·
|
||
y: held-out accuracy (higher is better).</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 14. live — fleet-wide live detections feed -->
|
||
<div class="stage-view" data-view="live">
|
||
<div class="metric-stack metric-stack-wide live-stack">
|
||
<div class="live-stats">
|
||
<span class="live-stats-eye">A100 inference · live</span>
|
||
<span class="live-stats-dot" id="live-stats-hosts">0 models</span>
|
||
<span class="live-stats-dot" id="live-stats-rate">0 infer / sec</span>
|
||
<span class="live-stats-dot" id="live-stats-model">last window: —</span>
|
||
<span class="live-stats-dot" id="live-stats-acc">hit-rate: —</span>
|
||
</div>
|
||
<div class="live-lanes" id="live-lanes"></div>
|
||
<div class="live-latest" id="live-latest">
|
||
<div class="live-latest-empty">awaiting <code>live_detection</code> events from the A100 inference loop</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
|
||
</div>
|
||
<button id="next-fab" class="fab" data-no-advance title="Next (→)">▼</button>
|
||
</div>
|
||
|
||
<article class="article">
|
||
|
||
<section class="scene" data-stage="intro">
|
||
<div class="prose">
|
||
<p class="lede">Most malware doesn't look like malware in a database
|
||
— it looks like a process behaving badly.</p>
|
||
<p>An <strong>intrusion detection system</strong> spots the bad
|
||
behavior; an <strong>intrusion prevention system</strong> stops it.
|
||
Both depend on knowing what bad behavior <em>looks like</em> at the
|
||
level of telemetry the device can actually see.</p>
|
||
<p>This deck is the live face of the dataset we're building to teach
|
||
a model that distinction — every panel on the left is a slice of
|
||
real data shipping in right now.</p>
|
||
<p class="hint">scroll, click, or → to advance</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="collect">
|
||
<div class="prose">
|
||
<h2>Collecting the dataset</h2>
|
||
<p>Each lab host on the WireGuard mesh boots a real Alpine VM, runs
|
||
a profile-driven workload inside it, and samples
|
||
<code>/proc/<qemu_pid></code> at 10 Hz. Every ~30 seconds
|
||
the labeled tarball is shipped to this Pi over mTLS.</p>
|
||
<p>The counter on the left is the running total, sourced from the
|
||
receiver's <code>index.jsonl</code> on disk. The sparkline is the
|
||
arrival rate over the last sixty seconds — proof that the deck
|
||
is reading live data, not a fixed slide.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="motivation">
|
||
<div class="prose">
|
||
<h2>Why detect at all?</h2>
|
||
<p>Knowing a device is compromised is the precondition for everything
|
||
else. A classifier that says "this host is infected right now"
|
||
turns into three concrete operational capabilities — and each
|
||
one rewards a faster, more confident detector.</p>
|
||
<p><strong>Trust scoring across the network.</strong> Recent work
|
||
on per-device trust establishment
|
||
(<a href="https://ieeexplore.ieee.org/document/9881803"
|
||
target="_blank" rel="noopener">IEEE 9881803</a>) argues that
|
||
on-device metrics alone aren't enough — a fleet has to combine
|
||
local classifier verdicts with network-behaviour signals
|
||
(peer observations, gateway traffic patterns, inter-host
|
||
relationships) to score trust reliably. Our per-host detector
|
||
is one input to that broader signal.</p>
|
||
<p><strong>Containment.</strong> Once a host is flagged, the
|
||
gateway can drop its traffic and the IAM layer can revoke
|
||
credentials before lateral movement begins. Detection
|
||
latency translates directly into how much of the network
|
||
an attacker reaches.</p>
|
||
<p><strong>Quick recovery.</strong> A confirmed infection time
|
||
lets you restore from a snapshot taken just before the
|
||
compromise — no forensic dwell time, no guessing how far
|
||
back to roll. The recovery path becomes a one-button operation
|
||
instead of a week of cleanup.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="stack">
|
||
<div class="prose">
|
||
<h2>Live, not staged</h2>
|
||
<p>Every panel from here on is real data from real devices —
|
||
counters, bars, the episode database, all driven by the
|
||
<code>cis490-receiver</code> service running on this Pi as
|
||
you scroll.</p>
|
||
<p>The code on the left is how it gets here. Four runtime deps:
|
||
<strong>starlette</strong> + <strong>uvicorn</strong> for the
|
||
async HTTP and WebSocket surface, <strong>msgpack</strong>
|
||
talks to Metasploit's RPC, <strong>pycdlib</strong> builds the
|
||
lab-VM cidata ISOs. Everything else is the standard library,
|
||
and every dep is annotated with a one-line reason it's there.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="hosts">
|
||
<div class="prose">
|
||
<h2>A multi-host fleet</h2>
|
||
<p>Running the same orchestrator on multiple hosts gives novel,
|
||
non-overlapping data per host — no central coordinator. Each host
|
||
pulls a different slice of the manifest, so the dataset grows in
|
||
parallel.</p>
|
||
<p>The numbers below are absolute episode counts on disk, refreshed
|
||
from <code>/var/lib/cis490/episodes/<host>/</code> every
|
||
thirty seconds.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="db">
|
||
<div class="prose">
|
||
<h2>The dataset, browsable</h2>
|
||
<p>Every row is one labeled episode tarball stored at
|
||
<code>/var/lib/cis490/episodes/<host>/<id>.tar.zst</code>
|
||
after the receiver verifies its SHA-256 and writes it through.</p>
|
||
<p>Filter by host with the tabs, or grep by host / episode id /
|
||
sha with the search box. Click a row for the full
|
||
<code>index.jsonl</code> record. The view holds the most recent
|
||
two hundred records — older history is on disk, indexable
|
||
from the receiver.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="baseline">
|
||
<div class="prose">
|
||
<h2>A baseline of normal</h2>
|
||
<p>Before we can detect a deviation, we have to know what the fleet
|
||
looks like across a wide slice of its life. The stacked bar
|
||
aggregates ground-truth phase labels across hundreds of randomly
|
||
sampled episodes from the dataset on disk — weighted by the time
|
||
the workload actually spent in each phase, not just the count of
|
||
transitions.</p>
|
||
<p>If the model only ever sees <code>clean</code>, it overfits to
|
||
"everything is fine." The phase schedule fixes that by forcing
|
||
every run to walk through every phase, which is why
|
||
<code>infected_running</code> dominates the mix — that's where
|
||
the labelled attack workload sits.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="attacks">
|
||
<div class="prose">
|
||
<h2>Linking attack to telemetry</h2>
|
||
<p>The same six profiles run across every host, and each one
|
||
produces a different envelope in <code>/proc</code>. A
|
||
cryptominer pegs one core for minutes. A bursty C2 channel sits
|
||
idle, then exhales three packets. Ransomware walks the
|
||
filesystem and saturates I/O.</p>
|
||
<p>The thumbnails on the left are the canonical envelopes the
|
||
model has to learn to recognize — same axes, different shapes.
|
||
That shape difference is what makes detection tractable.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="chunking">
|
||
<div class="prose">
|
||
<h2>Ten-second windows</h2>
|
||
<p>Models eat fixed-size inputs. We chop each episode into
|
||
10-second windows — 100 samples per window at 10 Hz — and
|
||
label each window with the phase that occupies its center.</p>
|
||
<p>Window size is a knob. Too short and the model can't see slow
|
||
envelopes (low-and-slow malware, idle C2). Too long and you can't
|
||
react fast enough to be a useful prevention signal. Ten seconds
|
||
is the starting point we tune around.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="training-code">
|
||
<div class="prose">
|
||
<h2>How we trained them</h2>
|
||
<p>One trainer per model — load the windowed dataset, define the
|
||
network, train, evaluate. Same shape for RNN, GRU, LSTM, BERT,
|
||
so you can read all four side-by-side and the only differences
|
||
are the architecture itself.</p>
|
||
<p>The code on the left is the LSTM trainer.
|
||
PyTorch's <code>DataLoader</code> handles windowing,
|
||
<code>nn.LSTM</code> is one line, the loop is six.
|
||
No custom loss, no rate schedule, no manual batching —
|
||
anything fancier has to earn its place by beating the simple
|
||
version on held-out samples.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="models">
|
||
<div class="prose">
|
||
<h2>Sequence models</h2>
|
||
<p><strong>RNN, GRU, LSTM</strong> — recurrent models that read the
|
||
window one timestep at a time and carry state forward. Cheap,
|
||
mature, easy to interpret.</p>
|
||
<p><strong>BERT-style transformer</strong> — the window becomes a
|
||
sequence of "tokens"; attention captures cross-position context
|
||
instead of accumulating it through a hidden state. More
|
||
parameters, more compute, more room to overfit a small dataset.</p>
|
||
<p>Same input, same labels, four different inductive biases. The
|
||
comparison on the left is the punchline of the whole project.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="knn">
|
||
<div class="prose">
|
||
<h2>Nearest-neighbor as a sanity check</h2>
|
||
<p>Before anything fancy: engineer summary features per window
|
||
(mean, std, p95, slope, zero-bucket counts per channel) and run
|
||
<strong>KNN</strong> in that feature space.</p>
|
||
<p>If the phase clusters separate visibly in two dimensions, KNN
|
||
already does most of the work and a deep model is only buying
|
||
marginal improvement. If they don't separate, you've learned
|
||
something about the feature engineering before training a single
|
||
epoch.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="perf">
|
||
<div class="prose">
|
||
<h2>Accuracy vs complexity</h2>
|
||
<p>Bigger models earn better numbers in the validation set — but
|
||
they also need more parameters, more inference time, and more
|
||
memory at the edge. The deployed model has to fit on the device
|
||
it's protecting.</p>
|
||
<p>The scatter on the left is the usable trade-off curve: every
|
||
point above and to the left of where you currently sit is a
|
||
reachable upgrade. The point in the bottom-right is a model
|
||
you'd never ship.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="live">
|
||
<div class="prose">
|
||
<h2>Catching attacks live</h2>
|
||
<p>The <strong>A100</strong> runs inference against incoming
|
||
ten-second windows from the fleet. Each row on the stage is
|
||
<em>one trained model</em> doing live prediction; each cell
|
||
is its phase verdict on a freshly-arrived window, painted
|
||
by the predicted phase.</p>
|
||
<p>Read the lanes side-by-side as a model-agreement check:
|
||
when the recurrent family (RNN / GRU / LSTM) all flip to
|
||
<code>infecting</code> at the same time, that's strong
|
||
evidence the host actually is. When ground truth from
|
||
<code>labels.jsonl</code> catches up, mismatched cells get
|
||
a hatched overlay and the running hit-rate ticks. The
|
||
callout below holds the most recent prediction with model
|
||
name, A100 round-trip latency, and confidence.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="references">
|
||
<div class="prose">
|
||
<h2>References</h2>
|
||
<p>The papers, notes, and prior work this project leans on.
|
||
Pick a tab on the left to load the document; the viewer
|
||
takes the bulk of the stage so you can scroll through
|
||
without leaving the deck.</p>
|
||
<p class="hint">end of deck · ← to flip back</p>
|
||
</div>
|
||
</section>
|
||
|
||
<div class="scene-end-spacer"></div>
|
||
</article>
|
||
</div>
|
||
|
||
<script src="/static/dashboard.js?v=d16a0e1c"></script>
|
||
</body>
|
||
</html>
|