1318 lines
67 KiB
HTML
<!DOCTYPE html>
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="utf-8">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||
<title>CIS490 — live</title>
|
||
<link rel="stylesheet" href="/static/dashboard.css?v=d4067342">
|
||
</head>
|
||
<body>
|
||
<!-- SVG filter defs for the lava-lamp goo effect. Width/height 0
|
||
so it doesn't take layout space; the filter is referenced by
|
||
CSS via filter: url(#goo). -->
|
||
<svg class="goo-defs" width="0" height="0" aria-hidden="true">
|
||
<defs>
|
||
<filter id="goo">
|
||
<feGaussianBlur in="SourceGraphic" stdDeviation="22" result="blur"/>
|
||
<feColorMatrix in="blur" mode="matrix" result="goo" values="
|
||
1 0 0 0 0
|
||
0 1 0 0 0
|
||
0 0 1 0 0
|
||
0 0 0 26 -12"/>
|
||
<feBlend in="SourceGraphic" in2="goo"/>
|
||
</filter>
|
||
</defs>
|
||
</svg>
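<!-- Sketch (Python, illustrative only; not loaded by the page). The alpha row
of the feColorMatrix above, "0 0 0 26 -12", rescales the blurred alpha as
A' = 26*A - 12, and filter results are clamped to [0, 1]. Blurred alpha below
about 0.46 therefore becomes fully transparent and anything from 0.5 up becomes
fully opaque, which is what turns soft blurred blobs into hard-edged metaballs.
The function name goo_alpha is an assumption made for this sketch.

```python
def goo_alpha(blurred_alpha: float, gain: float = 26.0, offset: float = -12.0) -> float:
    """Apply the alpha row of the goo colour matrix, then clamp to [0, 1]."""
    out = gain * blurred_alpha + offset
    return min(1.0, max(0.0, out))


# Faint blur halo disappears, blob cores saturate.
for a in (0.30, 0.46, 0.48, 0.50, 0.80):
    print(f"blurred alpha {a:.2f} maps to {goo_alpha(a):.2f}")
```

Raising the gain narrows the soft band between transparent and opaque; moving
the offset shifts where that band sits. -->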
|
||
|
||
<!-- Theme background layers — exactly one is visible at a time,
|
||
selected by body[data-theme]. The blobs / bubbles / beams
|
||
inside drift / lava / laser are generated by JS so the count
|
||
and statistical-distribution sliders actually take effect. -->
|
||
<div class="bg-canvas" id="bg-canvas" aria-hidden="true">
|
||
<div class="bg-tint"></div>
|
||
|
||
<div class="bg-drift" id="bg-drift"></div>
|
||
|
||
<div class="bg-lava">
|
||
<div class="goo-container" id="bg-lava-bubbles"></div>
|
||
</div>
|
||
|
||
<div class="bg-vaporwave">
|
||
<div class="vw-sky"></div>
|
||
<!-- Scanlines BEFORE sun: the sun's solid disc occludes
|
||
scanlines inside its area so they can't beat against the
|
||
sun's venetian-blind stripes (the same kind of moiré
|
||
that previously appeared between scanlines and the
|
||
perspective floor — same shape, smaller scale). -->
|
||
<div class="vw-scanlines"></div>
|
||
<div class="vw-sun"><div class="vw-sun-blinds"></div></div>
|
||
<div class="vw-horizon"></div>
|
||
<div class="vw-floor"><div class="vw-floor-grid"></div></div>
|
||
</div>
|
||
|
||
<div class="bg-laser" id="bg-laser-beams"></div>
|
||
</div>
|
||
|
||
<!-- Right-half sidebar theme panel. Slides in/out via the
|
||
`is-open` class — we don't use the `hidden` attribute because
|
||
the transform animation needs the panel to stay rendered. -->
|
||
<div class="theme-panel" id="theme-panel">
|
||
<div class="theme-panel-header">
|
||
<span class="theme-title">theme · OKLCH</span>
|
||
<button id="theme-close" class="ghost icon" title="Close (t)">×</button>
|
||
</div>
|
||
|
||
<label class="theme-row">
|
||
<span>background</span>
|
||
<select id="theme-bg">
|
||
<option value="black">black (still)</option>
|
||
<option value="drift">drift (soft blobs)</option>
|
||
<option value="lava">lava lamp (goo metaballs)</option>
|
||
<option value="vaporwave">vaporwave</option>
|
||
<option value="laser">laser show</option>
|
||
</select>
|
||
</label>
|
||
|
||
<div class="theme-wheel-block">
|
||
<div class="theme-wheel" id="theme-wheel">
|
||
<div class="wheel-disc"></div>
|
||
<div class="wheel-rim"></div>
|
||
<div class="wheel-markers" id="wheel-markers"></div>
|
||
</div>
|
||
<div class="theme-sliders">
|
||
<label>L · lightness · <span id="theme-l-val">70</span>%
|
||
<input type="range" id="theme-l" min="20" max="95" value="70" step="1"></label>
|
||
<label>C · chroma · <span id="theme-c-val">0.15</span>
|
||
<input type="range" id="theme-c" min="0" max="0.4" value="0.15" step="0.005"></label>
|
||
<label>H · hue · <span id="theme-h-val">250</span>°
|
||
<input type="range" id="theme-h" min="0" max="360" value="250" step="1"></label>
|
||
</div>
|
||
</div>
|
||
|
||
<div class="theme-sliders theme-harmony-block">
|
||
<label>colors · count · <span id="theme-count-val">3</span>
|
||
<input type="range" id="theme-count" min="1" max="6" value="3" step="1"></label>
|
||
<label>spread · angular range · <span id="theme-spread-val">60</span>°
|
||
<input type="range" id="theme-spread" min="0" max="300" value="60" step="1"></label>
|
||
<div class="theme-harmony-hint" id="theme-harmony-hint"></div>
|
||
</div>
|
||
|
||
<details class="theme-advanced">
|
||
<summary>advanced — palette ladder</summary>
|
||
<div class="theme-sliders">
|
||
<label>L variance · per-color lightness ladder · <span id="theme-lvar-val">0</span>
|
||
<input type="range" id="theme-lvar" min="0" max="40" value="0" step="1"></label>
|
||
<label>C variance · per-color chroma ladder · <span id="theme-cvar-val">0.00</span>
|
||
<input type="range" id="theme-cvar" min="0" max="0.15" value="0" step="0.005"></label>
|
||
</div>
|
||
</details>
|
||
|
||
<div class="theme-row">
|
||
<span>palette</span>
|
||
<div class="theme-swatches" id="theme-swatches"></div>
|
||
</div>
|
||
|
||
<details class="theme-advanced" open>
|
||
<summary>animation · global</summary>
|
||
<div class="theme-sliders">
|
||
<label>speed · <span id="theme-speed-val">1.00</span>×
|
||
<input type="range" id="theme-speed" min="0.1" max="4" value="1" step="0.05"></label>
|
||
<label>blur · <span id="theme-blur-val">0</span> px
|
||
<input type="range" id="theme-blur" min="0" max="40" value="0" step="1"></label>
|
||
<label>tint strength · <span id="theme-tint-val">0.10</span>
|
||
<input type="range" id="theme-tint" min="0" max="0.6" value="0.1" step="0.02"></label>
|
||
<label>content backdrop · <span id="theme-backdrop-val">0.30</span>
|
||
<input type="range" id="theme-backdrop" min="0" max="1" value="0.3" step="0.05"></label>
|
||
</div>
|
||
</details>
|
||
|
||
<!-- Per-theme settings — dynamically built by JS from the THEMES
|
||
spec; only the section matching state.background is shown. -->
|
||
<div id="theme-bg-settings"></div>
|
||
|
||
<div class="theme-meta-row">
|
||
<code id="theme-meta">oklch(70% 0.15 250)</code>
|
||
<button id="theme-reset" class="ghost">reset</button>
|
||
</div>
|
||
</div>
|
||
|
||
<header class="topbar">
|
||
<span class="brand">CIS490</span>
|
||
<span id="status" class="status">connecting…</span>
|
||
<span class="spacer"></span>
|
||
<span class="counter"><span id="scene-idx">1</span> / <span id="scene-total">1</span></span>
|
||
<button id="prev-btn" class="ghost icon" title="Previous (← / k)">◀</button>
|
||
<button id="next-btn" class="ghost icon" title="Next (→ / space / j)">▶</button>
|
||
<button id="click-nav-btn" class="ghost" title="Click on the stage to advance to the next slide (c)">click-nav: off</button>
|
||
<button id="demo-btn" class="ghost" title="Toggle local synthetic data (d)">demo: off</button>
|
||
<button id="theme-btn" class="ghost" title="Theme panel (t)">theme</button>
|
||
</header>
|
||
|
||
<div class="layout">
|
||
<div class="canvas-wrapper" id="stage-col">
|
||
<div class="stage">
|
||
|
||
<!-- 1. intro -->
|
||
<div class="stage-view" data-view="intro">
|
||
<div class="bg-grid"></div>
|
||
<div class="intro-block">
|
||
<div class="intro-eyebrow">cis490 · live fleet telemetry</div>
|
||
<div class="intro-title">behavioral<br>malware<br>detection</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 2. collect — big-number hook right after the title -->
|
||
<div class="stage-view" data-view="collect">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">episodes ingested</div>
|
||
<div class="metric-big" id="ingest-total">0</div>
|
||
<div class="metric-sub">
|
||
<span id="ingest-rate">0.0</span> / sec · last 60 s ·
|
||
total bytes on disk: <span id="ingest-bytes">0 B</span>
|
||
</div>
|
||
<svg class="sparkline" id="ingest-spark" viewBox="0 0 600 120" preserveAspectRatio="none">
|
||
<path id="ingest-spark-fill" d=""></path>
|
||
<path id="ingest-spark-path" d=""></path>
|
||
</svg>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 3. motivation — what detection unlocks -->
|
||
<div class="stage-view" data-view="motivation">
|
||
<div class="metric-stack metric-stack-wide motivation-stack">
|
||
<div class="metric-eyebrow">what detection unlocks</div>
|
||
<div class="motivation-cards">
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-trust"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">network-level trust scoring</div>
|
||
<div class="motivation-card-text">A noisy on-device classifier becomes
|
||
useful when its verdict feeds a fleet-wide trust score —
|
||
peers, gateways, and traffic patterns vote together. A
|
||
single host's signal is fragile; combined network
|
||
behaviour is much harder to spoof.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-contain"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">containment before pivot</div>
|
||
<div class="motivation-card-text">"Infected" is actionable: quarantine
|
||
the device's credentials, drop its traffic at the
|
||
gateway, stop lateral movement before the attacker
|
||
pivots to a neighbour. Detection latency directly
|
||
bounds blast radius.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-recover"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">fast post-attack reset</div>
|
||
<div class="motivation-card-text">With a known infection time you can
|
||
roll a device back to a snapshot taken before the
|
||
compromise — no forensic dwell time, no guessing how
|
||
far back to roll. Recovery becomes a one-button
|
||
operation instead of a week of cleanup.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 3. problem-statement — what we're solving + task type -->
|
||
<div class="stage-view" data-view="problem-statement">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">the problem · single sentence + numbers</div>
|
||
<div class="problem-claim">
|
||
<div class="problem-claim-text">Classify each ten-second window of fleet
|
||
<code>/proc</code> telemetry into one of five workload phases —
|
||
accurately enough to drive automated containment.</div>
|
||
</div>
|
||
<div class="problem-stats">
|
||
<div class="problem-stat">
|
||
<div class="problem-stat-num">5</div>
|
||
<div class="problem-stat-lbl">phase classes<br><code>clean</code> → <code>infected_running</code></div>
|
||
</div>
|
||
<div class="problem-stat">
|
||
<div class="problem-stat-num">12</div>
|
||
<div class="problem-stat-lbl"><code>/proc</code> channels<br>no syscalls, no kernel hooks</div>
|
||
</div>
|
||
<div class="problem-stat">
|
||
<div class="problem-stat-num">10s</div>
|
||
<div class="problem-stat-lbl">classification window<br>100 samples × 12 channels</div>
|
||
</div>
|
||
</div>
|
||
<div class="problem-task">
|
||
<span class="problem-task-label">task type:</span>
|
||
<span class="problem-task-value">multi-class classification</span>
|
||
<span class="problem-task-detail">· five mutually exclusive
phase labels; class imbalance is handled with class-weighted
cross-entropy. Not regression (no continuous target), not ranking
(downstream policy is a categorical containment decision).</span>
|
||
</div>
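<!-- Sketch (Python, illustrative only). The slide says the loss is class-weighted
cross-entropy but does not give the weighting formula. The code below assumes
the common inverse-frequency heuristic, w_c = N / (K * n_c), and shows how it
scales the per-example loss. Function names and the toy labels "a", "b", "c"
are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: N / (K * n_c) for each class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])


labels = ["a"] * 50 + ["b"] * 5 + ["c"] * 45
weights = balanced_class_weights(labels)
print(weights)
```

With these toy counts the rare class "b" carries ten times the weight of the
majority class "a", so a mistake on a rare phase costs proportionally more. -->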
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 4. research-questions — literature gaps and questions -->
|
||
<div class="stage-view" data-view="research-questions">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">literature gaps · positioning the work</div>
|
||
<div class="research-grid">
|
||
<div class="research-col">
|
||
<div class="research-col-title">what prior work covers</div>
|
||
<ul class="research-list">
|
||
<li><strong>LSTM on syscall traces</strong> in VMs —
|
||
deeper telemetry than <code>/proc</code></li>
|
||
<li><strong>Transformer on per-process resource metrics</strong>
|
||
— related signal, single-host eval</li>
|
||
<li><strong>BERT on system logs</strong> (LogBERT) —
|
||
text-form telemetry, not numeric channels</li>
|
||
<li><strong>Insider-threat LSTM on event logs</strong>
|
||
(DANTE) — categorical events, not continuous</li>
|
||
<li><strong>Network-behaviour trust establishment</strong>
|
||
(IEEE 9881803) — cross-device aggregation,
|
||
not per-host classifier</li>
|
||
</ul>
|
||
</div>
|
||
<div class="research-col">
|
||
<div class="research-col-title">what's missing</div>
|
||
<ul class="research-list">
|
||
<li><strong>/proc-only signal</strong> — most work
|
||
assumes syscalls or kernel hooks</li>
|
||
<li><strong>Held-out-by-sample evaluation</strong>:
papers often train and test on the same malware
instances, which hides same-sample overfitting</li>
|
||
<li><strong>Real-time per-window classification</strong>
|
||
for containment, not post-hoc batch labelling</li>
|
||
<li><strong>Side-by-side architecture comparison</strong>
(RNN/GRU/LSTM/CNN/Transformer) on one dataset</li>
|
||
<li><strong>Direct integration</strong> with a
|
||
fleet-wide trust score, not standalone output</li>
|
||
</ul>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 5. solution-overview — pipeline block diagram -->
|
||
<div class="stage-view" data-view="solution-overview">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">pipeline · what each stage produces</div>
|
||
<svg class="pipeline-svg" viewBox="0 0 800 480"
|
||
xmlns="http://www.w3.org/2000/svg"
|
||
preserveAspectRatio="xMidYMid meet">
|
||
<g class="pipeline-stage">
|
||
<rect x="20" y="40" width="140" height="60" rx="4"/>
|
||
<text x="90" y="68" text-anchor="middle">fleet hosts</text>
|
||
<text x="90" y="86" text-anchor="middle" class="pipeline-detail">/proc · 10 Hz</text>
|
||
</g>
|
||
<g class="pipeline-stage">
|
||
<rect x="200" y="40" width="140" height="60" rx="4"/>
|
||
<text x="270" y="68" text-anchor="middle">receiver (Pi)</text>
|
||
<text x="270" y="86" text-anchor="middle" class="pipeline-detail">bearer auth</text>
|
||
</g>
|
||
<g class="pipeline-stage">
|
||
<rect x="380" y="40" width="140" height="60" rx="4"/>
|
||
<text x="450" y="68" text-anchor="middle">episode store</text>
|
||
<text x="450" y="86" text-anchor="middle" class="pipeline-detail">zstd · tar</text>
|
||
</g>
|
||
<g class="pipeline-stage">
|
||
<rect x="560" y="40" width="220" height="60" rx="4"/>
|
||
<text x="670" y="68" text-anchor="middle">windowing + features</text>
|
||
<text x="670" y="86" text-anchor="middle" class="pipeline-detail">10 s · 100 samples × 12 ch</text>
|
||
</g>
|
||
<g class="pipeline-stage pipeline-stage-models">
|
||
<rect x="180" y="170" width="440" height="120" rx="4"/>
|
||
<text x="400" y="198" text-anchor="middle" class="pipeline-stage-title">model zoo</text>
|
||
<text x="400" y="226" text-anchor="middle" class="pipeline-detail">KNN · GBT · MLP · CNN · RNN · GRU · LSTM · Transformer</text>
|
||
<text x="400" y="252" text-anchor="middle" class="pipeline-detail">trained per (model × split-recipe)</text>
|
||
<text x="400" y="276" text-anchor="middle" class="pipeline-detail-mini">held-out-by-sample · class-weighted CE · early stop on val macro-F1</text>
|
||
</g>
|
||
<g class="pipeline-stage">
|
||
<rect x="60" y="350" width="200" height="60" rx="4"/>
|
||
<text x="160" y="378" text-anchor="middle">per-window phase</text>
|
||
<text x="160" y="396" text-anchor="middle" class="pipeline-detail">5-class softmax</text>
|
||
</g>
|
||
<g class="pipeline-stage pipeline-stage-final">
|
||
<rect x="300" y="350" width="200" height="60" rx="4"/>
|
||
<text x="400" y="378" text-anchor="middle">trust score</text>
|
||
<text x="400" y="396" text-anchor="middle" class="pipeline-detail">+ network signals (9881803)</text>
|
||
</g>
|
||
<g class="pipeline-stage pipeline-stage-final">
|
||
<rect x="540" y="350" width="220" height="60" rx="4"/>
|
||
<text x="650" y="378" text-anchor="middle">containment + reset</text>
|
||
<text x="650" y="396" text-anchor="middle" class="pipeline-detail">snapshot rollback</text>
|
||
</g>
|
||
<g class="pipeline-arrow" fill="none">
|
||
<path d="M160 70 L200 70" />
|
||
<path d="M340 70 L380 70" />
|
||
<path d="M520 70 L560 70" />
|
||
<path d="M670 100 L670 130 L400 130 L400 170" />
|
||
<path d="M400 290 L400 320 L160 320 L160 350" />
|
||
<path d="M260 380 L300 380" />
|
||
<path d="M500 380 L540 380" />
|
||
</g>
|
||
</svg>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 6. stack — Python stack & libraries used in the project -->
|
||
<div class="stage-view" data-view="stack">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">the stack behind the live data on the right</div>
|
||
<div class="code-grid">
|
||
<div class="code-card">
|
||
<div class="code-card-header">pyproject.toml</div>
|
||
<pre class="code" id="code-pyproject"></pre>
|
||
</div>
|
||
<div class="code-card">
|
||
<div class="code-card-header">receiver/app.py · file header</div>
|
||
<pre class="code" id="code-receiver"></pre>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 4. hosts -->
|
||
<div class="stage-view" data-view="hosts">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">per-host shipping</div>
|
||
<div class="bars" id="host-bars">
|
||
<div class="awaiting">awaiting snapshot…</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 5. db — episode database explorer -->
|
||
<div class="stage-view" data-view="db">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="db-header">
|
||
<div class="metric-eyebrow">episode database · last 200 records</div>
|
||
<div class="db-count" id="db-count">0 of 0</div>
|
||
</div>
|
||
<div class="db-controls">
|
||
<div class="db-tabs" id="db-tabs"></div>
|
||
<input class="db-search" id="db-search" type="text"
|
||
placeholder="filter by host / id / sha…" />
|
||
</div>
|
||
<div class="db-table-wrap">
|
||
<table class="db-table">
|
||
<thead>
|
||
<tr>
|
||
<th>host</th>
|
||
<th>episode_id</th>
|
||
<th>received</th>
|
||
<th>size</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody id="db-tbody"></tbody>
|
||
</table>
|
||
</div>
|
||
<div class="db-detail" id="db-detail" hidden>
|
||
<div class="db-detail-meta" id="db-detail-meta"></div>
|
||
<div class="db-detail-chart-wrap">
|
||
<svg class="db-detail-chart" id="db-detail-chart"
|
||
viewBox="0 0 1000 360" preserveAspectRatio="none"></svg>
|
||
</div>
|
||
<div class="db-detail-legend" id="db-detail-legend"></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 6. baseline -->
|
||
<div class="stage-view" data-view="baseline">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow" id="phase-mix-eyebrow">phase mix · sampling dataset…</div>
|
||
<div class="phase-stack" id="phase-stack"></div>
|
||
<div class="phase-legend" id="phase-legend"></div>
|
||
<div class="metric-sub" id="phase-mix-sub">computing the phase
|
||
distribution across a random sample of episodes on disk.
|
||
A clean fleet sits mostly in <code>clean</code>; skew toward
|
||
<code>infecting</code> / <code>infected_running</code>
|
||
reflects time spent under attack workloads.</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 7. attacks -->
|
||
<div class="stage-view" data-view="attacks">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">attack envelopes · /proc signature per profile</div>
|
||
<div class="profile-grid" id="profile-grid"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 8. chunking -->
|
||
<div class="stage-view" data-view="chunking">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">10-second windows · model input shape</div>
|
||
<div class="chunk-rule" id="chunk-rule"></div>
|
||
<div class="chunk-row" id="chunk-row"></div>
|
||
<div class="chunk-axis" id="chunk-axis"></div>
|
||
<div class="metric-sub">each window: 100 samples (10 Hz × 10 s),
|
||
labeled by the phase that occupies its center.</div>
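<!-- Sketch (Python, illustrative only). Centre labelling as described above:
each 100-sample window takes the phase of its middle sample. Non-overlapping
windows (stride = window) and using index window // 2 as the "centre" of an
even-length window are assumptions; the function name is hypothetical.

```python
def window_labels(phases, window=100, stride=100):
    """Label each full window with the phase of its centre sample."""
    labels = []
    for start in range(0, len(phases) - window + 1, stride):
        labels.append(phases[start + window // 2])
    return labels


# 10 s at 10 Hz: 4 s clean, 2 s armed, 4 s infecting.
one_window = ["clean"] * 40 + ["armed"] * 20 + ["infecting"] * 40
print(window_labels(one_window))
```

In this toy window a majority vote would tie between clean and infecting,
while the centre sample gives a single unambiguous label. -->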
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 9. evaluation-setup — splits, metrics, baselines -->
|
||
<div class="stage-view" data-view="evaluation-setup">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">evaluation setup · how the numbers get made</div>
|
||
<div class="eval-blocks">
|
||
<div class="eval-block">
|
||
<div class="eval-block-title">split recipe</div>
|
||
<div class="eval-block-body">
|
||
<div><strong>train / val / test:</strong> held-out by
|
||
<code>sample_name</code>, profile-stratified</div>
|
||
<div><strong>both hosts</strong> contribute to all three slices</div>
|
||
<div class="eval-detail">the fleet is uniform: every
host runs the same orchestrator and every profile,
so we don't split by host. We split by malware
<code>sample_name</code>: the specific instances in
the test set never appear during training.
The generalization axis is "unseen malware", not
"unseen device". Two profiles with only one sample
(cpu-saturate, low-and-slow) cannot be held out without
leaving nothing to train on, so they are excluded from
held-out-by-sample eval and reported separately.</div>
|
||
</div>
|
||
</div>
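<!-- Sketch (Python, illustrative only). One way to build a held-out-by-sample,
profile-stratified split: group sample names by profile, then send whole
samples (never individual windows) to train, val, or test. The 15 % hold-out
fraction, the rule that a profile needs at least three samples to fill all
three slices, and every name below are assumptions, not the project's code.

```python
import random
from collections import defaultdict


def split_by_sample(samples, holdout_frac=0.15, seed=0):
    """samples: iterable of (profile, sample_name). Returns (splits, excluded)."""
    by_profile = defaultdict(set)
    for profile, sample_name in samples:
        by_profile[profile].add(sample_name)

    rng = random.Random(seed)
    splits = {"train": set(), "val": set(), "test": set()}
    excluded = set()
    for profile in sorted(by_profile):
        names = sorted(by_profile[profile])
        if len(names) < 3:
            # Too few samples to hold one out and still train on the profile.
            excluded.update(names)
            continue
        rng.shuffle(names)
        n_hold = max(1, round(holdout_frac * len(names)))
        splits["test"].update(names[:n_hold])
        splits["val"].update(names[n_hold:2 * n_hold])
        splits["train"].update(names[2 * n_hold:])
    return splits, excluded
```

Because assignment happens per sample name, every window cut from a given
sample lands in exactly one slice. -->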
|
||
<div class="eval-block">
|
||
<div class="eval-block-title">primary metric</div>
|
||
<div class="eval-block-body">
|
||
<div><strong>macro-F1</strong> averaged across the five phases</div>
|
||
<div class="eval-detail">accuracy is misleading under class
imbalance: with ~50 % <code>infected_running</code> and
~5 % <code>armed</code>, a constant majority-class predictor
already reaches 0.5 accuracy. macro-F1 is the unweighted mean
of the per-class F1 scores, so rare phases count as much as
common ones.</div>
|
||
</div>
|
||
</div>
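<!-- Sketch (Python, illustrative only). A worked version of the argument above:
a predictor that always outputs the majority class reaches 0.5 accuracy but a
much lower macro-F1. Classes are plain integers 0..4; the 50 % and 5 % shares
come from the slide, the remaining 15/15/15 split is an assumption.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0] * 50 + [1] * 5 + [2] * 15 + [3] * 15 + [4] * 15
y_pred = [0] * len(y_true)  # constant majority-class predictor
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred, range(5)))
```

The majority class scores F1 = 2/3 and the other four score 0, so macro-F1 is
(2/3) / 5, roughly 0.13, against an accuracy of 0.5. -->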
|
||
<div class="eval-block">
|
||
<div class="eval-block-title">baselines compared</div>
|
||
<div class="eval-block-body">
|
||
<div><strong>KNN</strong> — non-parametric, instance-based</div>
|
||
<div><strong>GBT (XGBoost)</strong> — tabular non-NN</div>
|
||
<div><strong>MLP</strong> — feedforward ablation</div>
|
||
<div><strong>CNN</strong> — local-pattern ablation</div>
|
||
<div><strong>RNN / GRU / LSTM</strong> — recurrent family</div>
|
||
<div><strong>Transformer</strong> — attention</div>
|
||
</div>
|
||
</div>
|
||
<div class="eval-block">
|
||
<div class="eval-block-title">reported alongside accuracy</div>
|
||
<div class="eval-block-body">
|
||
<div><strong>μs / window</strong> — inference cost at batch=64</div>
|
||
<div><strong>val ↔ test gap</strong> — val − test macro-F1</div>
|
||
<div class="eval-detail">latency translates to
containment lag; the val ↔ test gap measures how much
of the macro-F1 survives the move from
"samples we saw" to "samples we didn't". Both are plotted
on the perf scene.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- training-code — how we trained, before showing results -->
|
||
<div class="stage-view" data-view="training-code">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">how we trained the sequence models</div>
|
||
<div class="code-grid">
|
||
<div class="code-card">
|
||
<div class="code-card-header">training/models/lstm.py</div>
|
||
<pre class="code" id="code-train-lstm"></pre>
|
||
</div>
|
||
<div class="code-card">
|
||
<div class="code-card-header">training/trainer/_loop.py · train_nn</div>
|
||
<pre class="code" id="code-train-loop"></pre>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- models — accuracy bars (results after training-code) -->
|
||
<div class="stage-view" data-view="models">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">sequence models · accuracy on held-out samples</div>
|
||
<div class="model-bars" id="model-bars"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 11. knn — interactive 3-D scatter with mode toggle -->
|
||
<div class="stage-view" data-view="knn">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">window features · 3-D projection · drag to rotate</div>
|
||
<div class="scatter3d-controls">
|
||
<div class="scatter3d-modes">
|
||
<button class="scatter3d-mode active" data-mode="phase">phase (ground truth)</button>
|
||
<button class="scatter3d-mode" data-mode="predicted">KNN-predicted label</button>
|
||
<button class="scatter3d-mode" data-mode="cluster">cluster id</button>
|
||
</div>
|
||
<button class="scatter3d-reset">reset view</button>
|
||
</div>
|
||
<div class="scatter3d-wrap">
|
||
<canvas class="scatter3d" id="knn-scatter-canvas"></canvas>
|
||
</div>
|
||
<div class="phase-legend" id="knn-legend"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 13. references — PDF viewer with tabs + description -->
|
||
<div class="stage-view" data-view="references">
|
||
<div class="metric-stack metric-stack-wide ref-stack">
|
||
<div class="metric-eyebrow">references · papers, notes, prior work</div>
|
||
<div class="ref-tabs" id="ref-tabs"></div>
|
||
<div class="ref-content">
|
||
<div class="ref-viewer-wrap">
|
||
<iframe class="ref-viewer" id="ref-viewer"
|
||
title="reference viewer"
|
||
sandbox="allow-same-origin allow-scripts allow-popups allow-forms"></iframe>
|
||
</div>
|
||
<div class="ref-description" id="ref-description"></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 12. perf -->
|
||
<div class="stage-view" data-view="perf">
|
||
<div class="metric-stack">
|
||
<div class="metric-eyebrow">accuracy vs inference cost</div>
|
||
<svg class="scatter" id="perf-scatter" viewBox="0 0 600 360" preserveAspectRatio="xMidYMid meet"></svg>
|
||
<div class="metric-sub">x: μs / window (lower is better) ·
|
||
y: held-out accuracy (higher is better).</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 14. live — fleet-wide live detections feed -->
|
||
<div class="stage-view" data-view="live">
|
||
<div class="metric-stack metric-stack-wide live-stack">
|
||
<div class="live-stats">
|
||
<span class="live-stats-eye">A100 inference · live</span>
|
||
<span class="live-stats-dot" id="live-stats-hosts">0 models</span>
|
||
<span class="live-stats-dot" id="live-stats-rate">0 infer / sec</span>
|
||
<span class="live-stats-dot" id="live-stats-model">last window: —</span>
|
||
<span class="live-stats-dot" id="live-stats-acc">hit-rate: —</span>
|
||
</div>
|
||
<div class="live-lanes" id="live-lanes"></div>
|
||
<div class="live-latest" id="live-latest">
|
||
<div class="live-latest-empty">awaiting <code>live_detection</code> events from the A100 inference loop</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 15. theoretical-contributions -->
|
||
<div class="stage-view" data-view="theoretical">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">theoretical contributions · what's new methodologically</div>
|
||
<div class="motivation-cards">
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-trust"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">window-centre labelling</div>
|
||
<div class="motivation-card-text">A 10-second
classification window is labelled by the phase that
occupies its centre, not by majority vote across the
window. This gives a cleaner training signal at phase
boundaries and avoids a spurious "ambiguous" class.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-contain"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">schema-hashed checkpoints</div>
|
||
<div class="motivation-card-text">Each checkpoint
embeds a hash of the feature schema; loading a model
against the wrong schema fails fast instead of
silently scoring on misaligned columns. This keeps
comparisons against older checkpoints reproducible.</div>
|
||
</div>
|
||
</div>
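<!-- Sketch (Python, illustrative only). One way a checkpoint can carry a schema
hash and refuse to load against a different feature layout. The dict-based
checkpoint, the SHA-256 over a JSON list of column names, and the column names
themselves are assumptions made for this sketch.

```python
import hashlib
import json


def schema_hash(columns):
    """Order-sensitive digest of the feature column names."""
    payload = json.dumps(list(columns), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def save_checkpoint(weights, columns):
    return {"schema_hash": schema_hash(columns), "weights": weights}


def load_checkpoint(ckpt, columns):
    """Return the weights, or fail fast if the feature schema differs."""
    expected = schema_hash(columns)
    if ckpt["schema_hash"] != expected:
        raise ValueError(
            f"schema mismatch: checkpoint {ckpt['schema_hash'][:8]}, "
            f"current {expected[:8]}"
        )
    return ckpt["weights"]


ckpt = save_checkpoint({"w": [1.0]}, ["cpu_user", "mem_rss"])
print(load_checkpoint(ckpt, ["cpu_user", "mem_rss"]))
```

Hashing the ordered list means a reordered or renamed column is rejected at
load time rather than silently feeding the model misaligned inputs. -->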
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-recover"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">held-out-by-sample as the eval axis</div>
|
||
<div class="motivation-card-text">The hosts in the
fleet are uniform: same orchestrator, same workload,
different production rates. The generalization claim
is therefore "unseen malware sample", tested on the
same population of devices the training data came
from. The split is profile-stratified so that every
profile contributes to train, val, and test.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 16. practical-contributions -->
|
||
<div class="stage-view" data-view="practical">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">practical contributions · what others can use</div>
|
||
<div class="motivation-cards">
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-trust"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">/proc-only deployment</div>
|
||
<div class="motivation-card-text">No syscall hooks, no
|
||
eBPF, no kernel module — runs on hosts that don't
|
||
permit deep instrumentation. The detector is one
|
||
Python service plus a model file.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-contain"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">producer-agnostic dashboard</div>
|
||
<div class="motivation-card-text">The deck consumes
|
||
typed events; the inference loop runs anywhere
|
||
(Pi, A100, cloud) and just POSTs back. Same UI for
|
||
a lab demo and an operational console.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-recover"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">labelled dataset on disk</div>
|
||
<div class="motivation-card-text">78,000+ episodes,
|
||
five phases, two hosts, six attack profiles —
|
||
archived in zstd-compressed tarballs with a
|
||
schema-versioned format. Ready for downstream
|
||
work without re-running the orchestrator.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 17. design-principles -->
|
||
<div class="stage-view" data-view="design-principles">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">design principles · patterns that emerged</div>
|
||
<div class="motivation-cards">
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-trust"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">one loop, many models</div>
|
||
<div class="motivation-card-text">Every NN architecture
|
||
plugs into the same training loop — class weights,
|
||
AMP, cosine LR, early stop. Architecture changes
|
||
don't ripple into orchestration.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-contain"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">typed events as contract</div>
|
||
<div class="motivation-card-text">Producers and
|
||
consumers agree on dataclasses
|
||
(<code>events.py</code>), not free-form dicts.
|
||
Adding a new scene means adding a new dataclass;
|
||
adding a new producer means importing it.</div>
|
||
</div>
|
||
</div>
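<!-- Sketch (Python, illustrative only). What "dataclasses as contract" can look
like for one event. The deck mentions live_detection events and events.py, but
the class name, its fields, and the JSON helpers below are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LiveDetection:
    host: str
    model: str
    phase: str
    confidence: float
    type: str = "live_detection"


def encode(event: LiveDetection) -> str:
    """Producer side: serialise the typed event to JSON."""
    return json.dumps(asdict(event), sort_keys=True)


def decode(raw: str) -> LiveDetection:
    """Consumer side: reject unknown event types, then rebuild the dataclass."""
    data = json.loads(raw)
    if data.pop("type", None) != "live_detection":
        raise ValueError("unexpected event type")
    # Missing or extra fields raise TypeError here instead of passing silently.
    return LiveDetection(**data)


event = LiveDetection(host="host-a", model="lstm", phase="clean", confidence=0.9)
print(decode(encode(event)) == event)
```

A producer that drifts from the agreed fields fails at the boundary, which is
the property a free-form dict does not give. -->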
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-recover"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">two-agent path ownership</div>
|
||
<div class="motivation-card-text">Dashboard work and
|
||
model work live in two parallel sessions with a
|
||
documented path-ownership boundary. Merges go
|
||
through git with explicit rebases instead of a
|
||
shared workspace.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 18. limitations -->
|
||
<div class="stage-view" data-view="limitations">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">limitations · the honest list</div>
|
||
<div class="motivation-cards">
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-armed"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">two-host fleet</div>
|
||
<div class="motivation-card-text">Both hosts contribute
|
||
to train, val, and test, but the device population
|
||
is small (n = 2). Adding more hosts on the WireGuard
|
||
mesh wouldn't change the split recipe but would make
|
||
the dataset more representative of real-world
|
||
hardware variety.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-armed"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">synthetic attack profiles</div>
|
||
<div class="motivation-card-text">Six profiles cover the
|
||
main shapes (cpu-saturate, ransomware-lite, bursty-c2,
|
||
fork-bomb, crypto-miner, distccd-exec) but real-world
|
||
malware can sit between or outside these envelopes.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-armed"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">10 Hz sampling floor</div>
|
||
<div class="motivation-card-text">Sub-100ms attack
|
||
behaviours fall inside a single sample. Detection of
|
||
extremely short-lived attacks (millisecond-scale
|
||
privilege checks) requires faster sampling than
|
||
<code>/proc</code> currently provides.</div>
|
||
</div>
|
||
</div>
|
||
<div class="motivation-card">
|
||
<div class="motivation-card-marker mc-armed"></div>
|
||
<div class="motivation-card-body">
|
||
<div class="motivation-card-title">KNN val ↔ test gap</div>
|
||
<div class="motivation-card-text">KNN scores val
macro-F1 ≈ 0.74 on samples it saw, but only ≈ 0.13
on held-out sample_names. That gap is instance-based
memorization of the specific training samples: informative
as a baseline, not a deployment candidate.</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 19. conclusion-future — summary + unsupervised next steps -->
|
||
<div class="stage-view" data-view="conclusion-future">
|
||
<div class="metric-stack metric-stack-wide">
|
||
<div class="metric-eyebrow">conclusion + future work</div>
|
||
<div class="conclusion-grid">
|
||
<div class="conclusion-col">
|
||
<div class="conclusion-col-title">what we showed</div>
|
||
<ul class="conclusion-list">
|
||
<li>A per-host detector trained on
|
||
<strong>/proc-only telemetry</strong> can classify
|
||
workload phases at multi-class macro-F1 well above
|
||
chance.</li>
|
||
<li>Held-out-by-<strong>sample</strong>,
|
||
profile-stratified, is the right generalization
|
||
axis: both fleet hosts contribute to all three
|
||
slices, and the test set's
|
||
<code>sample_name</code>s never appear during
|
||
training.</li>
|
||
<li>The recurrent family (LSTM/GRU) and Transformer
|
||
sit on the upper-left of the
|
||
<strong>accuracy-vs-cost frontier</strong>; KNN and
|
||
GBT round out the comparison as honest baselines.</li>
|
||
<li>The detector slots into a wider <strong>trust /
|
||
containment / recovery</strong> loop — the per-host
|
||
verdict isn't the final answer, it's one input.</li>
|
||
</ul>
|
||
</div>
|
||
<div class="conclusion-col">
|
||
<div class="conclusion-col-title">next steps · unsupervised</div>
|
||
<ul class="conclusion-list">
|
||
<li><strong>Clustering</strong> the unlabeled tail of
|
||
new fleet data (KMeans / HDBSCAN) to surface novel
|
||
workload shapes the supervised model has no class
|
||
for — a self-training feedback loop.</li>
|
||
<li><strong>Anomaly detection</strong> on the
|
||
last-layer embedding (one-class SVM, isolation forest)
|
||
so a "none of the five known phases" verdict is
|
||
available alongside the classifier output.</li>
|
||
<li><strong>Self-supervised pretraining</strong> on
|
||
the much larger pool of unlabeled telemetry from
|
||
operational hosts; supervised fine-tune on the
|
||
smaller orchestrated dataset.</li>
|
||
<li><strong>Embedding visualisation</strong> via
UMAP / t-SNE for human-in-the-loop labelling of
the unlabeled tail (already prototyped in the KNN scene).</li>
|
||
</ul>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
</div>
|
||
<button id="next-fab" class="fab" data-no-advance title="Next (→)">▼</button>
|
||
</div>
|
||
|
||
<article class="article">
|
||
|
||
<section class="scene" data-stage="intro">
|
||
<div class="prose">
|
||
<p class="lede">Most malware doesn't look like malware in a database
|
||
— it looks like a process behaving badly.</p>
|
||
<p>An <strong>intrusion detection system</strong> spots the bad
|
||
behavior; an <strong>intrusion prevention system</strong> stops it.
|
||
Both depend on knowing what bad behavior <em>looks like</em> at the
|
||
level of telemetry the device can actually see.</p>
|
||
<p>This deck is the live face of the dataset we're building to teach
|
||
a model that distinction — every panel on the left is a slice of
|
||
real data shipping in right now.</p>
|
||
<p class="hint">scroll, click, or → to advance</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="collect">
|
||
<div class="prose">
|
||
<h2>Collecting the dataset</h2>
|
||
<p>Each lab host on the WireGuard mesh boots a real Alpine VM, runs
|
||
a profile-driven workload inside it, and samples
|
||
<code>/proc/<qemu_pid></code> at 10 Hz. Every ~30 seconds
|
||
the labeled tarball is shipped to this Pi over mTLS.</p>
|
||
<p>The counter on the left is the running total, sourced from the
|
||
receiver's <code>index.jsonl</code> on disk. The sparkline is the
|
||
arrival rate over the last sixty seconds — proof that the deck
|
||
is reading live data, not a fixed slide.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="motivation">
|
||
<div class="prose">
|
||
<h2>Why detect at all?</h2>
|
||
<p>Knowing a device is compromised is the precondition for everything
|
||
else. A classifier that says "this host is infected right now"
|
||
turns into three concrete operational capabilities — and each
|
||
one rewards a faster, more confident detector.</p>
|
||
<p><strong>Trust scoring across the network.</strong> Recent work
|
||
on per-device trust establishment
|
||
(<a href="https://ieeexplore.ieee.org/document/9881803"
|
||
target="_blank" rel="noopener">IEEE 9881803</a>) argues that
|
||
on-device metrics alone aren't enough — a fleet has to combine
|
||
local classifier verdicts with network-behaviour signals
|
||
(peer observations, gateway traffic patterns, inter-host
|
||
relationships) to score trust reliably. Our per-host detector
|
||
is one input to that broader signal.</p>
|
||
<p><strong>Containment.</strong> Once a host is flagged, the
|
||
gateway can drop its traffic and the IAM layer can revoke
|
||
credentials before lateral movement begins. Detection
|
||
latency translates directly into how much of the network
|
||
an attacker reaches.</p>
|
||
<p><strong>Quick recovery.</strong> A confirmed infection time
|
||
lets you restore from a snapshot taken just before the
|
||
compromise — no forensic dwell time, no guessing how far
|
||
back to roll. The recovery path becomes a one-button operation
|
||
instead of a week of cleanup.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="problem-statement">
|
||
<div class="prose">
|
||
<h2>Problem statement</h2>
|
||
<p>Today's behaviour-based intrusion detection systems rely on syscall traces,
kernel hooks, or rich endpoint agents that can't ship to
constrained or untrusted hosts. We want a detector that
runs on the only telemetry every modern Linux system already
exports, <code>/proc</code>, and labels each ten-second
window of activity with the phase the workload is in.</p>
|
||
<p><strong>Research question.</strong> Can a sequence model
|
||
trained on twelve channels of <code>/proc</code> telemetry
|
||
classify five workload phases (clean / armed / infecting /
|
||
infected_running / dormant) accurately enough to drive
|
||
automated containment, <em>and</em> generalize to malware
|
||
<code>sample_name</code>s it has never seen during training?</p>
|
||
<p>The task is <strong>multi-class classification</strong>:
|
||
the target is one of five mutually-exclusive phase labels.
|
||
Not regression (no continuous target), not ranking
|
||
(downstream policy is a categorical containment decision).
|
||
We deliberately chose 10-second windows so detection
|
||
latency stays bounded for a real fleet.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="research-questions">
|
||
<div class="prose">
|
||
<h2>Research gaps + questions</h2>
|
||
<p>Literature on behaviour-based malware detection is rich but
uneven. Most published results either (a) use richer
telemetry than a constrained host actually exports, or
(b) frame evaluation in ways that hide same-sample overfitting
(training and testing on the same malware instances). The
card on the left summarises the gap.</p>
|
||
<p>This project asks three concrete questions:</p>
|
||
<p><strong>RQ1.</strong> How well can a per-window classifier
|
||
identify workload phases from <code>/proc</code> alone, with
|
||
no syscall traces and no kernel hooks?</p>
|
||
<p><strong>RQ2.</strong> Does the model still work on
|
||
<code>sample_name</code>s the training set never saw —
|
||
i.e., new instances of malware profiles it does know?</p>
|
||
<p><strong>RQ3.</strong> Of the standard sequence-model
|
||
families (RNN, GRU, LSTM, CNN, Transformer) plus a
|
||
non-parametric baseline (KNN) and a tabular baseline
|
||
(gradient-boosted trees), which trade off accuracy and
|
||
inference cost best for a deployment that has to run on a
|
||
constrained host?</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="solution-overview">
|
||
<div class="prose">
|
||
<h2>Proposed solution</h2>
|
||
<p>A single end-to-end pipeline turns raw <code>/proc</code>
|
||
telemetry on a fleet host into a per-window phase verdict
|
||
in under a second. Each stage of the diagram on the left
|
||
is a thin, independently-deployable component — the
|
||
receiver doesn't know what model is running; the model
|
||
doesn't know where the episode came from.</p>
|
||
<p>The <strong>model zoo</strong> is the key abstraction:
|
||
every model class registers itself by name, declares its
|
||
input kind (summary features or window tensors), and plugs
|
||
into one shared training loop. KNN, GBT, MLP, CNN, RNN,
|
||
GRU, LSTM, and Transformer all reuse the same standardization,
|
||
schema-hashed checkpoint format, class-weighted CE loss,
|
||
and held-out-by-sample evaluation — so the comparison is
|
||
genuinely apples-to-apples.</p>
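<p>A minimal sketch of the registration idea, assuming a plain
dictionary keyed by model name. <code>MODEL_ZOO</code>,
<code>register</code>, <code>models_for</code>, and the two
placeholder classes are hypothetical names for illustration,
not the project's actual API.</p>
<pre><code>
```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: maps a model name to its zoo entry.
MODEL_ZOO: Dict[str, "ZooEntry"] = {}

# The two input kinds named in the text.
VALID_INPUT_KINDS = ("summary", "window")


@dataclass(frozen=True)
class ZooEntry:
    name: str
    input_kind: str                  # "summary" features or "window" tensors
    factory: Callable[[], object]    # builds a fresh, untrained model


def register(name: str, input_kind: str):
    """Class decorator that adds a model class to MODEL_ZOO under `name`."""
    if input_kind not in VALID_INPUT_KINDS:
        raise ValueError(f"unknown input kind: {input_kind}")

    def decorator(cls):
        if name in MODEL_ZOO:
            raise KeyError(f"model already registered: {name}")
        MODEL_ZOO[name] = ZooEntry(name, input_kind, cls)
        return cls

    return decorator


@register("knn", input_kind="summary")
class KnnPlaceholder:
    """Stand-in for a model that consumes per-window summary features."""


@register("lstm", input_kind="window")
class LstmPlaceholder:
    """Stand-in for a model that consumes raw window tensors."""


def models_for(input_kind: str) -> List[str]:
    """Names of registered models that accept the given input kind."""
    return sorted(e.name for e in MODEL_ZOO.values() if e.input_kind == input_kind)
```
</code></pre>
<p>With something like this, a shared loop can ask the registry
which models take which input kind instead of hard-coding a list,
which is one way to keep a new architecture from touching
orchestration code.</p>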
|
||
<p>The detector's per-window verdict feeds two downstream
|
||
loops: a fleet-wide <strong>trust score</strong> that
|
||
combines local classification with network-behaviour
|
||
signals (per IEEE 9881803), and a <strong>fast-recovery</strong>
|
||
snapshot rollback when an infection time is known.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="stack">
|
||
<div class="prose">
|
||
<h2>Live, not staged</h2>
|
||
<p>Every panel from here on is real data from real devices —
|
||
counters, bars, the episode database, all driven by the
|
||
<code>cis490-receiver</code> service running on this Pi as
|
||
you scroll.</p>
|
||
<p>The code on the left is how it gets here. Four runtime deps:
<strong>starlette</strong> + <strong>uvicorn</strong> for the
async HTTP and WebSocket surface, <strong>msgpack</strong>
to talk to Metasploit's RPC, and <strong>pycdlib</strong> to build the
lab-VM cidata ISOs. Everything else is the standard library,
and every dep is annotated with a one-line reason it's there.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="hosts">
|
||
<div class="prose">
|
||
<h2>A multi-host fleet</h2>
|
||
<p>Running the same orchestrator on multiple hosts yields novel,
non-overlapping data per host with no central coordinator. Each host
pulls a different slice of the manifest, so the dataset grows in
parallel.</p>
|
||
<p>The numbers below are absolute episode counts on disk, refreshed
|
||
from <code>/var/lib/cis490/episodes/<host>/</code> every
|
||
thirty seconds.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="db">
|
||
<div class="prose">
|
||
<h2>The dataset, browsable</h2>
|
||
<p>Every row is one labeled episode tarball stored at
|
||
<code>/var/lib/cis490/episodes/<host>/<id>.tar.zst</code>
|
||
after the receiver verifies its SHA-256 and writes it through.</p>
|
||
<p>Filter by host with the tabs, or grep by host / episode id /
|
||
sha with the search box. Click a row for the full
|
||
<code>index.jsonl</code> record. The view holds the most recent
|
||
two hundred records — older history is on disk, indexable
|
||
from the receiver.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="baseline">
|
||
<div class="prose">
|
||
<h2>A baseline of normal</h2>
|
||
<p>Before we can detect a deviation, we have to know what the fleet
|
||
looks like across a wide slice of its life. The stacked bar
|
||
aggregates ground-truth phase labels across hundreds of randomly
|
||
sampled episodes from the dataset on disk — weighted by the time
|
||
the workload actually spent in each phase, not just the count of
|
||
transitions.</p>
|
||
<p>If the model only ever sees <code>clean</code>, it overfits to
|
||
"everything is fine." The phase schedule fixes that by forcing
|
||
every run to walk through every phase, which is why
|
||
<code>infected_running</code> dominates the mix — that's where
|
||
the labelled attack workload sits.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="attacks">
|
||
<div class="prose">
|
||
<h2>Linking attack to telemetry</h2>
|
||
<p>The same six profiles run across every host, and each one
|
||
produces a different envelope in <code>/proc</code>. A
|
||
cryptominer pegs one core for minutes. A bursty C2 channel sits
|
||
idle, then exhales three packets. Ransomware walks the
|
||
filesystem and saturates I/O.</p>
|
||
<p>The thumbnails on the left are the canonical envelopes the
|
||
model has to learn to recognize — same axes, different shapes.
|
||
That shape difference is what makes detection tractable.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="chunking">
|
||
<div class="prose">
|
||
<h2>Ten-second windows</h2>
|
||
<p>Models eat fixed-size inputs. We chop each episode into
|
||
10-second windows — 100 samples per window at 10 Hz — and
|
||
label each window with the phase that occupies its center.</p>
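<p>Centre labelling can be sketched roughly as below. This assumes
non-overlapping windows and one phase label per sample; the stride
and the name <code>centre_labelled_windows</code> are assumptions
for illustration, not taken from the project code.</p>
<pre><code>
```python
from typing import List, Sequence, Tuple

SAMPLE_HZ = 10                                 # sampling rate stated in the text
WINDOW_SECONDS = 10                            # window length stated in the text
WINDOW_SAMPLES = SAMPLE_HZ * WINDOW_SECONDS    # 100 samples per window


def centre_labelled_windows(
    samples: Sequence[Sequence[float]],
    phases: Sequence[str],
    window: int = WINDOW_SAMPLES,
) -> List[Tuple[List[Sequence[float]], str]]:
    """Split one episode into windows, each labelled by its centre sample.

    samples[i] is one row of channel values and phases[i] is the phase
    active at that sample. Windows do not overlap (an assumption), and a
    trailing partial window is dropped.
    """
    if len(samples) != len(phases):
        raise ValueError("samples and phases must be the same length")
    out = []
    for start in range(0, len(samples) - window + 1, window):
        centre = start + window // 2
        out.append((list(samples[start:start + window]), phases[centre]))
    return out
```
</code></pre>
<p>A window that straddles a phase transition gets exactly one
label, whichever phase is active at its midpoint, so no extra
"mixed" class is needed.</p>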
|
||
<p>Window size is a knob. Too short and the model can't see slow
|
||
envelopes (low-and-slow malware, idle C2). Too long and you can't
|
||
react fast enough to be a useful prevention signal. Ten seconds
|
||
is the starting point we tune around.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="evaluation-setup">
|
||
<div class="prose">
|
||
<h2>Evaluation setup</h2>
|
||
<p>Three choices anchor every result on the next slides: the
split recipe, the primary metric, and what we measure
alongside accuracy. The temptation is to report a single big
number; we report a number you can argue with.</p>
|
||
<p><strong>Held-out by <code>sample_name</code>,
|
||
profile-stratified.</strong> The fleet is uniform — every
|
||
host runs the same orchestrator and the same set of
|
||
profiles — so we don't split by device. Both hosts
|
||
contribute data to train, val, and test. What's held out is
|
||
specific malware <em>instances</em>: the
|
||
<code>sample_name</code>s in the test set never appear
|
||
during training. The model has to generalize to unseen
|
||
samples, not unseen devices.</p>
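<p>One way to build such a split, sketched under assumptions: each
<code>sample_name</code> belongs to exactly one profile, and the
70/15/15 proportions, the seed, and the name
<code>split_by_sample</code> are placeholders rather than the
project's actual settings.</p>
<pre><code>
```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_by_sample(
    episodes: Iterable[Tuple[str, str]],
    train_pct: int = 70,
    val_pct: int = 15,
    seed: int = 0,
) -> Dict[str, str]:
    """Assign each sample_name to train, val, or test, stratified by profile.

    `episodes` yields (profile, sample_name) pairs. Every episode of a given
    sample_name lands in the same slice, so test samples are never seen in
    training. Whatever is left after train and val goes to test.
    """
    by_profile = defaultdict(set)
    for profile, sample_name in episodes:
        by_profile[profile].add(sample_name)

    rng = random.Random(seed)
    assignment: Dict[str, str] = {}
    for profile in sorted(by_profile):
        names = sorted(by_profile[profile])   # sort first so the shuffle is reproducible
        rng.shuffle(names)
        n_train = len(names) * train_pct // 100
        n_val = len(names) * val_pct // 100
        for i, name in enumerate(names):
            if i >= n_train + n_val:
                assignment[name] = "test"
            elif i >= n_train:
                assignment[name] = "val"
            else:
                assignment[name] = "train"
    return assignment
```
</code></pre>
<p>Because the host never enters the function, episodes from both
hosts follow their <code>sample_name</code> into whichever slice
it was assigned.</p>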
|
||
<p><strong>Macro-F1, not accuracy.</strong> The dataset is
|
||
heavily skewed: roughly half the labelled time is
|
||
<code>infected_running</code> and only ~5 % is
|
||
<code>armed</code>. A "predict the majority class"
|
||
baseline already hits 0.5 accuracy. Macro-F1 averages F1
|
||
across all five phases so rare classes count.</p>
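<p>A small worked example of why the two metrics disagree. The
per-phase window counts are made up for illustration; only
"roughly half <code>infected_running</code>" and "about 5 %
<code>armed</code>" come from the text above.</p>
<pre><code>
```python
from typing import Dict, List, Sequence

PHASES = ["clean", "armed", "infecting", "infected_running", "dormant"]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> Dict[str, float]:
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class that is never predicted
        # and never present scores 0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Illustrative class mix out of 100 windows (made-up counts).
COUNTS = {"clean": 15, "armed": 5, "infecting": 15, "infected_running": 50, "dormant": 15}
y_true = [phase for phase, n in COUNTS.items() for _ in range(n)]
y_pred = ["infected_running"] * len(y_true)   # majority-class baseline
```
</code></pre>
<p>On this toy mix the majority-class baseline gets accuracy 0.5
but macro-F1 of only 2/15 (about 0.13): it scores 2/3 on
<code>infected_running</code> and zero on the other four phases.</p>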
|
||
<p><strong>Latency reported with accuracy.</strong> A model
|
||
that's one F1 point better but ten milliseconds slower
|
||
may still be the wrong choice for an on-host detector.
|
||
The perf scene plots both axes so the trade-off is visible.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="training-code">
|
||
<div class="prose">
|
||
<h2>How we trained them</h2>
|
||
<p>One trainer per model: load the windowed dataset, define the
network, train, evaluate. Same shape for RNN, GRU, LSTM, BERT,
so you can read all four side-by-side and the only difference
is the architecture itself.</p>
|
||
<p>The code on the left is the LSTM trainer.
PyTorch's <code>DataLoader</code> serves the windows in batches,
<code>nn.LSTM</code> is one line, the loop is six.
No custom loss, no rate schedule, no manual batching;
anything fancier has to earn its place by beating the simple
version on held-out samples.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="models">
|
||
<div class="prose">
|
||
<h2>Sequence models</h2>
|
||
<p><strong>RNN, GRU, LSTM</strong> — recurrent models that read the
|
||
window one timestep at a time and carry state forward. Cheap,
|
||
mature, easy to interpret.</p>
|
||
<p><strong>BERT-style transformer</strong> — the window becomes a
|
||
sequence of "tokens"; attention captures cross-position context
|
||
instead of accumulating it through a hidden state. More
|
||
parameters, more compute, more room to overfit a small dataset.</p>
|
||
<p>Same input, same labels, four different inductive biases. The
|
||
comparison on the left is the punchline of the whole project.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="knn">
|
||
<div class="prose">
|
||
<h2>Nearest-neighbor as a sanity check</h2>
|
||
<p>Before anything fancy: engineer summary features per window
|
||
(mean, std, p95, slope, zero-bucket counts per channel) and run
|
||
<strong>KNN</strong> in that feature space.</p>
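<p>A rough sketch of the per-channel summary, standard library only.
Two interpretations here are assumptions: "zero-bucket count" is
read as the number of samples that are exactly zero, and p95 uses
the nearest-rank definition. The channel names in the usage note
are hypothetical.</p>
<pre><code>
```python
import statistics
from typing import Dict, Mapping, Sequence


def summarise_channel(values: Sequence[float]) -> Dict[str, float]:
    """Mean, std, p95, slope, and zero count for one channel of one window."""
    n = len(values)
    if n == 0:
        raise ValueError("empty window")
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest value,
    # computed with integers to avoid float rounding.
    rank = (95 * n + 99) // 100
    p95 = sorted(values)[rank - 1]
    # Least-squares slope of value against sample index.
    x_mean = (n - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - mean) for i, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    zero_count = sum(1 for v in values if v == 0)
    return {
        "mean": mean,
        "std": std,
        "p95": p95,
        "slope": slope,
        "zero_count": float(zero_count),
    }


def window_features(window: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Flatten per-channel summaries into one feature dict for a window."""
    features = {}
    for channel in sorted(window):
        for stat, value in summarise_channel(window[channel]).items():
            features[f"{channel}.{stat}"] = value
    return features
```
</code></pre>
<p>Calling <code>window_features({"cpu": [...], "net": [...]})</code>
would give keys such as <code>cpu.mean</code> and
<code>net.zero_count</code>, one fixed-length vector per window
for KNN to measure distances in.</p>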
|
||
<p>If the phase clusters separate visibly in two dimensions, KNN
|
||
already does most of the work and a deep model is only buying
|
||
marginal improvement. If they don't separate, you've learned
|
||
something about the feature engineering before training a single
|
||
epoch.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="perf">
|
||
<div class="prose">
|
||
<h2>Accuracy vs complexity</h2>
|
||
<p>Bigger models earn better numbers on the validation set, but
they also need more parameters, more inference time, and more
memory at the edge. The deployed model has to fit on the device
it's protecting.</p>
|
||
<p>The scatter on the left is the usable trade-off curve: every
|
||
point above and to the left of where you currently sit is a
|
||
reachable upgrade. The point in the bottom-right is a model
|
||
you'd never ship.</p>
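<p>The "never ship" idea can be made precise as Pareto dominance.
A minimal sketch, assuming the x axis is some cost (latency or
parameter count) and the y axis is macro-F1; the names and every
number in the usage note are invented placeholders, not measured
results.</p>
<pre><code>
```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float        # lower is better (e.g. latency or parameter count)
    macro_f1: float    # higher is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    no_worse = b.cost >= a.cost and a.macro_f1 >= b.macro_f1
    better = b.cost > a.cost or a.macro_f1 > b.macro_f1
    return no_worse and better


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```
</code></pre>
<p>For placeholder points a (cost 1.0, F1 0.60), b (2.0, 0.70),
c (5.0, 0.72), and d (6.0, 0.50), the frontier is a, b, c. Point d
sits in the bottom-right: it costs more than every other point and
scores lower.</p>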
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="live">
|
||
<div class="prose">
|
||
<h2>Catching attacks live</h2>
|
||
<p>The <strong>A100</strong> runs inference against incoming
|
||
ten-second windows from the fleet. Each row on the stage is
|
||
<em>one trained model</em> doing live prediction; each cell
|
||
is its phase verdict on a freshly-arrived window, painted
|
||
by the predicted phase.</p>
|
||
<p>Read the lanes side-by-side as a model-agreement check:
when the recurrent family (RNN / GRU / LSTM) all flip to
<code>infecting</code> at the same time, that's strong
evidence the host actually is being infected. When ground truth from
<code>labels.jsonl</code> catches up, mismatched cells get
a hatched overlay and the running hit-rate updates. The
callout below holds the most recent prediction with model
name, A100 round-trip latency, and confidence.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="theoretical">
|
||
<div class="prose">
|
||
<h2>Theoretical contributions</h2>
|
||
<p>Three methodological claims this project makes — small in
|
||
isolation, but together they change how the comparison is
|
||
run. Each shows up explicitly in the codebase.</p>
|
||
<p><strong>Window-centre labelling.</strong> Instead of
|
||
majority-voting phase labels across each 10-second window
|
||
(which creates noisy boundaries), we label each window by
|
||
the phase that occupies its centre. Cleaner training
|
||
signal at transitions, no spurious "ambiguous" class.</p>
|
||
<p><strong>Schema-hashed checkpoints.</strong> Every
|
||
checkpoint embeds a hash of the feature schema it was
|
||
trained on. Loading a model against a different schema
|
||
fails fast. Without this, retroactive comparison silently
|
||
scores models on misaligned columns and reports nonsense.</p>
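<p>The fail-fast check might look roughly like this. It assumes the
schema is the ordered list of feature column names and that a
checkpoint is a plain dict; the real checkpoint format isn't shown
in this deck, and all names below are hypothetical.</p>
<pre><code>
```python
import hashlib
import json
from typing import Any, Dict, List


class SchemaMismatch(RuntimeError):
    """Raised when a checkpoint was trained on a different feature schema."""


def schema_hash(feature_names: List[str]) -> str:
    """SHA-256 over the ordered feature names; reordering changes the hash."""
    payload = json.dumps(feature_names, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_checkpoint(weights: Any, feature_names: List[str]) -> Dict[str, Any]:
    """Bundle weights with the hash of the schema they were trained on."""
    return {"schema_hash": schema_hash(feature_names), "weights": weights}


def load_checkpoint(checkpoint: Dict[str, Any], feature_names: List[str]) -> Any:
    """Return the weights, or raise if the current schema does not match."""
    expected = schema_hash(feature_names)
    found = checkpoint["schema_hash"]
    if found != expected:
        raise SchemaMismatch(f"checkpoint schema {found[:12]} != current {expected[:12]}")
    return checkpoint["weights"]
```
</code></pre>
<p>Hashing the ordered list means that swapping two columns is
caught too, which is exactly the silent misalignment case.</p>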
|
||
<p><strong>Held-out-by-sample, profile-stratified.</strong>
|
||
Hosts in the fleet are uniform — same orchestrator, same
|
||
workload, just different production rates — so we split by
|
||
malware <code>sample_name</code> instead of by device. The
|
||
generalization claim is "unseen malware sample", tested on
|
||
the same population of hosts that contributed the training
|
||
data.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="practical">
|
||
<div class="prose">
|
||
<h2>Practical contributions</h2>
|
||
<p>What others can pick up and use from this project — beyond
|
||
the published numbers.</p>
|
||
<p><strong>/proc-only deployment.</strong> The detector needs
|
||
no syscall hooks, no eBPF, no kernel module. It runs on
|
||
hosts that don't permit deeper instrumentation — a small
|
||
VM, a container with limited capabilities, an embedded
|
||
device. One Python service plus a model file.</p>
|
||
<p><strong>Producer-agnostic dashboard.</strong> The deck
|
||
consumes typed events
|
||
(<code>training/dashboard/events.py</code>); the inference
|
||
loop runs anywhere — Pi, A100, cloud — and just POSTs back.
|
||
Same UI for a lab demo and an operational console.</p>
|
||
<p><strong>Labelled dataset on disk.</strong> 78 000+
|
||
episodes across two hosts and six attack profiles, archived
|
||
in zstd-compressed tarballs with a schema-versioned format.
|
||
Anyone reproducing or extending this work can start from
|
||
the dataset directly without re-running the orchestrator.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="design-principles">
|
||
<div class="prose">
|
||
<h2>Design principles</h2>
|
||
<p>Three patterns emerged during the project and earned
their keep well enough that we'd repeat them.</p>
|
||
<p><strong>One loop, many models.</strong> Every NN
|
||
architecture plugs into the same training loop — class
|
||
weights, AMP autocast, cosine LR with warmup, gradient
|
||
clipping, early stop on val macro-F1. Architecture changes
|
||
don't ripple into orchestration, and adding a new model
|
||
class costs ~80 lines.</p>
|
||
<p><strong>Typed events as contract.</strong> Producers and
|
||
consumers agree on dataclasses, not free-form dicts.
|
||
Adding a new dashboard scene means adding a new dataclass;
|
||
adding a new producer means importing it. Static checking
|
||
and editor autocomplete do most of the work that a
|
||
schema-validation library would do at runtime.</p>
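<p>To make the contract idea concrete, here is a sketch with one
made-up event. <code>PredictionEvent</code> and its fields are
hypothetical and are not the contents of <code>events.py</code>;
the JSON framing with a <code>type</code> key is likewise an
assumption.</p>
<pre><code>
```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PredictionEvent:
    """Hypothetical event: one model's verdict on one window."""
    model: str
    host: str
    phase: str
    confidence: float
    latency_ms: float


def encode(event: PredictionEvent) -> str:
    """Producer side: serialise the dataclass, tagged with its type name."""
    return json.dumps({"type": type(event).__name__, **asdict(event)})


def decode(raw: str) -> PredictionEvent:
    """Consumer side: rebuild the dataclass.

    A missing or unexpected field makes the constructor raise TypeError,
    so a drifting producer is noticed at the boundary.
    """
    data = json.loads(raw)
    kind = data.pop("type")
    if kind != "PredictionEvent":
        raise ValueError(f"unexpected event type: {kind}")
    return PredictionEvent(**data)
```
</code></pre>
<p>A free-form dict would accept a misspelled key without
complaint; here the same typo fails in the constructor, and a
type checker can flag it before the code runs.</p>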
|
||
<p><strong>Two-agent path ownership.</strong> Dashboard work
|
||
and model work live in two parallel sessions with a
|
||
documented path-ownership boundary
|
||
(<code>training/dashboard/</code> vs everywhere else).
|
||
Merges go through git with explicit rebases instead of a
|
||
shared workspace — slow up front, fewer subtle stomps
|
||
over time.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="limitations">
|
||
<div class="prose">
|
||
<h2>Limitations</h2>
|
||
<p>What this project cannot honestly claim — and why each
|
||
line on the left matters for how the results should be read.</p>
|
||
<p><strong>Two-host fleet.</strong> Both hosts contribute
to train, val, and test, so the results say nothing about
unseen devices, and the device population is small (n = 2).
More hosts on the WireGuard mesh wouldn't change the split
recipe but would make the dataset more representative of
real-world hardware variety.</p>
|
||
<p><strong>Synthetic attack profiles.</strong> Our six
profiles cover the main behavioural envelopes
(cpu-saturate, ransomware-lite, bursty-c2, fork-bomb,
crypto-miner, distccd-exec) but real-world malware can
sit between or outside these envelopes. Generalization to
unseen samples of these known profiles is reported via
held-out-by-sample; generalization to unseen profiles and
to in-the-wild distribution shift is unknown.</p>
|
||
<p><strong>10 Hz sampling floor.</strong> Sub-100ms
|
||
behaviours fall inside a single sample. Detection of
|
||
millisecond-scale privilege checks would need faster
|
||
telemetry than <code>/proc</code> provides.</p>
|
||
<p><strong>KNN val ↔ test gap.</strong> KNN scores val
macro-F1 ≈ 0.74 on samples it saw, but only ≈ 0.13 on
held-out <code>sample_name</code>s. That gap is instance-based
memorization of the specific training samples: informative
as a baseline, not a deployment candidate.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="conclusion-future">
|
||
<div class="prose">
|
||
<h2>Conclusion + future work</h2>
|
||
<p>A per-host classifier trained on <code>/proc</code>-only
telemetry can identify workload phases at multi-class
macro-F1 well above chance and slot into a wider
trust / containment / recovery loop. The recurrent family
(LSTM/GRU) and Transformer sit on the upper-left of the
accuracy-vs-cost frontier; KNN and GBT are honest baselines.
Held-out-by-<code>sample_name</code>, profile-stratified
evaluation is the right generalization axis: both fleet
hosts contribute to all three slices, and the test set's
<code>sample_name</code>s never appear during training.</p>
|
||
<p><strong>Next steps.</strong> The natural
extensions are unsupervised:</p>
|
||
<p>• <strong>Clustering</strong> the unlabeled tail of new
|
||
fleet data (KMeans / HDBSCAN) to surface novel workload
|
||
shapes the supervised model has no class for — a
|
||
self-training feedback loop that enrolls new phases as
|
||
the fleet grows.</p>
|
||
<p>• <strong>Anomaly detection</strong> on the last-layer
|
||
embedding (one-class SVM, isolation forest) so a "none of
|
||
the five known phases" verdict is available alongside the
|
||
classifier output.</p>
|
||
<p>• <strong>Self-supervised pretraining</strong> on the much
|
||
larger pool of unlabeled telemetry from operational hosts;
|
||
supervised fine-tune on the smaller orchestrated dataset.</p>
|
||
<p>• <strong>Embedding visualisation</strong> via UMAP /
|
||
t-SNE for human-in-the-loop labelling — already prototyped
|
||
in the KNN scene's interactive 3-D scatter.</p>
|
||
</div>
|
||
</section>
|
||
|
||
<section class="scene" data-stage="references">
|
||
<div class="prose">
|
||
<h2>References</h2>
|
||
<p>The papers, notes, and prior work this project leans on.
|
||
Pick a tab on the left to load the document; the viewer
|
||
takes the bulk of the stage so you can scroll through
|
||
without leaving the deck.</p>
|
||
<p class="hint">end of deck · ← to flip back</p>
|
||
</div>
|
||
</section>
|
||
|
||
<div class="scene-end-spacer"></div>
|
||
</article>
|
||
</div>
|
||
|
||
<script src="/static/dashboard.js?v=5316d1d8"></script>
|
||
</body>
|
||
</html>
|