deck: 9 new scenes to meet CIS-490 assignment-guide rubric
Five required + four optional slides, slotted into the existing flow
without renumbering the visible deck UI:

REQUIRED
- problem-statement (after motivation): single-sentence problem, three
  numeric stat cards, explicit task-type justification (multi-class
  classification, why not regression/ranking)
- research-questions (after problem-statement): two-column literature
  gap layout + RQ1/RQ2/RQ3
- solution-overview (after research-questions): inline-SVG block
  diagram of the pipeline (fleet hosts → receiver → episodes →
  windowing → model zoo → per-window phase → trust score →
  containment + reset)
- evaluation-setup (between chunking and models): four blocks covering
  split recipe, primary metric, baselines compared, and what's
  reported alongside accuracy. Each block leads with the *why*,
  matching the assignment's "explain not only what will be measured
  but why" requirement.
- conclusion-future (before references): two-column "what we showed" +
  unsupervised next steps (clustering / anomaly / SSL pretrain /
  embedding viz). Addresses Section 8 of the assignment guide.

OPTIONAL
- theoretical-contributions: window-centre labelling, schema-hashed
  checkpoints, cross-host as eval axis
- practical-contributions: /proc-only deployment, producer-agnostic
  dashboard, labelled dataset on disk
- design-principles: one-loop-many-models, typed events as contract,
  two-agent path ownership
- limitations: two-host fleet, synthetic profiles, 10 Hz floor, KNN
  cross-host gap

Plus references/links.md gains four real online references (PyTorch,
XGBoost, scikit-learn, proc(5)), bringing the citation count from 8 to
12, over the assignment's 10-source minimum.

CSS additions cover the new layouts (.problem-claim, .problem-stats,
.research-grid, .pipeline-svg + .pipeline-stage / .pipeline-arrow,
.eval-blocks, .conclusion-grid). Limitations cards reuse the
motivation-card pattern with an armed-phase amber marker for the
"warning" feel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 4172ddb0c8
commit db9f013969
3 changed files with 854 additions and 7 deletions
@@ -1,6 +1,10 @@
 # Reference Links
 
-- https://github.com/mitre/caldera?tab=security-ov-file
+- https://github.com/mitre/caldera?tab=security-ov-file — adversary emulation framework (red-team/blue-team)
-- https://github.com/PiyushxJangid/DLHIDS
+- https://github.com/PiyushxJangid/DLHIDS — deep-learning HIDS reference implementation
-- https://github.com/ArpanDFrank/Host-Intrusion-Detection-System-using-Hybrid-CNN-LSTM-Models-and-RL-Actor-Critic-Models
+- https://github.com/ArpanDFrank/Host-Intrusion-Detection-System-using-Hybrid-CNN-LSTM-Models-and-RL-Actor-Critic-Models — Hybrid CNN-LSTM + RL HIDS prior art
 - https://ieeexplore.ieee.org/document/9881803 — per-device trust establishment from network behaviour (cited on motivation scene)
+- https://pytorch.org/docs/stable/index.html — PyTorch reference (LSTM / GRU / CNN / Transformer module APIs used by the model zoo)
+- https://xgboost.readthedocs.io/en/stable/ — XGBoost reference (gradient-boosted-trees baseline; Chen & Guestrin, KDD 2016)
+- https://scikit-learn.org/stable/ — scikit-learn reference (KNN, KMeans, PCA, evaluation metrics)
+- https://man7.org/linux/man-pages/man5/proc.5.html — proc(5) — the Linux kernel interface this project's telemetry comes from
@ -1067,6 +1067,205 @@ html, body { overflow-anchor: none; }
|
||||||
color: var(--fg-dim);
|
color: var(--fg-dim);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* ─── Problem statement (scene: problem-statement) ─────────────────── */
|
||||||
|
.problem-claim {
|
||||||
|
padding: clamp(16px, 2vh, 28px) clamp(18px, 2vw, 28px);
|
||||||
|
background: var(--bg-elev, rgba(255, 255, 255, 0.03));
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-left: 4px solid var(--accent);
|
||||||
|
border-radius: 4px;
|
||||||
|
}
|
||||||
|
.problem-claim-text {
|
||||||
|
font-size: clamp(16px, 1.5vw, 22px);
|
||||||
|
line-height: 1.45;
|
||||||
|
color: var(--fg);
|
||||||
|
font-weight: 500;
|
||||||
|
}
|
||||||
|
.problem-stats {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: repeat(3, 1fr);
|
||||||
|
gap: clamp(10px, 1.4vw, 18px);
|
||||||
|
}
|
||||||
|
.problem-stat {
|
||||||
|
padding: clamp(14px, 1.8vh, 22px);
|
||||||
|
background: var(--bg-elev, rgba(255, 255, 255, 0.03));
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-radius: 4px;
|
||||||
|
display: flex; flex-direction: column; gap: 4px;
|
||||||
|
align-items: flex-start;
|
||||||
|
}
|
||||||
|
.problem-stat-num {
|
||||||
|
font: 700 clamp(28px, 3.4vw, 44px)
|
||||||
|
ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
color: var(--accent);
|
||||||
|
line-height: 1;
|
||||||
|
}
|
||||||
|
.problem-stat-lbl {
|
||||||
|
font-size: clamp(12px, 0.95vw, 14px);
|
||||||
|
color: var(--fg-dim);
|
||||||
|
line-height: 1.35;
|
||||||
|
}
|
||||||
|
.problem-task {
|
||||||
|
padding: 12px 16px;
|
||||||
|
background: var(--bg);
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-radius: 4px;
|
||||||
|
font-size: clamp(13px, 1vw, 15px);
|
||||||
|
color: var(--fg-dim);
|
||||||
|
line-height: 1.5;
|
||||||
|
}
|
||||||
|
.problem-task-label { color: var(--fg-mute); margin-right: 6px; }
|
||||||
|
.problem-task-value { color: var(--fg); font-weight: 600; }
|
||||||
|
.problem-task-detail { color: var(--fg-dim); }
|
||||||
|
|
||||||
|
/* ─── Research questions (scene: research-questions) ───────────────── */
|
||||||
|
.research-grid {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: 1fr 1fr;
|
||||||
|
gap: clamp(12px, 1.6vw, 22px);
|
||||||
|
}
|
||||||
|
.research-col {
|
||||||
|
padding: clamp(14px, 1.8vh, 22px);
|
||||||
|
background: var(--bg-elev, rgba(255, 255, 255, 0.03));
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-radius: 4px;
|
||||||
|
}
|
||||||
|
.research-col-title {
|
||||||
|
font: 600 clamp(13px, 1.05vw, 15px)
|
||||||
|
ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
color: var(--accent);
|
||||||
|
letter-spacing: 0.04em;
|
||||||
|
text-transform: uppercase;
|
||||||
|
margin-bottom: 12px;
|
||||||
|
}
|
||||||
|
.research-list {
|
||||||
|
list-style: none; padding: 0; margin: 0;
|
||||||
|
display: flex; flex-direction: column; gap: 10px;
|
||||||
|
font-size: clamp(13px, 1vw, 15px);
|
||||||
|
line-height: 1.45;
|
||||||
|
color: var(--fg-dim);
|
||||||
|
}
|
||||||
|
.research-list li::before {
|
||||||
|
content: '·'; color: var(--accent); margin-right: 8px;
|
||||||
|
}
|
||||||
|
.research-list strong { color: var(--fg); }
|
||||||
|
|
||||||
|
/* ─── Solution overview (scene: solution-overview) ─────────────────── */
|
||||||
|
.pipeline-svg {
|
||||||
|
width: 100%;
|
||||||
|
height: clamp(360px, 60vh, 640px);
|
||||||
|
background: var(--bg-elev, rgba(255, 255, 255, 0.03));
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-radius: 4px;
|
||||||
|
padding: 12px;
|
||||||
|
box-sizing: border-box;
|
||||||
|
}
|
||||||
|
.pipeline-stage rect {
|
||||||
|
fill: var(--bg);
|
||||||
|
stroke: var(--accent);
|
||||||
|
stroke-width: 1.5;
|
||||||
|
}
|
||||||
|
.pipeline-stage-models rect {
|
||||||
|
fill: var(--accent-soft, rgba(80, 140, 220, 0.08));
|
||||||
|
stroke-width: 2;
|
||||||
|
}
|
||||||
|
.pipeline-stage-final rect {
|
||||||
|
stroke: var(--phase-clean);
|
||||||
|
}
|
||||||
|
.pipeline-stage text {
|
||||||
|
fill: var(--fg);
|
||||||
|
font: 600 14px ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
}
|
||||||
|
.pipeline-stage-title {
|
||||||
|
font-size: 16px !important;
|
||||||
|
}
|
||||||
|
.pipeline-detail {
|
||||||
|
fill: var(--fg-dim) !important;
|
||||||
|
font-weight: 400 !important;
|
||||||
|
font-size: 11px !important;
|
||||||
|
}
|
||||||
|
.pipeline-detail-mini {
|
||||||
|
fill: var(--fg-mute) !important;
|
||||||
|
font-weight: 400 !important;
|
||||||
|
font-size: 10px !important;
|
||||||
|
}
|
||||||
|
.pipeline-arrow path {
|
||||||
|
stroke: var(--fg-mute);
|
||||||
|
stroke-width: 1.5;
|
||||||
|
stroke-linecap: round;
|
||||||
|
marker-end: url(#pipe-arrow);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ─── Evaluation setup (scene: evaluation-setup) ───────────────────── */
|
||||||
|
.eval-blocks {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: 1fr 1fr;
|
||||||
|
gap: clamp(10px, 1.4vw, 18px);
|
||||||
|
}
|
||||||
|
.eval-block {
|
||||||
|
padding: clamp(12px, 1.6vh, 18px);
|
||||||
|
background: var(--bg-elev, rgba(255, 255, 255, 0.03));
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-radius: 4px;
|
||||||
|
display: flex; flex-direction: column; gap: 8px;
|
||||||
|
}
|
||||||
|
.eval-block-title {
|
||||||
|
font: 600 clamp(12px, 0.95vw, 14px)
|
||||||
|
ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
color: var(--accent);
|
||||||
|
letter-spacing: 0.04em;
|
||||||
|
text-transform: uppercase;
|
||||||
|
}
|
||||||
|
.eval-block-body {
|
||||||
|
display: flex; flex-direction: column; gap: 6px;
|
||||||
|
font-size: clamp(13px, 1vw, 15px);
|
||||||
|
color: var(--fg-dim);
|
||||||
|
line-height: 1.45;
|
||||||
|
}
|
||||||
|
.eval-block-body strong { color: var(--fg); }
|
||||||
|
.eval-detail {
|
||||||
|
margin-top: 4px;
|
||||||
|
color: var(--fg-mute);
|
||||||
|
font-size: clamp(12px, 0.9vw, 13px);
|
||||||
|
font-style: italic;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ─── Conclusion + future (scene: conclusion-future) ───────────────── */
|
||||||
|
.conclusion-grid {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: 1fr 1fr;
|
||||||
|
gap: clamp(12px, 1.6vw, 22px);
|
||||||
|
}
|
||||||
|
.conclusion-col {
|
||||||
|
padding: clamp(14px, 1.8vh, 22px);
|
||||||
|
background: var(--bg-elev, rgba(255, 255, 255, 0.03));
|
||||||
|
border: 1px solid var(--line);
|
||||||
|
border-radius: 4px;
|
||||||
|
}
|
||||||
|
.conclusion-col-title {
|
||||||
|
font: 600 clamp(13px, 1.05vw, 15px)
|
||||||
|
ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||||
|
color: var(--accent);
|
||||||
|
letter-spacing: 0.04em;
|
||||||
|
text-transform: uppercase;
|
||||||
|
margin-bottom: 12px;
|
||||||
|
}
|
||||||
|
.conclusion-list {
|
||||||
|
list-style: none; padding: 0; margin: 0;
|
||||||
|
display: flex; flex-direction: column; gap: 10px;
|
||||||
|
font-size: clamp(13px, 1vw, 15px);
|
||||||
|
line-height: 1.45;
|
||||||
|
color: var(--fg-dim);
|
||||||
|
}
|
||||||
|
.conclusion-list li::before {
|
||||||
|
content: '·'; color: var(--accent); margin-right: 8px;
|
||||||
|
}
|
||||||
|
.conclusion-list strong { color: var(--fg); }
|
||||||
|
|
||||||
|
/* ─── Limitations card uses the motivation-card pattern with an
|
||||||
|
armed-phase marker for the "warning" feel. ─── */
|
||||||
|
.motivation-card-marker.mc-armed { background: var(--phase-armed); }
|
||||||
|
|
||||||
/* ─── Live detections (scene: live) ────────────────────────────────── */
|
/* ─── Live detections (scene: live) ────────────────────────────────── */
|
||||||
.live-stack { gap: clamp(10px, 1.6vh, 20px); }
|
.live-stack { gap: clamp(10px, 1.6vh, 20px); }
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -4,7 +4,7 @@
 <meta charset="utf-8">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>CIS490 — live</title>
-<link rel="stylesheet" href="/static/dashboard.css?v=8675cea9">
+<link rel="stylesheet" href="/static/dashboard.css?v=0ef6cb6d">
 </head>
 <body>
 <!-- SVG filter defs for the lava-lamp goo effect. Width/height 0
@ -203,7 +203,143 @@
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<!-- 3. stack — Python stack & libraries used in the project -->
|
<!-- 3. problem-statement — what we're solving + task type -->
|
||||||
|
<div class="stage-view" data-view="problem-statement">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">the problem · single sentence + numbers</div>
|
||||||
|
<div class="problem-claim">
|
||||||
|
<div class="problem-claim-text">Classify each ten-second window of fleet
|
||||||
|
<code>/proc</code> telemetry into one of five workload phases —
|
||||||
|
accurately enough to drive automated containment.</div>
|
||||||
|
</div>
|
||||||
|
<div class="problem-stats">
|
||||||
|
<div class="problem-stat">
|
||||||
|
<div class="problem-stat-num">5</div>
|
||||||
|
<div class="problem-stat-lbl">phase classes<br><code>clean</code> → <code>infected_running</code></div>
|
||||||
|
</div>
|
||||||
|
<div class="problem-stat">
|
||||||
|
<div class="problem-stat-num">12</div>
|
||||||
|
<div class="problem-stat-lbl"><code>/proc</code> channels<br>no syscalls, no kernel hooks</div>
|
||||||
|
</div>
|
||||||
|
<div class="problem-stat">
|
||||||
|
<div class="problem-stat-num">10s</div>
|
||||||
|
<div class="problem-stat-lbl">classification window<br>100 samples × 12 channels</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="problem-task">
|
||||||
|
<span class="problem-task-label">task type:</span>
|
||||||
|
<span class="problem-task-value">multi-class classification</span>
|
||||||
|
<span class="problem-task-detail">— five mutually-exclusive
|
||||||
|
phase labels, balanced via class-weighted cross-entropy.
|
||||||
|
Not regression (no continuous target), not ranking
|
||||||
|
(downstream policy is a categorical containment decision).</span>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
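Reviewer sketch for the window geometry stated on this scene: 10 s at the 10 Hz rate shown on the pipeline diagram gives 100 samples, each with 12 channels. The Python below is a minimal illustration under stated assumptions, not the project's windowing code. The name `make_windows`, the non-overlapping default stride, and the zero-filled stream are all hypothetical.

```python
from typing import List, Sequence

SAMPLE_HZ = 10        # sampling rate shown on the pipeline slide
WINDOW_SECONDS = 10   # classification window from the problem statement
N_CHANNELS = 12       # /proc channels per sample
WINDOW_LEN = SAMPLE_HZ * WINDOW_SECONDS  # 100 samples per window


def make_windows(stream: Sequence[Sequence[float]],
                 stride: int = WINDOW_LEN) -> List[List[List[float]]]:
    """Slice a (time x channel) stream into fixed-length windows.

    `stride` is an assumption: the source does not say whether windows
    overlap. Trailing samples that do not fill a whole window are dropped.
    """
    windows = []
    for start in range(0, len(stream) - WINDOW_LEN + 1, stride):
        window = [list(sample) for sample in stream[start:start + WINDOW_LEN]]
        windows.append(window)
    return windows


# 25 s of fake telemetry: 250 samples x 12 channels, all zeros.
fake_stream = [[0.0] * N_CHANNELS for _ in range(250)]
windows = make_windows(fake_stream)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

With the default stride the 250-sample stream yields two full windows and drops the last 50 samples; passing `stride=50` would produce overlapping windows instead.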
|
||||||
|
|
||||||
|
<!-- 4. research-questions — literature gaps and questions -->
|
||||||
|
<div class="stage-view" data-view="research-questions">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">literature gaps · positioning the work</div>
|
||||||
|
<div class="research-grid">
|
||||||
|
<div class="research-col">
|
||||||
|
<div class="research-col-title">what prior work covers</div>
|
||||||
|
<ul class="research-list">
|
||||||
|
<li><strong>LSTM on syscall traces</strong> in VMs —
|
||||||
|
deeper telemetry than <code>/proc</code></li>
|
||||||
|
<li><strong>Transformer on per-process resource metrics</strong>
|
||||||
|
— related signal, single-host eval</li>
|
||||||
|
<li><strong>BERT on system logs</strong> (LogBERT) —
|
||||||
|
text-form telemetry, not numeric channels</li>
|
||||||
|
<li><strong>Insider-threat LSTM on event logs</strong>
|
||||||
|
(DANTE) — categorical events, not continuous</li>
|
||||||
|
<li><strong>Network-behaviour trust establishment</strong>
|
||||||
|
(IEEE 9881803) — cross-device aggregation,
|
||||||
|
not per-host classifier</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
<div class="research-col">
|
||||||
|
<div class="research-col-title">what's missing</div>
|
||||||
|
<ul class="research-list">
|
||||||
|
<li><strong>/proc-only signal</strong> — most work
|
||||||
|
assumes syscalls or kernel hooks</li>
|
||||||
|
<li><strong>Cross-host generalization</strong> — eval
|
||||||
|
splits often hide it (held-out by sample, not host)</li>
|
||||||
|
<li><strong>Real-time per-window classification</strong>
|
||||||
|
for containment, not post-hoc batch labelling</li>
|
||||||
|
<li><strong>Side-by-side cell-choice comparison</strong>
|
||||||
|
(RNN/GRU/LSTM/CNN/Transformer) on one dataset</li>
|
||||||
|
<li><strong>Direct integration</strong> with a
|
||||||
|
fleet-wide trust score, not standalone output</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- 5. solution-overview — pipeline block diagram -->
|
||||||
|
<div class="stage-view" data-view="solution-overview">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">pipeline · what each stage produces</div>
|
||||||
|
<svg class="pipeline-svg" viewBox="0 0 800 480"
|
||||||
|
xmlns="http://www.w3.org/2000/svg"
|
||||||
|
preserveAspectRatio="xMidYMid meet">
|
||||||
|
<g class="pipeline-stage">
|
||||||
|
<rect x="20" y="40" width="140" height="60" rx="4"/>
|
||||||
|
<text x="90" y="68" text-anchor="middle">fleet hosts</text>
|
||||||
|
<text x="90" y="86" text-anchor="middle" class="pipeline-detail">/proc · 10 Hz</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage">
|
||||||
|
<rect x="200" y="40" width="140" height="60" rx="4"/>
|
||||||
|
<text x="270" y="68" text-anchor="middle">receiver (Pi)</text>
|
||||||
|
<text x="270" y="86" text-anchor="middle" class="pipeline-detail">bearer auth</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage">
|
||||||
|
<rect x="380" y="40" width="140" height="60" rx="4"/>
|
||||||
|
<text x="450" y="68" text-anchor="middle">episode store</text>
|
||||||
|
<text x="450" y="86" text-anchor="middle" class="pipeline-detail">zstd · tar</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage">
|
||||||
|
<rect x="560" y="40" width="220" height="60" rx="4"/>
|
||||||
|
<text x="670" y="68" text-anchor="middle">windowing + features</text>
|
||||||
|
<text x="670" y="86" text-anchor="middle" class="pipeline-detail">10 s · 100 samples × 12 ch</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage pipeline-stage-models">
|
||||||
|
<rect x="180" y="170" width="440" height="120" rx="4"/>
|
||||||
|
<text x="400" y="198" text-anchor="middle" class="pipeline-stage-title">model zoo</text>
|
||||||
|
<text x="400" y="226" text-anchor="middle" class="pipeline-detail">KNN · GBT · MLP · CNN · RNN · GRU · LSTM · Transformer</text>
|
||||||
|
<text x="400" y="252" text-anchor="middle" class="pipeline-detail">trained per (model × split-recipe)</text>
|
||||||
|
<text x="400" y="276" text-anchor="middle" class="pipeline-detail-mini">cross-host eval · class-weighted CE · early stop on val macro-F1</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage">
|
||||||
|
<rect x="60" y="350" width="200" height="60" rx="4"/>
|
||||||
|
<text x="160" y="378" text-anchor="middle">per-window phase</text>
|
||||||
|
<text x="160" y="396" text-anchor="middle" class="pipeline-detail">5-class softmax</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage pipeline-stage-final">
|
||||||
|
<rect x="300" y="350" width="200" height="60" rx="4"/>
|
||||||
|
<text x="400" y="378" text-anchor="middle">trust score</text>
|
||||||
|
<text x="400" y="396" text-anchor="middle" class="pipeline-detail">+ network signals (9881803)</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-stage pipeline-stage-final">
|
||||||
|
<rect x="540" y="350" width="220" height="60" rx="4"/>
|
||||||
|
<text x="650" y="378" text-anchor="middle">containment + reset</text>
|
||||||
|
<text x="650" y="396" text-anchor="middle" class="pipeline-detail">snapshot rollback</text>
|
||||||
|
</g>
|
||||||
|
<g class="pipeline-arrow" fill="none">
|
||||||
|
<path d="M160 70 L200 70" />
|
||||||
|
<path d="M340 70 L380 70" />
|
||||||
|
<path d="M520 70 L560 70" />
|
||||||
|
<path d="M670 100 L670 130 L400 130 L400 170" />
|
||||||
|
<path d="M400 290 L400 320 L160 320 L160 350" />
|
||||||
|
<path d="M260 380 L300 380" />
|
||||||
|
<path d="M500 380 L540 380" />
|
||||||
|
</g>
|
||||||
|
</svg>
|
||||||
|
</div>
|
||||||
|
</div>
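Reviewer sketch for the "class-weighted CE" note in the model-zoo box. The source does not say how the weights are derived, so the inverse-frequency rule below (n_samples / (n_classes * class_count)) is an assumption, as are the function name and the label counts. Only `clean`, `armed` and `infected_running` are phase names from the deck; `phase_3` and `phase_4` are placeholders for the two phases it does not name.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical label counts: roughly 50 % infected_running and 5 % armed,
# as quoted on the evaluation-setup scene; the other shares are made up.
labels = (["infected_running"] * 50 + ["armed"] * 5 + ["clean"] * 25
          + ["phase_3"] * 10 + ["phase_4"] * 10)
weights = balanced_class_weights(labels)
for phase, weight in weights.items():
    print(f"{phase}: {weight:.2f}")
```

Under this rule the rare `armed` phase gets ten times the weight of `infected_running`, which is the effect a class-weighted loss is meant to have on the gradient.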
|
||||||
|
|
||||||
|
<!-- 6. stack — Python stack & libraries used in the project -->
|
||||||
<div class="stage-view" data-view="stack">
|
<div class="stage-view" data-view="stack">
|
||||||
<div class="metric-stack metric-stack-wide">
|
<div class="metric-stack metric-stack-wide">
|
||||||
<div class="metric-eyebrow">the stack behind the live data on the right</div>
|
<div class="metric-eyebrow">the stack behind the live data on the right</div>
|
||||||
|
|
@ -316,7 +452,60 @@
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<!-- 9. models -->
|
<!-- 9. evaluation-setup — splits, metrics, baselines -->
|
||||||
|
<div class="stage-view" data-view="evaluation-setup">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">evaluation setup · how the numbers get made</div>
|
||||||
|
<div class="eval-blocks">
|
||||||
|
<div class="eval-block">
|
||||||
|
<div class="eval-block-title">split recipe</div>
|
||||||
|
<div class="eval-block-body">
|
||||||
|
<div><strong>train ∪ val:</strong> elliott-thinkpad</div>
|
||||||
|
<div><strong>test:</strong> k-gamingcom</div>
|
||||||
|
<div class="eval-detail">held-out by host so the test set
|
||||||
|
measures cross-device generalization, not in-distribution
|
||||||
|
self-prediction. A 90 % accuracy that comes from
|
||||||
|
recognising the host's idle profile is worthless for
|
||||||
|
a fleet detector.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="eval-block">
|
||||||
|
<div class="eval-block-title">primary metric</div>
|
||||||
|
<div class="eval-block-body">
|
||||||
|
<div><strong>macro-F1</strong> averaged across the five phases</div>
|
||||||
|
<div class="eval-detail">accuracy lies under class
|
||||||
|
imbalance — ~50 % <code>infected_running</code>,
|
||||||
|
~5 % <code>armed</code>. A constant majority predictor
|
||||||
|
hits 0.5 accuracy. macro-F1 averages per-class F1,
|
||||||
|
so rare phases actually count toward the score.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="eval-block">
|
||||||
|
<div class="eval-block-title">baselines compared</div>
|
||||||
|
<div class="eval-block-body">
|
||||||
|
<div><strong>KNN</strong> — non-parametric, instance-based</div>
|
||||||
|
<div><strong>GBT (XGBoost)</strong> — tabular non-NN</div>
|
||||||
|
<div><strong>MLP</strong> — feedforward ablation</div>
|
||||||
|
<div><strong>CNN</strong> — local-pattern ablation</div>
|
||||||
|
<div><strong>RNN / GRU / LSTM</strong> — recurrent family</div>
|
||||||
|
<div><strong>Transformer</strong> — attention</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="eval-block">
|
||||||
|
<div class="eval-block-title">reported alongside accuracy</div>
|
||||||
|
<div class="eval-block-body">
|
||||||
|
<div><strong>μs / window</strong> — inference cost at batch=64</div>
|
||||||
|
<div><strong>cross-host gap</strong> — val − test macro-F1</div>
|
||||||
|
<div class="eval-detail">latency translates to containment
|
||||||
|
lag; the gap is the honest measure of generalization.
|
||||||
|
Both are plotted on the perf scene.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
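Reviewer sketch for the primary-metric block: a constant majority predictor evaluated with both accuracy and macro-F1. The metric definitions are standard; the label distribution is an assumption built around the two shares the slide quotes (about 50 % `infected_running`, about 5 % `armed`), and `phase_3` / `phase_4` are placeholder names for phases the deck does not spell out.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of windows whose predicted phase matches the true phase."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so every phase counts equally."""
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(classes)


classes = ["clean", "phase_3", "armed", "phase_4", "infected_running"]
# Hypothetical test set of 100 windows with the quoted imbalance.
y_true = (["infected_running"] * 50 + ["armed"] * 5 + ["clean"] * 25
          + ["phase_3"] * 10 + ["phase_4"] * 10)
y_pred = ["infected_running"] * len(y_true)  # constant majority predictor

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred, classes):.3f}")
```

On this made-up distribution the constant predictor reaches 0.50 accuracy but only about 0.133 macro-F1, because four of the five per-class F1 scores are zero.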
|
||||||
|
|
||||||
|
<!-- 10. models -->
|
||||||
<div class="stage-view" data-view="models">
|
<div class="stage-view" data-view="models">
|
||||||
<div class="metric-stack">
|
<div class="metric-stack">
|
||||||
<div class="metric-eyebrow">sequence models · accuracy on held-out samples</div>
|
<div class="metric-eyebrow">sequence models · accuracy on held-out samples</div>
|
||||||
|
|
@@ -386,7 +575,7 @@
 </div>
 </div>
 
-<!-- 13. live — fleet-wide live detections feed -->
+<!-- 14. live — fleet-wide live detections feed -->
 <div class="stage-view" data-view="live">
 <div class="metric-stack metric-stack-wide live-stack">
 <div class="live-stats">
@ -403,6 +592,228 @@
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
<!-- 15. theoretical-contributions -->
|
||||||
|
<div class="stage-view" data-view="theoretical">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">theoretical contributions · what's new methodologically</div>
|
||||||
|
<div class="motivation-cards">
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-trust"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">window-centre labelling</div>
|
||||||
|
<div class="motivation-card-text">A 10-second
classification window is labelled by the phase that
occupies its centre, not by majority vote across the
window. This gives a cleaner training signal at phase
boundaries and avoids the spurious "ambiguous" class.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-contain"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">schema-hashed checkpoints</div>
|
||||||
|
<div class="motivation-card-text">Each checkpoint
|
||||||
|
embeds a hash of the feature schema; loading a model
|
||||||
|
against the wrong schema fails fast instead of
|
||||||
|
silently scoring on misaligned columns. Makes
|
||||||
|
retroactive comparison reproducible.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-recover"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">cross-host as the eval axis</div>
|
||||||
|
<div class="motivation-card-text">Held-out-by-host
|
||||||
|
is reported as a first-class number alongside
|
||||||
|
held-out-by-sample. The two often disagree by 0.4
|
||||||
|
macro-F1, and only the cross-host number predicts
|
||||||
|
fleet behaviour.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
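Reviewer sketch for the window-centre labelling card: the two rules it contrasts, applied to one window that straddles two phase boundaries. The function names, the choice of index `len // 2` as the centre of an even-length window, and the per-sample label sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def centre_label(sample_labels: Sequence[str]) -> str:
    """Label a window by the phase of its centre sample."""
    return sample_labels[len(sample_labels) // 2]


def majority_label(sample_labels: Sequence[str]) -> str:
    """Alternative for comparison: the most frequent phase in the window."""
    return Counter(sample_labels).most_common(1)[0][0]


# One hypothetical 100-sample window that crosses two phase boundaries.
window = ["clean"] * 40 + ["armed"] * 15 + ["infected_running"] * 45

print(centre_label(window))    # index 50 falls inside the "armed" run
print(majority_label(window))  # "infected_running" has the most samples
```

Here the two rules disagree: the centre sample is `armed`, while the majority vote picks `infected_running` because it covers the largest share of the window.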
|
||||||
|
|
||||||
|
<!-- 16. practical-contributions -->
|
||||||
|
<div class="stage-view" data-view="practical">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">practical contributions · what others can use</div>
|
||||||
|
<div class="motivation-cards">
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-trust"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">/proc-only deployment</div>
|
||||||
|
<div class="motivation-card-text">No syscall hooks, no
|
||||||
|
eBPF, no kernel module — runs on hosts that don't
|
||||||
|
permit deep instrumentation. The detector is one
|
||||||
|
Python service plus a model file.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-contain"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">producer-agnostic dashboard</div>
|
||||||
|
<div class="motivation-card-text">The deck consumes
|
||||||
|
typed events; the inference loop runs anywhere
|
||||||
|
(Pi, A100, cloud) and just POSTs back. Same UI for
|
||||||
|
a lab demo and an operational console.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-recover"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">labelled dataset on disk</div>
|
||||||
|
<div class="motivation-card-text">78,000+ episodes,
|
||||||
|
five phases, two hosts, six attack profiles —
|
||||||
|
archived in zstd-compressed tarballs with a
|
||||||
|
schema-versioned format. Ready for downstream
|
||||||
|
work without re-running the orchestrator.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
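Reviewer sketch for the fail-fast behaviour described on the schema-hashed checkpoints card in the theoretical-contributions scene. Everything here is hypothetical: the function names, the use of SHA-256 over a JSON-encoded feature list, the channel names, and the plain dict standing in for a checkpoint file. The source only states that a hash of the feature schema is embedded and that a mismatch fails on load.

```python
import hashlib
import json
from typing import Dict, List


def schema_hash(feature_names: List[str]) -> str:
    """Hash the ordered feature list; reordering or renaming changes the hash."""
    payload = json.dumps(feature_names).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def save_checkpoint(weights: Dict[str, float],
                    feature_names: List[str]) -> dict:
    """Bundle weights with the schema hash (a dict stands in for a file)."""
    return {"weights": weights, "schema_hash": schema_hash(feature_names)}


def load_checkpoint(checkpoint: dict,
                    feature_names: List[str]) -> Dict[str, float]:
    """Return the weights, or raise if the caller's schema does not match."""
    expected = checkpoint["schema_hash"]
    actual = schema_hash(feature_names)
    if actual != expected:
        raise ValueError(
            f"schema mismatch: checkpoint {expected[:8]}, caller {actual[:8]}"
        )
    return checkpoint["weights"]


# Hypothetical channel names; the deck does not list the 12 real ones.
train_schema = ["cpu_user", "cpu_system", "mem_free"]
ckpt = save_checkpoint({"bias": 0.1}, train_schema)

print(load_checkpoint(ckpt, train_schema))
try:
    load_checkpoint(ckpt, ["cpu_system", "cpu_user", "mem_free"])
except ValueError as err:
    print("refused:", err)
```

Hashing the ordered list means a column swap is caught even when the set of feature names is unchanged, which is the silent-misalignment case the card calls out.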
|
||||||
|
|
||||||
|
<!-- 17. design-principles -->
|
||||||
|
<div class="stage-view" data-view="design-principles">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">design principles · patterns that emerged</div>
|
||||||
|
<div class="motivation-cards">
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-trust"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">one loop, many models</div>
|
||||||
|
<div class="motivation-card-text">Every NN architecture
|
||||||
|
plugs into the same training loop — class weights,
|
||||||
|
AMP, cosine LR, early stop. Architecture changes
|
||||||
|
don't ripple into orchestration.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-contain"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">typed events as contract</div>
|
||||||
|
<div class="motivation-card-text">Producers and
|
||||||
|
consumers agree on dataclasses
|
||||||
|
(<code>events.py</code>), not free-form dicts.
|
||||||
|
Adding a new scene means adding a new dataclass;
|
||||||
|
adding a new producer means importing it.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-recover"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">two-agent path ownership</div>
|
||||||
|
<div class="motivation-card-text">Dashboard work and
|
||||||
|
model work live in two parallel sessions with a
|
||||||
|
documented path-ownership boundary. Merges go
|
||||||
|
through git with explicit rebases instead of a
|
||||||
|
shared workspace.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- 18. limitations -->
|
||||||
|
<div class="stage-view" data-view="limitations">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">limitations · the honest list</div>
|
||||||
|
<div class="motivation-cards">
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-armed"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">two-host fleet</div>
|
||||||
|
<div class="motivation-card-text">Cross-host generalization
|
||||||
|
is reported between exactly two machines
|
||||||
|
(elliott-thinkpad → k-gamingcom). N-host claims need
|
||||||
|
more hosts on the WireGuard mesh.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-armed"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">synthetic attack profiles</div>
|
||||||
|
<div class="motivation-card-text">Six profiles cover the
|
||||||
|
main shapes (cpu-saturate, ransomware-lite, bursty-c2,
|
||||||
|
fork-bomb, crypto-miner, distccd-exec) but real-world
|
||||||
|
malware can sit between or outside these envelopes.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-armed"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">10 Hz sampling floor</div>
|
||||||
|
<div class="motivation-card-text">Attack behaviours
shorter than 100 ms fall inside a single sample.
Detecting extremely short-lived attacks
(millisecond-scale privilege checks) requires faster
sampling than the 10 Hz <code>/proc</code> polling
currently provides.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="motivation-card">
|
||||||
|
<div class="motivation-card-marker mc-armed"></div>
|
||||||
|
<div class="motivation-card-body">
|
||||||
|
<div class="motivation-card-title">KNN cross-host gap</div>
|
||||||
|
<div class="motivation-card-text">KNN scores val
|
||||||
|
macro-F1 ≈ 0.74 on elliott-thinkpad but only 0.13 on
|
||||||
|
the held-out k-gamingcom. Instance-based memorization
|
||||||
|
of the training host's feature space — informative
|
||||||
|
as a baseline, but not a deployment candidate.</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- 19. conclusion-future — summary + unsupervised next steps -->
|
||||||
|
<div class="stage-view" data-view="conclusion-future">
|
||||||
|
<div class="metric-stack metric-stack-wide">
|
||||||
|
<div class="metric-eyebrow">conclusion + future work</div>
|
||||||
|
<div class="conclusion-grid">
|
||||||
|
<div class="conclusion-col">
|
||||||
|
<div class="conclusion-col-title">what we showed</div>
|
||||||
|
<ul class="conclusion-list">
|
||||||
|
<li>A per-host detector trained on
|
||||||
|
<strong>/proc-only telemetry</strong> can classify
|
||||||
|
workload phases at multi-class macro-F1 well above
|
||||||
|
chance.</li>
|
||||||
|
<li>Held-out-<strong>by-host</strong> evaluation is the
|
||||||
|
right generalization axis; held-out-by-sample
|
||||||
|
overstates real fleet performance by 0.3+ F1.</li>
|
||||||
|
<li>The recurrent family (LSTM/GRU) and Transformer
|
||||||
|
sit on the upper-left of the
|
||||||
|
<strong>accuracy-vs-cost frontier</strong>; KNN and
|
||||||
|
GBT round out the comparison as honest baselines.</li>
|
||||||
|
<li>The detector slots into a wider <strong>trust /
|
||||||
|
containment / recovery</strong> loop — the per-host
|
||||||
|
verdict isn't the final answer, it's one input.</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
<div class="conclusion-col">
|
||||||
|
<div class="conclusion-col-title">next steps · unsupervised</div>
|
||||||
|
<ul class="conclusion-list">
|
||||||
|
<li><strong>Clustering</strong> the unlabeled tail of
|
||||||
|
new fleet data (KMeans / HDBSCAN) to surface novel
|
||||||
|
workload shapes the supervised model has no class
|
||||||
|
for — a self-training feedback loop.</li>
|
||||||
|
<li><strong>Anomaly detection</strong> on the
|
||||||
|
last-layer embedding (one-class SVM, isolation forest)
|
||||||
|
so a "none of the five known phases" verdict is
|
||||||
|
available alongside the classifier output.</li>
|
||||||
|
<li><strong>Self-supervised pretraining</strong> on
|
||||||
|
the much larger pool of unlabeled telemetry from
|
||||||
|
operational hosts; supervised fine-tune on the
|
||||||
|
smaller orchestrated dataset.</li>
|
||||||
|
<li><strong>Embedding visualisation</strong> via
|
||||||
|
UMAP / t-SNE for human-in-the-loop labelling of
|
||||||
|
the unlabeled tail (already prototyped in scene 12).</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
<button id="next-fab" class="fab" data-no-advance title="Next (→)">▼</button>
|
<button id="next-fab" class="fab" data-no-advance title="Next (→)">▼</button>
|
||||||
</div>
|
</div>
|
||||||
|
|
@@ -453,6 +864,79 @@
|
||||||
</div>
|
</div>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="problem-statement">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Problem statement</h2>
|
||||||
|
<p>Today's behaviour-based IDS systems rely on syscall traces,
|
||||||
|
kernel hooks, or rich endpoint agents that can't ship to
|
||||||
|
constrained or untrusted hosts. We want a detector that
|
||||||
|
runs on the only telemetry every modern Linux already
|
||||||
|
exports — <code>/proc</code> — and labels each ten-second
|
||||||
|
window of activity with the phase the workload is in.</p>
|
||||||
|
<p><strong>Research question.</strong> Can a sequence model
|
||||||
|
trained on twelve channels of <code>/proc</code> telemetry
|
||||||
|
classify five workload phases (clean / armed / infecting /
|
||||||
|
infected_running / dormant) accurately enough to drive
|
||||||
|
automated containment, <em>and</em> generalize across hosts
|
||||||
|
and malware profiles it has never seen during training?</p>
|
||||||
|
<p>The task is <strong>multi-class classification</strong>:
|
||||||
|
the target is one of five mutually-exclusive phase labels.
|
||||||
|
Not regression (no continuous target), not ranking
|
||||||
|
(downstream policy is a categorical containment decision).
|
||||||
|
We deliberately chose 10-second windows so detection
|
||||||
|
latency stays bounded for a real fleet.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="research-questions">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Research gaps + questions</h2>
|
||||||
|
<p>Literature on behaviour-based malware detection is rich but
|
||||||
|
uneven. Most published results either (a) use richer
|
||||||
|
telemetry than what a constrained host actually exports, or
|
||||||
|
(b) frame evaluation in ways that hide the cross-host
|
||||||
|
generalization problem. The card on the left summarises the
|
||||||
|
gap.</p>
|
||||||
|
<p>This project asks three concrete questions:</p>
|
||||||
|
<p><strong>RQ1.</strong> How well can a per-window classifier
|
||||||
|
identify workload phases from <code>/proc</code> alone, with
|
||||||
|
no syscall traces and no kernel hooks?</p>
|
||||||
|
<p><strong>RQ2.</strong> Does the model still work when test
|
||||||
|
episodes come from a host the training set never saw?</p>
|
||||||
|
<p><strong>RQ3.</strong> Of the standard sequence-model
|
||||||
|
families (RNN, GRU, LSTM, CNN, Transformer) plus a
|
||||||
|
non-parametric baseline (KNN) and a tabular baseline
|
||||||
|
(gradient-boosted trees), which trade off accuracy and
|
||||||
|
inference cost best for a deployment that has to run on a
|
||||||
|
constrained host?</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="solution-overview">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Proposed solution</h2>
|
||||||
|
<p>A single end-to-end pipeline turns raw <code>/proc</code>
|
||||||
|
telemetry on a fleet host into a per-window phase verdict
|
||||||
|
in under a second. Each stage of the diagram on the left
|
||||||
|
is a thin, independently-deployable component — the
|
||||||
|
receiver doesn't know what model is running; the model
|
||||||
|
doesn't know where the episode came from.</p>
|
||||||
|
<p>The <strong>model zoo</strong> is the key abstraction:
|
||||||
|
every model class registers itself by name, declares its
|
||||||
|
input kind (summary features or window tensors), and plugs
|
||||||
|
into one shared training loop. KNN, GBT, MLP, CNN, RNN,
|
||||||
|
GRU, LSTM, and Transformer all reuse the same standardization,
|
||||||
|
schema-hashed checkpoint format, class-weighted CE loss (where the model trains against a loss),
|
||||||
|
and held-out-by-host evaluation — so the comparison is
|
||||||
|
genuinely apples-to-apples.</p>
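<p>A minimal sketch of the register-by-name idea, assuming a decorator-based registry. The names <code>MODEL_ZOO</code>, <code>ModelSpec</code>, <code>register</code>, and <code>models_for</code> are hypothetical and are not taken from the project's code; the factories return placeholder dicts rather than real estimators.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The two input kinds named on the slide: summary features or window tensors.
INPUT_KINDS = ("summary", "window")


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical registry entry: a name, an input kind, and a factory."""
    name: str
    input_kind: str
    factory: Callable[[], object]


MODEL_ZOO: Dict[str, ModelSpec] = {}


def register(name: str, input_kind: str):
    """Decorator that adds a model factory to the zoo under `name`."""
    if input_kind not in INPUT_KINDS:
        raise ValueError(f"unknown input kind: {input_kind!r}")

    def decorator(factory):
        if name in MODEL_ZOO:
            raise KeyError(f"model already registered: {name!r}")
        MODEL_ZOO[name] = ModelSpec(name, input_kind, factory)
        return factory

    return decorator


@register("knn", input_kind="summary")
def make_knn():
    # Placeholder config; a real factory would build the estimator.
    return {"family": "knn"}


@register("lstm", input_kind="window")
def make_lstm():
    # Placeholder config; a real factory would build the network.
    return {"family": "lstm"}


def models_for(input_kind: str):
    """Names of all registered models that consume the given input kind."""
    return sorted(s.name for s in MODEL_ZOO.values() if s.input_kind == input_kind)
```

<p>With this shape, a shared training loop can iterate over <code>MODEL_ZOO</code> and pick the data loader from <code>input_kind</code> without knowing anything else about the model.</p>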
|
||||||
|
<p>The detector's per-window verdict feeds two downstream
|
||||||
|
loops: a fleet-wide <strong>trust score</strong> that
|
||||||
|
combines local classification with network-behaviour
|
||||||
|
signals (per IEEE 9881803), and a <strong>fast-recovery</strong>
|
||||||
|
snapshot rollback when an infection time is known.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
<section class="scene" data-stage="stack">
|
<section class="scene" data-stage="stack">
|
||||||
<div class="prose">
|
<div class="prose">
|
||||||
<h2>Live, not staged</h2>
|
<h2>Live, not staged</h2>
|
||||||
|
|
@@ -553,6 +1037,31 @@
|
||||||
</div>
|
</div>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="evaluation-setup">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Evaluation setup</h2>
|
||||||
|
<p>Three choices anchor every result on the next slides — the
|
||||||
|
split recipe, the primary metric, and what we report
|
||||||
|
alongside accuracy. The temptation is to report a single big
|
||||||
|
number; we report a number you can argue with.</p>
|
||||||
|
<p><strong>Held-out by host.</strong> Train and validate on
|
||||||
|
one machine; test on a different machine. A model that
|
||||||
|
wins by memorising the train host's idle profile loses
|
||||||
|
here, which is what you want — a fleet detector has to
|
||||||
|
generalize across hosts it never saw at training time.</p>
|
||||||
|
<p><strong>Macro-F1, not accuracy.</strong> The dataset is
|
||||||
|
heavily skewed: roughly half the labelled time is
|
||||||
|
<code>infected_running</code> and only ~5 % is
|
||||||
|
<code>armed</code>. A "predict the majority class"
|
||||||
|
baseline already hits 0.5 accuracy. Macro-F1 averages F1
|
||||||
|
across all five phases so rare classes count.</p>
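<p>The arithmetic behind that choice can be sketched in a few lines. The 50 % <code>infected_running</code> and 5 % <code>armed</code> shares come from the slide; the even 15 % split across the other three phases is an assumption made only so the toy label set adds up to 100 windows.</p>

```python
PHASES = ["clean", "armed", "infecting", "infected_running", "dormant"]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no TP/FP/FN scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy label distribution (assumed, except the 50 % and 5 % shares).
y_true = (
    ["infected_running"] * 50
    + ["armed"] * 5
    + ["clean"] * 15
    + ["infecting"] * 15
    + ["dormant"] * 15
)

# "Predict the majority class" baseline.
y_pred = ["infected_running"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, PHASES)

print(f"accuracy = {accuracy:.3f}")   # 0.500
print(f"macro-F1 = {score:.3f}")      # 0.133
```

<p>The majority baseline earns F1 = 2/3 on one class and 0 on the other four, so its macro-F1 is 2/15, far below its 0.5 accuracy.</p>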
|
||||||
|
<p><strong>Latency reported with accuracy.</strong> A model
|
||||||
|
that's one F1 point better but ten milliseconds slower
|
||||||
|
may still be the wrong choice for an on-host detector.
|
||||||
|
The perf scene plots both axes so the trade-off is visible.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
<section class="scene" data-stage="models">
|
<section class="scene" data-stage="models">
|
||||||
<div class="prose">
|
<div class="prose">
|
||||||
<h2>Sequence models</h2>
|
<h2>Sequence models</h2>
|
||||||
|
|
@@ -632,6 +1141,141 @@
|
||||||
</div>
|
</div>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="theoretical">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Theoretical contributions</h2>
|
||||||
|
<p>Three methodological claims this project makes — small in
|
||||||
|
isolation, but together they change how the comparison is
|
||||||
|
run. Each shows up explicitly in the codebase.</p>
|
||||||
|
<p><strong>Window-centre labelling.</strong> Instead of
|
||||||
|
majority-voting phase labels across each 10-second window
|
||||||
|
(which creates noisy boundaries), we label each window by
|
||||||
|
the phase that occupies its centre. Cleaner training
|
||||||
|
signal at transitions, no spurious "ambiguous" class.</p>
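<p>A small sketch of the two labelling rules side by side, assuming per-sample phase labels at 10 Hz (so a 10-second window holds 100 samples). The function names are hypothetical, and taking index <code>length // 2</code> as the centre of an even-length window is an assumption, not something the deck specifies.</p>

```python
from collections import Counter


def centre_label(labels, start, length):
    """Label a window by the phase of its centre sample."""
    return labels[start + length // 2]


def majority_label(labels, start, length):
    """Label a window by the most frequent phase inside it."""
    return Counter(labels[start:start + length]).most_common(1)[0][0]


def window_labels(labels, length, stride, labeller):
    """Apply a labelling rule to every full window in the sequence."""
    last_start = len(labels) - length
    return [labeller(labels, s, length) for s in range(0, last_start + 1, stride)]


# One 10-second window that straddles two transitions:
# 4.0 s clean, 1.5 s armed, 4.5 s infecting.
samples = ["clean"] * 40 + ["armed"] * 15 + ["infecting"] * 45

print(centre_label(samples, 0, 100))    # armed
print(majority_label(samples, 0, 100))  # infecting
```

<p>In this toy window the short <code>armed</code> phase sits at the centre, so centre labelling keeps it, while majority voting assigns the window to <code>infecting</code>.</p>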
|
||||||
|
<p><strong>Schema-hashed checkpoints.</strong> Every
|
||||||
|
checkpoint embeds a hash of the feature schema it was
|
||||||
|
trained on. Loading a model against a different schema
|
||||||
|
fails fast. Without this, retroactive comparison silently
|
||||||
|
scores models on misaligned columns and reports nonsense.</p>
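<p>One way such a check could look, as a sketch only: the checkpoint is modelled as a plain dict, the hash is SHA-256 over the ordered column names, and the names <code>schema_hash</code>, <code>make_checkpoint</code>, <code>load_checkpoint</code>, and the example columns are all hypothetical. The project's actual checkpoint format is not shown in the deck.</p>

```python
import hashlib
import json


class SchemaMismatchError(ValueError):
    """Raised when a checkpoint was trained on a different feature schema."""


def schema_hash(columns):
    """Order-sensitive SHA-256 over the feature column names."""
    payload = json.dumps(list(columns), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_checkpoint(weights, columns):
    """Bundle weights with the hash of the schema they were trained on."""
    return {"schema_hash": schema_hash(columns), "weights": weights}


def load_checkpoint(checkpoint, columns):
    """Return the weights, or fail fast if the current schema differs."""
    expected = schema_hash(columns)
    if checkpoint["schema_hash"] != expected:
        raise SchemaMismatchError(
            f"checkpoint schema {checkpoint['schema_hash'][:12]} "
            f"does not match current schema {expected[:12]}"
        )
    return checkpoint["weights"]


# Hypothetical column names, for illustration only.
train_columns = ["cpu_user", "cpu_system", "mem_free"]
ckpt = make_checkpoint(weights=[0.1, 0.2, 0.3], columns=train_columns)

print(load_checkpoint(ckpt, train_columns))  # [0.1, 0.2, 0.3]

try:
    # Same names, different order: the columns would be misaligned.
    load_checkpoint(ckpt, ["cpu_system", "cpu_user", "mem_free"])
except SchemaMismatchError as err:
    print("refused:", err)
```

<p>Hashing the ordered list, rather than a set of names, is deliberate here: a reordered schema is exactly the misalignment the check is meant to catch.</p>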
|
||||||
|
<p><strong>Cross-host as the eval axis.</strong>
|
||||||
|
Held-out-by-host is reported as a first-class number
|
||||||
|
alongside held-out-by-sample — the two often disagree by
|
||||||
|
~0.4 macro-F1, and only the cross-host number predicts
|
||||||
|
real fleet behaviour.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="practical">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Practical contributions</h2>
|
||||||
|
<p>What others can pick up and use from this project — beyond
|
||||||
|
the published numbers.</p>
|
||||||
|
<p><strong>/proc-only deployment.</strong> The detector needs
|
||||||
|
no syscall hooks, no eBPF, no kernel module. It runs on
|
||||||
|
hosts that don't permit deeper instrumentation — a small
|
||||||
|
VM, a container with limited capabilities, an embedded
|
||||||
|
device. One Python service plus a model file.</p>
|
||||||
|
<p><strong>Producer-agnostic dashboard.</strong> The deck
|
||||||
|
consumes typed events
|
||||||
|
(<code>training/dashboard/events.py</code>); the inference
|
||||||
|
loop runs anywhere — Pi, A100, cloud — and just POSTs back.
|
||||||
|
Same UI for a lab demo and an operational console.</p>
|
||||||
|
<p><strong>Labelled dataset on disk.</strong> 78 000+
|
||||||
|
episodes across two hosts and six attack profiles, archived
|
||||||
|
in zstd-compressed tarballs with a schema-versioned format.
|
||||||
|
Anyone reproducing or extending this work can start from
|
||||||
|
the dataset directly without re-running the orchestrator.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="design-principles">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Design principles</h2>
|
||||||
|
<p>Three patterns that emerged during the project and earned
|
||||||
|
their keep enough that we'd repeat them.</p>
|
||||||
|
<p><strong>One loop, many models.</strong> Every NN
|
||||||
|
architecture plugs into the same training loop — class
|
||||||
|
weights, AMP autocast, cosine LR with warmup, gradient
|
||||||
|
clipping, early stop on val macro-F1. Architecture changes
|
||||||
|
don't ripple into orchestration, and adding a new model
|
||||||
|
class costs ~80 lines.</p>
|
||||||
|
<p><strong>Typed events as contract.</strong> Producers and
|
||||||
|
consumers agree on dataclasses, not free-form dicts.
|
||||||
|
Adding a new dashboard scene means adding a new dataclass;
|
||||||
|
adding a new producer means importing it. Static checking
|
||||||
|
and editor autocomplete do most of the work that a
|
||||||
|
schema-validation library would do at runtime.</p>
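<p>A hedged illustration of the dataclass-as-contract pattern. The event class <code>WindowVerdict</code>, its fields, and the <code>encode</code> / <code>decode</code> helpers are hypothetical; the real definitions live in <code>training/dashboard/events.py</code> and may look different.</p>

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WindowVerdict:
    """Hypothetical event: one per-window phase verdict from a host."""
    host: str
    window_start_s: float
    phase: str
    confidence: float


# Consumers look up the dataclass by its type name.
EVENT_TYPES = {cls.__name__: cls for cls in (WindowVerdict,)}


def encode(event) -> str:
    """Serialise an event as JSON with an explicit type tag."""
    return json.dumps({"type": type(event).__name__, "payload": asdict(event)})


def decode(raw: str):
    """Rebuild the typed event; unknown types or fields raise immediately."""
    message = json.loads(raw)
    cls = EVENT_TYPES[message["type"]]
    return cls(**message["payload"])


verdict = WindowVerdict(
    host="elliott-thinkpad", window_start_s=120.0, phase="armed", confidence=0.91
)
round_tripped = decode(encode(verdict))
print(round_tripped == verdict)  # True
```

<p>Because the constructor rejects unexpected keyword arguments, a producer that drifts from the dataclass fails loudly at decode time instead of sending a silently malformed dict to the dashboard.</p>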
|
||||||
|
<p><strong>Two-agent path ownership.</strong> Dashboard work
|
||||||
|
and model work live in two parallel sessions with a
|
||||||
|
documented path-ownership boundary
|
||||||
|
(<code>training/dashboard/</code> vs everywhere else).
|
||||||
|
Merges go through git with explicit rebases instead of a
|
||||||
|
shared workspace — slow up front, fewer subtle stomps
|
||||||
|
over time.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="limitations">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Limitations</h2>
|
||||||
|
<p>What this project cannot honestly claim — and why each
|
||||||
|
line on the left matters for how the results should be read.</p>
|
||||||
|
<p><strong>Two-host fleet.</strong> Cross-host generalization
|
||||||
|
is reported between exactly two machines; it's the right
|
||||||
|
<em>shape</em> of evaluation but not a population claim.
|
||||||
|
More hosts on the WireGuard mesh would let us report
|
||||||
|
distributional bounds rather than single point comparisons.</p>
|
||||||
|
<p><strong>Synthetic attack profiles.</strong> Our six
|
||||||
|
profiles cover the main behavioural envelopes
|
||||||
|
(cpu-saturate, ransomware-lite, bursty-c2, fork-bomb,
|
||||||
|
crypto-miner, distccd-exec) but real-world malware can
|
||||||
|
sit between or outside these envelopes. Generalization to
|
||||||
|
unseen profiles is reported via held-out-by-sample, but
|
||||||
|
in-the-wild distribution shift is unknown.</p>
|
||||||
|
<p><strong>10 Hz sampling floor.</strong> Sub-100ms
|
||||||
|
behaviours fall inside a single sample. Detection of
|
||||||
|
millisecond-scale privilege checks would need faster
|
||||||
|
telemetry than our 10 Hz <code>/proc</code> polling provides.</p>
|
||||||
|
<p><strong>KNN cross-host gap.</strong> KNN scores val
|
||||||
|
macro-F1 ≈ 0.74 on the train host but only ≈ 0.13 on the
|
||||||
|
held-out one. Instance-based memorization of the training
|
||||||
|
host's feature space — informative as a baseline, not a
|
||||||
|
deployment candidate.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section class="scene" data-stage="conclusion-future">
|
||||||
|
<div class="prose">
|
||||||
|
<h2>Conclusion + future work</h2>
|
||||||
|
<p>A per-host classifier trained on <code>/proc</code>-only
|
||||||
|
telemetry can identify workload phases at multi-class
|
||||||
|
macro-F1 well above chance and slot into a wider
|
||||||
|
trust / containment / recovery loop. The recurrent family
|
||||||
|
(LSTM/GRU) and Transformer sit on the upper-left of the
|
||||||
|
accuracy-vs-cost frontier; KNN and GBT are honest baselines.
|
||||||
|
Held-out-by-host evaluation is the right generalization
|
||||||
|
axis — held-out-by-sample overstates real fleet
|
||||||
|
performance by 0.3+ F1.</p>
|
||||||
|
<p><strong>Unsupervised next steps.</strong> Four natural
|
||||||
|
extensions follow:</p>
|
||||||
|
<p>• <strong>Clustering</strong> the unlabeled tail of new
|
||||||
|
fleet data (KMeans / HDBSCAN) to surface novel workload
|
||||||
|
shapes the supervised model has no class for — a
|
||||||
|
self-training feedback loop that enrolls new phases as
|
||||||
|
the fleet grows.</p>
|
||||||
|
<p>• <strong>Anomaly detection</strong> on the last-layer
|
||||||
|
embedding (one-class SVM, isolation forest) so a "none of
|
||||||
|
the five known phases" verdict is available alongside the
|
||||||
|
classifier output.</p>
|
||||||
|
<p>• <strong>Self-supervised pretraining</strong> on the much
|
||||||
|
larger pool of unlabeled telemetry from operational hosts;
|
||||||
|
supervised fine-tune on the smaller orchestrated dataset.</p>
|
||||||
|
<p>• <strong>Embedding visualisation</strong> via UMAP /
|
||||||
|
t-SNE for human-in-the-loop labelling — already prototyped
|
||||||
|
in the KNN scene's interactive 3-D scatter.</p>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
<section class="scene" data-stage="references">
|
<section class="scene" data-stage="references">
|
||||||
<div class="prose">
|
<div class="prose">
|
||||||
<h2>References</h2>
|
<h2>References</h2>
|
||||||
|
|
|
||||||