CIS490/docs/threat-model.md

# Threat Model & Train/Serve Parity

The single most important design rule in this project:

> **A feature used by the deployed model must exist on the deployed device.**

Violating this rule produces a model that scores 99% in the lab and is useless in
the field. This document spells out which features fall on which side of that
line, and why we still bother capturing both.

## The setting

The deployed model runs on a real, non-virtualized device — typically a
constrained Linux endpoint (server, IoT box, edge node — the specific form
factor doesn't matter, only that it has its own kernel and isn't running
under our hypervisor). It tries to detect the moment that device gets
breached.

> **Not the Pi5.** In our project topology, the Pi5 is the WireGuard-side
> *collector* that receives episode tarballs from lab hosts. It is *not* the
> deployment target for the model. Don't conflate the two roles.

Two adversarial facts shape the design:

1. **Malware can lie to in-device tools.** A sufficiently privileged rootkit can
   hook `/proc`, intercept `perf_event_open`, and hide its own processes.
2. **There is no host-side QEMU view.** The deployed device is the actual
   machine, so nothing outside its own OS is watching it.

So the model has two trustworthy floors:

- **In-device features that survive most malware** (perf counters via the syscall
  interface, thermals, gross resource counters): easy to falsify in principle,
  but in practice most commodity malware doesn't bother.
- **Off-device features at the gateway** (network telemetry observed by an
  upstream router/gateway): physics-bound, because malware on the device cannot
  hide traffic once it has left the NIC.
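
As an illustration of the in-device floor, here is a minimal sketch of reading the thermal zones that Linux exposes under `/sys/class/thermal`. The function name and the `sysfs_root` parameter are assumptions made for this example (the parameter exists only so the code can be pointed at a test directory); this is not the project's collector.

```python
from pathlib import Path


def read_thermal_zones(sysfs_root="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone.

    Hypothetical helper: the same code path would run inside the lab guest
    and on a deployed device, because both expose the same sysfs layout.
    """
    readings = {}
    for zone in sorted(Path(sysfs_root).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # sysfs reports temperature in millidegrees Celsius.
            readings[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Zone missing, unreadable, or malformed: skip it.
            continue
    return readings
```

On a machine with no thermal zones (many VMs) the function simply returns an empty dict, which is itself worth recording when comparing lab and field data.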

## Two roles: features vs. oracles

Every measurement we capture in the lab gets one of two roles:

| Role | What it's used for | Available in deployment? |
|---|---|---|
| **Feature** | Input to the trained model | **Must be yes** |
| **Oracle** | Ground-truth labeling during training only | No — but we have it in the lab |

The oracle channels (host `/proc/<qemu_pid>`, QMP `query-stats`,
`perf -p qemu_pid`) are how we know with certainty what the guest is *actually*
doing, as opposed to what it claims to be doing. We use that certainty to assign
correct labels in `labels.jsonl`. The realistic (deployable) model never sees
them as inputs; only the lab-only oracle model described below trains on them.

## Channel taxonomy

| # | Channel | Vantage | Role | Why |
|---|---|---|---|---|
| 1 | Host `/proc/<qemu_pid>` | outside guest | oracle | doesn't exist on real device |
| 2 | QEMU QMP `query-stats`, `query-blockstats` | outside guest | oracle | same |
| 3 | `perf stat -p <qemu_pid>` | outside guest | oracle | same |
| 4 | Bridge-side pcap (`tcpdump -i br-malware`) | gateway | **feature** | matches what the upstream gateway sees in the field |
| 5 | In-guest `/proc/*`, `perf_event_open`, `/sys/class/thermal/*` | inside guest | **feature** | same exact source on real device |

Note: in-guest features (5) are the same syscall surfaces we'd read on a real
device. The data we capture from them in the lab and the data we capture from
them on the deployed device come from identical kernel APIs, which is what
makes parity hold.
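
One way to keep the parity rule from being broken by accident is to encode the taxonomy as data and check every model input against it. The sketch below is an assumption-laden illustration: the short channel names (`host_proc`, `bridge_pcap`, and so on), the `Channel` type, and the `assert_parity` helper are all hypothetical and do not come from the project code.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FEATURE = "feature"
    ORACLE = "oracle"


@dataclass(frozen=True)
class Channel:
    number: int   # row number in the taxonomy table
    name: str     # hypothetical short identifier
    vantage: str
    role: Role


CHANNELS = (
    Channel(1, "host_proc", "outside guest", Role.ORACLE),
    Channel(2, "qmp_stats", "outside guest", Role.ORACLE),
    Channel(3, "host_perf", "outside guest", Role.ORACLE),
    Channel(4, "bridge_pcap", "gateway", Role.FEATURE),
    Channel(5, "guest_metrics", "inside guest", Role.FEATURE),
)


def deployable_channels(channels=CHANNELS):
    """Channels whose data also exists on the deployed device or its gateway."""
    return [c for c in channels if c.role is Role.FEATURE]


def assert_parity(column_channels, channels=CHANNELS):
    """Raise ValueError if any model input column comes from an oracle channel.

    column_channels maps a column name to the channel number it was derived from.
    """
    roles = {c.number: c.role for c in channels}
    bad = sorted(col for col, num in column_channels.items()
                 if roles[num] is Role.ORACLE)
    if bad:
        raise ValueError(f"oracle-only columns in deployable model: {bad}")
```

Calling `assert_parity` on the realistic model's column map before training would turn a silent train/serve mismatch into an immediate failure.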

## The two-model evaluation strategy

We will train two classifiers from the same dataset:

1. **Realistic model** — trained only on features (channels 4 + 5).
   *This is what would be deployed.*
2. **Oracle model** — trained on everything (channels 1–5).
   *This is the upper bound on what was learnable from this dataset.*

The interesting numbers are:

- The realistic model's accuracy → **how well the deployable system actually works**.
- The gap between realistic and oracle → **how much detection power a privileged
  rootkit can take away from us by lying** (the cost of in-device measurement
  being adversarial).
- Realistic-without-pcap vs. realistic-with-pcap → **how much the gateway
  observer is worth** (the value of off-device telemetry).

Reporting all three is the paper-worthy result — not just one accuracy number.
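
The three numbers above reduce to simple differences between model variants. A minimal sketch follows, assuming each variant has already been trained and scored on the same held-out split; the variant names and the `evaluation_summary` helper are hypothetical, and accuracy could be swapped for any other metric.

```python
# Channel numbers follow the taxonomy table; the variant names are hypothetical.
MODEL_CHANNELS = {
    "realistic": {4, 5},
    "realistic_no_pcap": {5},
    "oracle": {1, 2, 3, 4, 5},
}


def evaluation_summary(accuracy):
    """Compute the three reported numbers.

    accuracy maps each variant name in MODEL_CHANNELS to its held-out
    accuracy as a fraction in [0, 1].
    """
    missing = set(MODEL_CHANNELS) - set(accuracy)
    if missing:
        raise KeyError(f"missing accuracy for: {sorted(missing)}")
    return {
        # How well the deployable system works.
        "deployable_accuracy": accuracy["realistic"],
        # What is lost by not having the outside-guest view.
        "oracle_gap": accuracy["oracle"] - accuracy["realistic"],
        # What the gateway observer adds on top of in-device features.
        "gateway_value": accuracy["realistic"] - accuracy["realistic_no_pcap"],
    }
```

Keeping the channel sets in one place also makes it harder for the three variants to drift apart in which inputs they are allowed to see.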

## Using oracle disagreement as a label

When the in-guest agent reports CPU at 8% and the host says the QEMU process is
burning 70%, that *is* the rootkit-active signal — and it's more reliable than
any single in-guest measurement. Episodes where this divergence appears get a
high-confidence `infected_running` label even if the in-guest data alone looks
ambiguous. This is the practical payoff of capturing both sides.
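
The divergence rule can be sketched as a small labeling function. Everything except the `infected_running` label name is an assumption here: the 30-point threshold, the record fields, and the function name are illustrative only, and the two CPU figures are assumed to be normalized to the same scale (for example, percent of total guest capacity) before they are compared.

```python
import json


def label_from_disagreement(guest_cpu_pct, host_cpu_pct, min_gap_pct=30.0):
    """Return a high-confidence label record when the host-side oracle sees far
    more CPU use than the in-guest agent reports, otherwise None.

    min_gap_pct is a hypothetical threshold, not a tuned value.
    """
    gap = host_cpu_pct - guest_cpu_pct
    if gap < min_gap_pct:
        return None
    return {
        "label": "infected_running",
        "confidence": "high",
        "reason": "oracle_disagreement",  # hypothetical field
        "cpu_gap_pct": gap,               # hypothetical field
    }


# The example from the text: guest reports 8%, host sees 70%.
record = label_from_disagreement(8.0, 70.0)
if record is not None:
    line = json.dumps(record)  # one candidate line for labels.jsonl
```

Because the host-side reading is an oracle channel, this rule can only run in the lab; it produces labels, never model inputs.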

## What we are not claiming

- We are not claiming to detect kernel rootkits robustly from in-guest data alone.
  The oracle/feature gap will quantify the limit.
- We are not claiming the trained model is safe to deploy without the gateway
  observer in production — for the strongest threat model, gateway-side fusion
  is required.