# Strongest published precedent for this exact setup

This paper applies **transformer architectures to per-process
resource-utilisation metrics**, the same shape of telemetry we
collect from `/proc`. It is the closest reference we have to "the
project we're doing, but already published."

## What we borrowed

- **Channel selection.** Their list of `/proc` channels overlaps
  heavily with ours (`cpu_user_jiffies`, `cpu_sys_jiffies`,
  `rss_bytes`, `io_*_bytes`, `voluntary_ctxsw`, `involuntary_ctxsw`,
  page-fault counters). Our 12-channel selection is essentially
  this set, so their results serve as external validation of it.
- **Window-and-classify framing.** They report that a transformer
  reading short windows of these counters beats per-window
  hand-features fed to gradient-boosted trees. We run the same shape
  of comparison, with KNN as the feature-based baseline in place of
  gradient-boosted trees: KNN-on-features vs sequence-models-on-windows.
- **Held-out-sample evaluation.** They emphasise generalising to
  *unseen* malware families, not unseen time-slices of the same
  family. We adopt the same eval protocol on the perf scene.
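
The windowing and held-out-family ideas above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's pipeline: the function names (`to_windows`, `split_by_family`), the window length, the three-channel subset, and the choice to difference cumulative counters while keeping `rss_bytes` as a raw gauge are all hypothetical.

```python
import random

# Assumption: a small subset of the channels named above, for illustration only.
CHANNELS = ["cpu_user_jiffies", "cpu_sys_jiffies", "rss_bytes"]
# Assumption: rss_bytes is an instantaneous gauge; the others are cumulative counters.
GAUGES = {"rss_bytes"}


def to_windows(samples, window=4, stride=4):
    """Turn per-tick samples (dicts of channel -> value) into fixed-size windows.

    Cumulative counters are converted to per-tick deltas; gauges are kept as read.
    Trailing rows that do not fill a whole window are dropped.
    """
    rows = []
    for prev, cur in zip(samples, samples[1:]):
        rows.append(
            [cur[c] if c in GAUGES else cur[c] - prev[c] for c in CHANNELS]
        )
    return [
        rows[i:i + window]
        for i in range(0, len(rows) - window + 1, stride)
    ]


def split_by_family(records, test_fraction=0.25, seed=0):
    """Split (family, payload) records so whole families land in train OR test.

    This is what makes the test set "unseen families" rather than unseen
    time-slices of families the model has already trained on.
    """
    families = sorted({family for family, _ in records})
    rng = random.Random(seed)
    rng.shuffle(families)
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])
    train = [r for r in records if r[0] not in test_families]
    test = [r for r in records if r[0] in test_families]
    return train, test, test_families
```

Splitting on the family label before any windowing keeps windows from one family out of both sides of the split, which a random split over windows would not guarantee.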

## Where it differs

- They use a much larger corpus and run on commercial endpoints;
  we run on three lab hosts and a Pi. Treat their numbers as an upper
  bound on what we can hope to reproduce: a target to aim at, not a
  floor we expect to clear.
- They don't publish their exact dataset, so we can compare
  architectures but cannot reproduce their results directly.