# Strongest published precedent for this exact setup

This paper applies **transformer architectures to per-process resource-utilisation metrics**, the same shape of telemetry we collect from `/proc`. Closest reference to "the project we're doing, but already published."

## What we borrowed

- **Channel selection.** Their list of `/proc` channels overlaps heavily with ours (`cpu_user_jiffies`, `cpu_sys_jiffies`, `rss_bytes`, `io_*_bytes`, `voluntary_ctxsw`, `involuntary_ctxsw`, page-fault counters). Our 12-channel selection is essentially this set, validated.
- **Window-and-classify framing.** They confirm that a transformer reading short windows of these counters beats per-window hand-features fed to gradient-boosted trees. That is exactly the comparison we run: KNN-on-features vs sequence-models-on-windows.
- **Held-out-sample evaluation.** They emphasise generalising to *unseen* malware families, not unseen time-slices of the same family. We adopt the same eval protocol on the perf scene.

## Where it differs

- They use a much larger corpus and run on commercial endpoints; we run on three lab hosts and a Pi. Their numbers are an upper bound on what we can hope to reproduce: the target, not the floor.
- They don't publish their exact dataset, so the comparison is architectural, not reproductive.
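
For reference, the window-and-classify framing and the held-out-family protocol from "What we borrowed" can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's pipeline or our production code: the function names (`counter_deltas`, `sliding_windows`, `split_by_family`), the window size, and the stride are all hypothetical. It assumes every channel passed to `counter_deltas` is a cumulative counter (jiffies, context switches, page faults); gauges such as `rss_bytes` would be kept as raw values instead.

```python
from typing import List, Sequence, Set, Tuple


def counter_deltas(samples: Sequence[Sequence[int]]) -> List[List[int]]:
    """Convert cumulative counter readings into per-step deltas.

    Each row is one sampling instant, each column one channel.
    Assumption: every column is monotonically increasing (a counter, not a gauge).
    """
    return [
        [cur - prev for prev, cur in zip(a, b)]
        for a, b in zip(samples, samples[1:])
    ]


def sliding_windows(
    rows: Sequence[Sequence[int]], size: int, stride: int
) -> List[List[List[int]]]:
    """Cut one process's series into fixed-length windows, dropping the ragged tail."""
    return [
        [list(r) for r in rows[start:start + size]]
        for start in range(0, len(rows) - size + 1, stride)
    ]


def split_by_family(
    families: Sequence[str], held_out: Set[str]
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) so that held-out families appear only in test.

    Splitting by family, rather than by time-slice, is what makes the test set
    measure generalisation to unseen families.
    """
    train = [i for i, fam in enumerate(families) if fam not in held_out]
    test = [i for i, fam in enumerate(families) if fam in held_out]
    return train, test


if __name__ == "__main__":
    # Two hypothetical channels sampled at four instants.
    raw = [[0, 0], [2, 1], [5, 1], [9, 4]]
    deltas = counter_deltas(raw)
    windows = sliding_windows(deltas, size=2, stride=1)
    train_idx, test_idx = split_by_family(["a", "b", "a", "c"], held_out={"b"})
    print(len(windows), train_idx, test_idx)
```

The same windows feed both sides of the comparison: hand-features computed per window for the KNN baseline, and the raw window tensor for the sequence models. Keeping the family split upstream of windowing avoids leaking windows from one sample into both train and test.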