Tier-2 workload-silent false positive: pgrep -c unsupported on BusyBox, disown missing — workload IS running #15
Symptom
The Pi-side classifier labels 244 episodes from `elliott-thinkpad` and `k-gamingcom` as `workload-silent`. In-guest loadavg peaks at ~0.77 for `cpu-saturate`, and `top_procs` in `telemetry-guest.jsonl` never shows a `yes` process.

Diagnostic environment
`launch_demo.sh`, probed with `SerialClient` from dev clone

Step 2: baseline guest state (clean boot, no workload)
Note: `disown` does NOT appear; it is not a BusyBox applet or shell builtin on this Alpine guest.

Step 3: exact `start_cmd` and `pgrep` chain test
TEST_BEGIN
start_cmd repr (exact output of _cpu_saturate().start_cmd):
run(start_cmd) output:
Probe 3 seconds after start_cmd:
ps at same moment:
pid file contents:
2300
script file contents:
pgrep -c yes (as used in _probe()):
pgrep -l yes (working alternative):
2301 yes
2302 yes
2339 yes
exit=0
pgrep yes (no flags):
2301
2302
2339
exit=0
Manual /proc count:
/proc/2301/status
/proc/2302/status
/proc/2339/status
TEST_END
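The manual `/proc` count in the test output is a useful cross-check because it avoids `pgrep` flag differences entirely. A minimal Python sketch of the same idea, assuming a Python interpreter can read the guest's `/proc` (for example, an in-guest agent like the one producing `top_procs`); `count_procs_named` and its `proc_root` parameter are hypothetical and not part of the project:

```python
from pathlib import Path


def count_procs_named(name: str, proc_root: str = "/proc") -> int:
    """Count processes whose Name: field in /proc/<pid>/status equals `name`.

    Hypothetical helper for illustration; not part of VMLoadController.
    """
    count = 0
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            with open(entry / "status", encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if line.startswith("Name:"):
                        if line.split(":", 1)[1].strip() == name:
                            count += 1
                        break  # only the Name: line matters
        except OSError:
            continue  # process exited between listing and reading, or is unreadable
    return count
```

It matches the `Name:` field exactly, whereas `pgrep` treats its pattern as a regular expression, so `pgrep yes` would also count a process whose name merely contains `yes`. The `proc_root` parameter exists only so the function can be exercised against a fake directory tree.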
Observation
(d) Something else: the heredoc creates the file correctly (86 bytes, correct content); `nohup sh` keeps the script alive (confirmed by `ps`); and `yes` IS saturating the vCPU (loadavg=1.05, three `yes` PIDs visible). The workload is NOT silent.

The investigation found two bugs in `VMLoadController`; Bug 1 is the root cause of the false-positive `workload-silent` label.

Bug 1 (root cause):
`pgrep -c` is not a valid BusyBox flag

BusyBox v1.37.0 `pgrep` supports only `[-flanovx]`. The `-c` (count) flag is a procps(-ng) extension. When `pgrep -c yes` is called:

- BusyBox rejects the unknown flag, prints its usage text to stderr, and exits non-zero
- `2>/dev/null` suppresses the usage output
- `|| echo 0` fires, so the substitution always produces `yes=0`

So `echo yes=$(pgrep -c yes 2>/dev/null || echo 0)` always prints `yes=0`, even when `yes` is saturating the vCPU. This false zero is what the Pi classifier reads from `workload_killed` events.

Fix: replace `pgrep -c yes` with `pgrep yes | wc -l` (both supported by BusyBox).

Bug 2 (secondary):
`disown` is not a BusyBox builtin

`-sh: disown: not found` is printed on every `infected_running` entry. The background process survives (protected by `nohup`), so this is currently harmless, but the error leaks into `run()`'s captured output and could confuse future callers.

Fix: remove `disown` from `_wrap_loop`; `nohup` already provides SIGHUP immunity.

Impact
All 244 episodes from `elliott-thinkpad` and `k-gamingcom` are correctly labeled by phase but incorrectly tagged `workload-silent`. The CPU envelope IS present in the host-side `/proc` telemetry (qemu-system CPU%). The in-guest `top_procs` gap may be a separate agent-side `pgrep` issue using the same `-c` flag. The episodes are not wasted (host-side telemetry is valid), but the `workload_silent` filter would incorrectly exclude them from the ML pipeline.

Excellent diagnostic. Both bugs fixed in `2707709` on main:

For the 244 episodes already on disk: rather than re-collect, the prune classifier now cross-checks the in-guest probe against the host-side `/proc` CPU envelope and flags `workload-silent` only when both agree. Added `test_workload_silent_suppressed_when_host_cpu_real` as a regression test. AGENTS.md grew a "Don't trust the in-guest probe alone" section with the BusyBox-vs-procps gotcha and a list of patterns to avoid.
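A minimal sketch of that cross-check, assuming each episode carries the parsed in-guest probe count (`yes=N`) and a series of host-side qemu-system CPU% samples. The names `Episode`, `host_cpu_envelope_present`, `is_workload_silent`, and the 50% threshold are hypothetical, chosen for illustration; the actual prune classifier in `2707709` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold: host-side qemu-system CPU% at or above which the
# guest is considered to be running a real CPU workload. Not taken from 2707709.
HOST_CPU_BUSY_PCT = 50.0


@dataclass
class Episode:
    guest_yes_count: int  # parsed from the in-guest probe's "yes=N"
    host_qemu_cpu_pct: List[float] = field(default_factory=list)  # host /proc samples


def host_cpu_envelope_present(samples: List[float],
                              threshold: float = HOST_CPU_BUSY_PCT) -> bool:
    """True if the host saw qemu-system at or above `threshold` CPU% at any point."""
    return any(s >= threshold for s in samples)


def is_workload_silent(ep: Episode) -> bool:
    """Flag workload-silent only when BOTH sources agree nothing ran:
    the in-guest probe reports zero AND the host saw no CPU envelope."""
    guest_silent = ep.guest_yes_count == 0
    host_silent = not host_cpu_envelope_present(ep.host_qemu_cpu_pct)
    return guest_silent and host_silent
```

Requiring both sources to agree means a broken in-guest probe can no longer mark a busy episode as silent on its own. With no host samples at all, this sketch falls back to the guest probe alone; a real classifier might instead mark such episodes as unknown.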
Net result: existing data is rescued and the probe is correct going forward.
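To illustrate why the old probe could only ever report zero on BusyBox, and what the Bug 1 fix changes, here is a small Python model. It does not run BusyBox: `shell_or_fallback` models how `$(cmd || echo 0)` combines `cmd`'s stdout and exit status, and `wc_l` counts newlines the way `wc -l` does. `OLD_PROBE`, `NEW_PROBE`, and `parse_yes_count` are hypothetical names; the real `_probe()` command string in `2707709` may differ.

```python
# Probe strings: the old one is quoted from this issue; the new one applies the fix.
OLD_PROBE = "echo yes=$(pgrep -c yes 2>/dev/null || echo 0)"
NEW_PROBE = "echo yes=$(pgrep yes | wc -l)"


def shell_or_fallback(stdout: str, exit_status: int) -> str:
    """Model `$(cmd || echo 0)`: cmd's stdout, plus "0\\n" if cmd failed,
    with trailing newlines stripped as command substitution does."""
    out = stdout if exit_status == 0 else stdout + "0\n"
    return out.rstrip("\n")


def wc_l(text: str) -> int:
    """`wc -l` counts newline characters."""
    return text.count("\n")


def parse_yes_count(probe_output: str) -> int:
    """Extract N from a `yes=N` line; tolerate padding some wc implementations emit."""
    for line in probe_output.splitlines():
        if line.startswith("yes="):
            return int(line[len("yes="):].strip())
    raise ValueError("no yes=N line in probe output")


# BusyBox rejecting -c: usage goes to stderr (discarded by 2>/dev/null),
# stdout is empty, exit status is non-zero (1 assumed here).
busybox_reject = shell_or_fallback(stdout="", exit_status=1)
print("old probe ->", f"yes={busybox_reject}")  # old probe -> yes=0

# `pgrep yes` output from Step 3, piped through the wc -l model.
pgrep_stdout = "2301\n2302\n2339\n"
print("new probe ->", f"yes={wc_l(pgrep_stdout)}")  # new probe -> yes=3
```

With no matching process, `pgrep yes` prints nothing and `wc -l` reports 0, so a zero from the new probe reflects an actual absence of `yes` processes rather than the `|| echo 0` fallback firing.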