Near-empty shader: isolates the fixed per-compile floor — core-module deserialization (readSerializedModuleIR/AST, loadBuiltinModule) plus linkIR of the user module against the core module — with negligible user-code work. This is the cleanest detector for "the standard library got heavier" regressions: when the core module grows, every compile pays here regardless of which language features it uses (the class of regression introduced by the auto-diff overhaul in PR #9808, where an empty shader's linkIR rose ~27x).
bucket: core_link · compile mode: target · flags: -target spirv -emit-spirv-directly · default N: 0
Full sub-counter decomposition of compileInner — named leaf timers plus (self) residuals (a parent's time not covered by a named child, e.g. the autodiff transform in linkAndOptimizeIR (self)). Topmost band traces compileInner; hover a band for its phase.
exact compiled source; long files show the first 40 lines, the area around computeMain (±40), and the last 40 lines (gaps elided)
// AUTO-GENERATED by perf-suite/workloads.py — do not edit by hand.
RWStructuredBuffer<float> outBuf;
[shader("compute")]
[numthreads(1,1,1)]
void computeMain(uint3 tid : SV_DispatchThreadID) { outBuf[tid.x] = float(tid.x); }