
minimal

Near-empty shader that isolates the fixed per-compile floor: core-module deserialization (readSerializedModuleIR/AST, loadBuiltinModule) plus linkIR of the user module against the core module, with negligible user-code work. This is the cleanest detector for "the standard library got heavier" regressions: when the core module grows, every compile pays the cost here, regardless of which language features it uses. The auto-diff overhaul in PR #9808 introduced this class of regression, raising an empty shader's linkIR time by ~27x.
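As a rough illustration of how a floor regression of this kind could be flagged, the sketch below compares per-phase medians between two builds and reports the phases whose ratio exceeds a threshold. The function names, the 1.5x threshold, and the sample timings are all hypothetical; the numbers are made up so that linkIR shows a 27x ratio and are not measurements from the suite.

```python
def phase_ratio(medians_ms, baseline, candidate, phase):
    """Return candidate/baseline median time for one phase."""
    base = medians_ms[baseline][phase]
    cand = medians_ms[candidate][phase]
    if base <= 0:
        raise ValueError(f"baseline time for {phase!r} must be positive")
    return cand / base


def flag_regressions(medians_ms, baseline, candidate, threshold=1.5):
    """Map each phase whose ratio exceeds the threshold to that ratio."""
    flagged = {}
    for phase in medians_ms[baseline]:
        if phase not in medians_ms[candidate]:
            continue  # phase timer absent in the candidate build
        ratio = phase_ratio(medians_ms, baseline, candidate, phase)
        if ratio > threshold:
            flagged[phase] = ratio
    return flagged


# Hypothetical medians (ms), invented for illustration only.
sample_medians = {
    "before": {"parseTranslationUnit": 0.25, "linkIR": 0.5},
    "after": {"parseTranslationUnit": 0.25, "linkIR": 13.5},
}

print(flag_regressions(sample_medians, "before", "after"))
```

Because the minimal workload does almost no user-code work, a flagged phase here points at the core module rather than at any particular language feature.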

bucket: core_link  ·  compile mode: target  ·  flags: -target spirv -emit-spirv-directly  ·  default N: 0
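For reference, a command line matching the flags above might be assembled as in the sketch below. It assumes the compiler is invoked as `slangc` on the PATH and that output goes to a file via `-o`; the helper names and the wall-clock timing loop are hypothetical and are not the suite's harness, which reports the compiler's internal phase timers rather than process wall time.

```python
import shlex
import statistics
import subprocess
import time


def build_command(source, output, slangc="slangc"):
    """Assemble the compile command using the flags listed for this workload."""
    return [slangc, source, "-target", "spirv", "-emit-spirv-directly", "-o", output]


def median_wall_ms(cmd, runs=5):
    """Run the command several times and return the median wall time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


cmd = build_command("minimal.slang", "minimal.spv")
print(shlex.join(cmd))
```

Calling `median_wall_ms(cmd)` requires a local Slang installation; the wall time it returns includes process startup, so it is only an upper bound on the compileInner figures shown on this page.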

Phase composition across releases

Full sub-counter decomposition of compileInner: named leaf timers plus (self) residuals. A (self) residual is the part of a parent's time not covered by any named child; for example, the autodiff transform is counted in linkAndOptimizeIR (self). The topmost band traces compileInner.
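The (self) residual described above can be computed as a parent's total minus the sum of its named children. The sketch below is a minimal version of that arithmetic; the function name, the parent/child nesting shown, and the timings are assumptions chosen for illustration, not values or structure taken from the suite.

```python
def self_residuals(totals_ms, children):
    """Compute '<parent> (self)' as parent time minus its named children.

    totals_ms maps timer name to inclusive time in ms.
    children maps each parent timer to the list of its named child timers.
    """
    residuals = {}
    for parent, kids in children.items():
        covered = sum(totals_ms[kid] for kid in kids)
        # Clamp at zero so timer jitter cannot produce a negative band.
        residuals[f"{parent} (self)"] = max(totals_ms[parent] - covered, 0.0)
    return residuals


# Hypothetical inclusive timings (ms) and nesting, for illustration only.
example_totals = {
    "linkAndOptimizeIR": 10.0,
    "linkIR": 6.0,
    "simplifyIR": 1.5,
    "specializeModule": 0.5,
}
example_children = {
    "linkAndOptimizeIR": ["linkIR", "simplifyIR", "specializeModule"],
}

print(self_residuals(example_totals, example_children))
```

With this decomposition, the named leaves and the (self) residuals together sum to the parent's total, which is what lets the bands stack up to the compileInner trace.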

[Figure: "minimal: full phase breakdown across releases (median ms)", labeled "minimal 3.03×". Stacked bands, one per phase; y-axis in ms (0 to 27); x-axis covers releases 25.14 through 26.11, followed by daily builds 06-25 and 06-26. Phase buckets: parseTranslationUnit, SemanticChecking, generateIR, frontEndExecute (self), specializeModule, simplifyIR, linkIR, unrollLoopsInModule, legalizeResourceTypes, legalizeExistentialTypeLayout, performMandatoryEarlyInlining, performForceInlining, linkAndOptimizeIR (self), emitEntryPointsSourceFromIR, generateOutput (self), compileInner (self).]

Compiled Slang source

exact compiled source; long files show the first 40 lines, the area around computeMain (±40), and the last 40 lines (gaps elided)

minimal.slang

// AUTO-GENERATED by perf-suite/workloads.py — do not edit by hand.
RWStructuredBuffer<float> outBuf;

[shader("compute")]
[numthreads(1,1,1)]
void computeMain(uint3 tid : SV_DispatchThreadID) { outBuf[tid.x] = float(tid.x); }