
loop_unroll

A force-unrolled loop of `N` iterations. It stresses loop unrolling (unrollLoopsInModule) and the downstream SSA simplification of the unrolled body.
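Conceptually, force-unrolling replaces the loop with `N` straight-line copies of its body, each with the induction variable replaced by a constant. The code handed to SSA simplification therefore grows linearly with `N`. The Python sketch below shows this only at the source-text level. The helper `unroll_body` and its `{i}` placeholder are hypothetical, and the sketch does not reflect how unrollLoopsInModule actually rewrites Slang IR.

```python
def unroll_body(body_template: str, n: int) -> list[str]:
    """Return n straight-line copies of a loop body.

    body_template uses a hypothetical {i} placeholder for the induction
    variable; each copy substitutes the constant iteration index, which is
    what enables per-iteration constant folding after unrolling.
    """
    return [body_template.format(i=i) for i in range(n)]


body = "acc = acc * 1.0009 + sin(acc + float({i})) * 0.5 - cos(acc * 0.5);"
unrolled = unroll_body(body, 300)
print(len(unrolled))  # 300 statements replace the single loop
print(unrolled[0])    # ... float(0) ...
print(unrolled[-1])   # ... float(299) ...
```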

bucket: loop_unroll  ·  compile mode: target  ·  flags: -target spirv -emit-spirv-directly  ·  default N: 300

Phase composition across releases

Full sub-counter decomposition of compileInner: named leaf timers plus "(self)" residuals. A residual is the part of a parent's time not covered by any named child. For example, the autodiff transform is counted in linkAndOptimizeIR (self). The topmost band traces compileInner.
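A "(self)" residual is a node's inclusive time minus the summed inclusive times of its named children, so the leaf bands plus all residuals add back up to compileInner. Below is a minimal Python sketch of that bookkeeping. It assumes a hypothetical `Timer` tree, and the millisecond values are illustrative, not measurements from this workload.

```python
from dataclasses import dataclass, field


@dataclass
class Timer:
    name: str
    total_ms: float  # inclusive time of this phase, children included
    children: list["Timer"] = field(default_factory=list)


def decompose(node: Timer) -> dict[str, float]:
    """Flatten a timer tree into leaf timers plus '(self)' residuals."""
    if not node.children:
        return {node.name: node.total_ms}
    bands: dict[str, float] = {}
    for child in node.children:
        bands.update(decompose(child))
    covered = sum(child.total_ms for child in node.children)
    # Time spent in the parent that no named child accounts for.
    bands[f"{node.name} (self)"] = node.total_ms - covered
    return bands


tree = Timer("compileInner", 100.0, [
    Timer("linkAndOptimizeIR", 70.0, [
        Timer("unrollLoopsInModule", 30.0),
        Timer("simplifyIR", 25.0),
    ]),
    Timer("generateOutput", 20.0),
])
bands = decompose(tree)
# The bands sum back to compileInner's inclusive time.
assert abs(sum(bands.values()) - tree.total_ms) < 1e-9
```

Here linkAndOptimizeIR (self) comes out as 70 - (30 + 25) = 15 ms and compileInner (self) as 100 - (70 + 20) = 10 ms.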

[Figure: loop_unroll, full phase breakdown across releases (median ms); chart label 0.55×. X axis: release builds 25.14 through 26.11, then daily builds 06-25 and 06-26. Y axis: median ms. Phase buckets: parseTranslationUnit, SemanticChecking, generateIR, frontEndExecute (self), specializeModule, simplifyIR, linkIR, unrollLoopsInModule, legalizeResourceTypes, legalizeExistentialTypeLayout, performMandatoryEarlyInlining, performForceInlining, linkAndOptimizeIR (self), emitEntryPointsSourceFromIR, generateOutput (self), compileInner (self).]

Compiled Slang source

Exact compiled source (N = 300). For long files, only the first 40 lines, the lines within ±40 of computeMain, and the last 40 lines are shown, with the gaps elided.
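The excerpting rule can be sketched as a merge of three line ranges. This Python version is illustrative only. It assumes 1-based line numbers and takes "around computeMain" to mean a ±40-line window centred on the line that declares it. The name `visible_lines` is hypothetical and is not the suite's code.

```python
def visible_lines(total: int, anchor: int, edge: int = 40,
                  radius: int = 40) -> list[tuple[int, int]]:
    """Return merged 1-based inclusive line ranges to display.

    Keeps the first `edge` lines, the window anchor +/- radius, and the
    last `edge` lines; everything outside these ranges is elided.
    """
    ranges = sorted([
        (1, min(edge, total)),
        (max(1, anchor - radius), min(total, anchor + radius)),
        (max(1, total - edge + 1), total),
    ])
    merged: list[tuple[int, int]] = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(visible_lines(12, 6))      # [(1, 12)]: a short file is shown whole
print(visible_lines(1000, 500))  # [(1, 40), (460, 540), (961, 1000)]
```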

loop_unroll.slang

// AUTO-GENERATED by perf-suite/workloads.py — do not edit by hand.
RWStructuredBuffer<float> outBuf;

[shader("compute")]
[numthreads(1,1,1)]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    float acc = outBuf[tid.x];
    [ForceUnroll] for (int i = 0; i < 300; ++i)
        acc = acc * 1.0009 + sin(acc + float(i)) * 0.5 - cos(acc * 0.5);
    outBuf[0] = acc;
}
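According to its header, this file is generated by perf-suite/workloads.py for a given N. That script is not shown here. The following hypothetical Python sketch shows how such a generator could template the loop bound. The function name `make_loop_unroll_source` is an assumption, and the sketch omits the header comment.

```python
def make_loop_unroll_source(n: int = 300) -> str:
    """Return Slang source for the loop_unroll workload with an n-iteration loop.

    Hypothetical sketch: only the [ForceUnroll] loop bound depends on n,
    and the generator's header comment is omitted.
    """
    if n < 1:
        raise ValueError("n must be a positive iteration count")
    lines = [
        "RWStructuredBuffer<float> outBuf;",
        "",
        '[shader("compute")]',
        "[numthreads(1,1,1)]",
        "void computeMain(uint3 tid : SV_DispatchThreadID)",
        "{",
        "    float acc = outBuf[tid.x];",
        f"    [ForceUnroll] for (int i = 0; i < {n}; ++i)",
        "        acc = acc * 1.0009 + sin(acc + float(i)) * 0.5 - cos(acc * 0.5);",
        "    outBuf[0] = acc;",
        "}",
    ]
    return "\n".join(lines) + "\n"


src = make_loop_unroll_source(300)
```

Varying n changes the size of the unrolled body linearly. This is what lets the workload probe how unrollLoopsInModule and the simplification that follows it scale.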