Slang Standard Library Reference

struct linalg::CoopMat<T, MemoryScope S, int M, int N, linalg.CoopMatMatrixUse R>

Conforms to: IArray<T>

Conditionally conforms to: IArithmetic

Description

Represents a cooperative matrix for efficient warp/subgroup-level matrix operations on GPU hardware. CoopMat enables high-performance matrix multiply-accumulate operations by distributing matrix fragments across threads within a warp or subgroup. This type leverages specialized hardware instructions such as CUDA’s WMMA (Warp Matrix Multiply-Accumulate) or Vulkan cooperative matrix extensions.

Generic Parameters

T : ICoopElement

The element type of the matrix. Must be a built-in arithmetic type.

S : MemoryScope

The memory scope defining the cooperative group (e.g., device, workgroup, subgroup).

M : int

The number of rows in the matrix fragment.

N : int

The number of columns in the matrix fragment.

R : linalg.CoopMatMatrixUse

The matrix use specifier indicating whether this is a Matrix A, Matrix B, or accumulator matrix.

Methods

Conditional Conformances

Conformance to IArithmetic

linalg::CoopMat<T, MemoryScope S, int M, int N, linalg.CoopMatMatrixUse R> additionally conforms to IArithmetic when the following conditions are met:

T : __BuiltinArithmeticType

Remarks

The dimensions M and N must match hardware-supported fragment shapes. For CUDA WMMA, valid shape combinations are (where k is always 16):

Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16)
Shape m8n32k16: Matrix A (8×16), Matrix B (16×32), Accumulator (8×32)
Shape m32n8k16: Matrix A (32×16), Matrix B (16×8), Accumulator (32×8)

Matrix A dimensions are (m×k), Matrix B dimensions are (k×n), and Accumulator dimensions are (m×n). For all CUDA WMMA shapes listed above:

Matrix A and B support: half, uint8, int8
Accumulator (Matrix C) supports: float, half, int

All matrices involved in a multiply-accumulate operation must use the same shape combination. The actual physical layout and distribution of elements across threads is hardware-specific.
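
The shape rule above can be checked on the host before picking values for M and N. The following is a minimal Python sketch, not part of the Slang API: the names `FragmentDims`, `fragment_dims`, and `is_listed_cuda_wmma` are hypothetical, and the tables only restate the CUDA WMMA shapes and element types listed in these remarks. It does not check which A/B element types may be paired with which accumulator type, since the text does not specify that.

```python
from typing import NamedTuple, Tuple


class FragmentDims(NamedTuple):
    """(rows, columns) of each fragment for one m/n/k shape combination."""
    a: Tuple[int, int]    # Matrix A is m x k
    b: Tuple[int, int]    # Matrix B is k x n
    acc: Tuple[int, int]  # Accumulator is m x n


# Restated from the remarks above (assumption: the list is complete for CUDA WMMA).
CUDA_WMMA_SHAPES = {(16, 16, 16), (8, 32, 16), (32, 8, 16)}
CUDA_AB_TYPES = {"half", "uint8", "int8"}
CUDA_ACC_TYPES = {"float", "half", "int"}


def fragment_dims(m: int, n: int, k: int) -> FragmentDims:
    """Derive the A, B, and accumulator dimensions from a shape m, n, k."""
    return FragmentDims(a=(m, k), b=(k, n), acc=(m, n))


def is_listed_cuda_wmma(m: int, n: int, k: int, ab_type: str, acc_type: str) -> bool:
    """True if the shape and element types each appear in the lists above."""
    return (
        (m, n, k) in CUDA_WMMA_SHAPES
        and ab_type in CUDA_AB_TYPES
        and acc_type in CUDA_ACC_TYPES
    )
```

For example, `fragment_dims(8, 32, 16)` gives an 8×16 Matrix A, a 16×32 Matrix B, and an 8×32 accumulator, matching the m8n32k16 entry.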

When targeting Vulkan/SPIR-V, this type uses the SPV_KHR_cooperative_matrix extension (and optionally SPV_NV_cooperative_matrix2 for advanced features like transpose, reductions, and per-element operations). Valid shape combinations for Vulkan cooperative matrices (example device properties):

With float16 elements (A/B/C element types):

Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16) - half/half/half
Shape m16n8k16: Matrix A (16×16), Matrix B (16×8), Accumulator (16×8) - half/half/half
Shape m16n8k8: Matrix A (16×8), Matrix B (8×8), Accumulator (16×8) - half/half/half
Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16) - half/half/float
Shape m16n8k16: Matrix A (16×16), Matrix B (16×8), Accumulator (16×8) - half/half/float
Shape m16n8k8: Matrix A (16×8), Matrix B (8×8), Accumulator (16×8) - half/half/float

With 8-bit integer elements (A/B/C element types):

Shape m16n16k32: Matrix A (16×32), Matrix B (32×16), Accumulator (16×16) - uint8/uint8/uint32
Shape m16n16k32: Matrix A (16×32), Matrix B (32×16), Accumulator (16×16) - int8/int8/int32
Shape m16n8k32: Matrix A (16×32), Matrix B (32×8), Accumulator (16×8) - uint8/uint8/uint32
Shape m16n8k32: Matrix A (16×32), Matrix B (32×8), Accumulator (16×8) - int8/int8/int32

Note: Vulkan’s supported shapes are device-specific and can be queried at runtime using vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR. The list above shows common configurations, but the actual set varies by GPU vendor and driver. The element distribution across threads in a subgroup may differ between CUDA and Vulkan implementations, so code using the subscript operator should only perform uniform operations for portability. If your code specifies a combination that the device does not support, the behavior is undefined.
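
Because an unsupported combination is undefined behavior, it can help to compare the requested shape against the device's reported list before compiling a shader for it. The sketch below models that check in Python under stated assumptions: `CoopMatProps` is a hypothetical record holding the m/n/k sizes and A/B/C element types of one reported combination, and `EXAMPLE_DEVICE_PROPS` simply restates the example list above rather than data from a real device. A real application would fill the list from the Vulkan query instead.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CoopMatProps:
    """One supported combination (hypothetical host-side record)."""
    m: int
    n: int
    k: int
    a_type: str
    b_type: str
    c_type: str


# Example entries copied from the remarks above; real devices may report others.
EXAMPLE_DEVICE_PROPS = [
    CoopMatProps(16, 16, 16, "half", "half", "half"),
    CoopMatProps(16, 8, 16, "half", "half", "half"),
    CoopMatProps(16, 8, 8, "half", "half", "half"),
    CoopMatProps(16, 16, 16, "half", "half", "float"),
    CoopMatProps(16, 8, 16, "half", "half", "float"),
    CoopMatProps(16, 8, 8, "half", "half", "float"),
    CoopMatProps(16, 16, 32, "uint8", "uint8", "uint32"),
    CoopMatProps(16, 16, 32, "int8", "int8", "int32"),
    CoopMatProps(16, 8, 32, "uint8", "uint8", "uint32"),
    CoopMatProps(16, 8, 32, "int8", "int8", "int32"),
]


def is_supported(props: Iterable[CoopMatProps], wanted: CoopMatProps) -> bool:
    """True if the exact shape and element-type combination was reported."""
    return any(p == wanted for p in props)
```

With the example list, `is_supported(EXAMPLE_DEVICE_PROPS, CoopMatProps(16, 8, 32, "int8", "int8", "int32"))` holds, while the CUDA-only shape m8n32k16 is rejected.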

Additionally, while only MemoryScope.Subgroup (warp-level cooperation) is supported on CUDA, MemoryScope.Workgroup can be used when targeting Vulkan, allowing cooperation among threads within the entire workgroup. Whenever Workgroup scope is supported, it is recommended to use it instead of Subgroup scope for simplicity and performance.

When using MemoryScope.Workgroup, Slang will emit SPIR-V code that uses the SPV_NV_cooperative_matrix2 extension. A workgroup-scope cooperative matrix can use larger matrix shapes that are multiples of 16/32 depending on the target device. Workgroup-scope cooperative matrices require a specific workgroup size setting (specified via [numthreads]). Use the Vulkan API to query the supported combinations of element type, matrix shape, and workgroup size settings.