struct linalg::CoopMat<T, MemoryScope S, int M, int N, linalg.CoopMatMatrixUse R>
Conditionally conforms to: IArithmetic
Description
Represents a cooperative matrix for efficient warp/subgroup-level matrix operations on GPU hardware. CoopMat enables high-performance matrix multiply-accumulate operations by distributing matrix fragments across threads within a warp or subgroup. This type leverages specialized hardware instructions such as CUDA’s WMMA (Warp Matrix Multiply-Accumulate) or Vulkan cooperative matrix extensions.
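As a sketch of the intended usage, the fragment types for A, B, and an accumulator can be loaded, multiplied, and stored cooperatively. The `Load`/`Store` overloads, the `CoopMatMatrixLayout` enum, and the `coopMatMulAdd` spelling below are assumptions for illustration; consult the core-module reference for the exact signatures.

```slang
using namespace linalg;

RWStructuredBuffer<half>  matA;   // 16x16, row-major
RWStructuredBuffer<half>  matB;   // 16x16, row-major
RWStructuredBuffer<float> matC;   // 16x16, row-major

[numthreads(32, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    // Each fragment is held cooperatively by the whole subgroup.
    let a = CoopMat<half, MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixA>
                .Load(matA, 0, 16, CoopMatMatrixLayout.RowMajor);
    let b = CoopMat<half, MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixB>
                .Load(matB, 0, 16, CoopMatMatrixLayout.RowMajor);
    var c = CoopMat<float, MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixAccumulator>(0.0f);

    c = coopMatMulAdd(a, b, c);   // c = a * b + c, using hardware MMA instructions
    c.Store(matC, 0, 16, CoopMatMatrixLayout.RowMajor);
}
```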
Generic Parameters
T : ICoopElement
The element type of the matrix. Must conform to ICoopElement; when T is also a built-in arithmetic type, the IArithmetic conformance described below becomes available.
S : MemoryScope
The memory scope defining the cooperative group (e.g., device, workgroup, subgroup).
M : int
The number of rows in the matrix fragment.
N : int
The number of columns in the matrix fragment.
R : linalg.CoopMatMatrixUse
The matrix use specifier indicating whether this is a Matrix A, Matrix B, or accumulator matrix.
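Putting the five generic parameters together, one CUDA-style m16n16k16 configuration might be spelled as follows (a sketch; the enum case names are taken from this document and assumed accurate):

```slang
using namespace linalg;

// One m16n16k16 configuration: A is m x k, B is k x n, the accumulator is m x n.
typealias FragA = CoopMat<half,  MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixA>;
typealias FragB = CoopMat<half,  MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixB>;
typealias FragC = CoopMat<float, MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixAccumulator>;
```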
Methods
- init
- fill
- copyFrom
- getCount
- subscript
- GetLength
- GetRowCount
- GetColumnCount
- Transpose
- ReduceRow
- ReduceColumn
- ReduceRowAndColumn
- Reduce2x2
- MapElement
- StoreCoherent
- LoadCoherent
- Load
- Store
- convertUse
- add
- sub
- mul
- div
- neg
- mod
- equals
- lessThan
- lessThanOrEquals
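As a sketch of per-element access using the methods listed above (`GetLength` and `subscript`; exact spellings assumed from this list): because the physical element distribution is hardware-specific, portable code should apply the same uniform operation to every element.

```slang
using namespace linalg;

void scaleAccumulator(
    inout CoopMat<float, MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixAccumulator> acc)
{
    // GetLength() yields the number of fragment elements owned by this thread.
    for (int i = 0; i < acc.GetLength(); ++i)
        acc[i] = acc[i] * 0.5f;   // uniform operation: portable across CUDA and Vulkan
}
```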
Conditional Conformances
Conformance to IArithmetic
linalg::CoopMat<T, MemoryScope S, int M, int N, linalg.CoopMatMatrixUse R> additionally conforms to IArithmetic when the following conditions are met:
- T : __BuiltinArithmeticType
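As a sketch, with a built-in arithmetic element type such as float, the conformance lets fragments be combined with ordinary operator syntax (the scalar-broadcast constructor used here is an assumption):

```slang
using namespace linalg;

typealias Acc = CoopMat<float, MemoryScope.Subgroup, 16, 16, CoopMatMatrixUse.MatrixAccumulator>;

void combine(Acc x, Acc y)
{
    // IArithmetic enables operator syntax, mapping to the add/mul/neg methods.
    let sum     = x + y;
    let product = x * y;   // element-wise product, not matrix multiplication
    let negated = -x;
}
```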
Remarks
The dimensions M and N must match hardware-supported fragment shapes. For CUDA WMMA, valid shape combinations are (where k is always 16):
- Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16)
- Shape m8n32k16: Matrix A (8×16), Matrix B (16×32), Accumulator (8×32)
- Shape m32n8k16: Matrix A (32×16), Matrix B (16×8), Accumulator (32×8)
Matrix A dimensions are (m×k), Matrix B dimensions are (k×n), and Accumulator dimensions are (m×n). For all CUDA WMMA shapes listed above:
- Matrix A and B support: half, uint8, int8
- Accumulator (Matrix C) supports: float, half, int
All matrices involved in a multiply-accumulate operation must use the same shape combination. The actual physical layout and distribution of elements across threads is hardware-specific.
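For instance, following the table above, an m8n32k16 integer configuration pairs an 8×16 int8 A with a 16×32 int8 B and an 8×32 int accumulator. A sketch of the matching fragment types (names as listed in this document):

```slang
using namespace linalg;

// m8n32k16: A is m x k = 8x16, B is k x n = 16x32, C is m x n = 8x32.
// All three fragments must come from the same shape combination.
typealias A8x16  = CoopMat<int8_t, MemoryScope.Subgroup, 8,  16, CoopMatMatrixUse.MatrixA>;
typealias B16x32 = CoopMat<int8_t, MemoryScope.Subgroup, 16, 32, CoopMatMatrixUse.MatrixB>;
typealias C8x32  = CoopMat<int,    MemoryScope.Subgroup, 8,  32, CoopMatMatrixUse.MatrixAccumulator>;
```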
When targeting Vulkan/SPIR-V, this type uses the SPV_KHR_cooperative_matrix extension (and optionally SPV_NV_cooperative_matrix2 for advanced features like transpose, reductions, and per-element operations). Valid shape combinations for Vulkan cooperative matrices (example device properties):
With float16 elements (A/B/C element types):
- Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16) - half/half/half
- Shape m16n8k16: Matrix A (16×16), Matrix B (16×8), Accumulator (16×8) - half/half/half
- Shape m16n8k8: Matrix A (16×8), Matrix B (8×8), Accumulator (16×8) - half/half/half
- Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16) - half/half/float
- Shape m16n8k16: Matrix A (16×16), Matrix B (16×8), Accumulator (16×8) - half/half/float
- Shape m16n8k8: Matrix A (16×8), Matrix B (8×8), Accumulator (16×8) - half/half/float
With 8-bit integer elements (A/B/C element types):
- Shape m16n16k32: Matrix A (16×32), Matrix B (32×16), Accumulator (16×16) - uint8/uint8/uint32
- Shape m16n16k32: Matrix A (16×32), Matrix B (32×16), Accumulator (16×16) - int8/int8/int32
- Shape m16n8k32: Matrix A (16×32), Matrix B (32×8), Accumulator (16×8) - uint8/uint8/uint32
- Shape m16n8k32: Matrix A (16×32), Matrix B (32×8), Accumulator (16×8) - int8/int8/int32
Note: Vulkan’s supported shapes are device-specific and can be queried at runtime using VkPhysicalDeviceCooperativeMatrixPropertiesKHR. The above list represents common configurations but may vary by GPU vendor and driver. The element distribution across threads in a subgroup may differ between CUDA and Vulkan implementations, so code using the subscript operator should only perform uniform operations for portability. If your code specifies a combination that is not supported by the device, the behavior is undefined.
Additionally, while only MemoryScope.Subgroup (warp-level cooperation) is supported on CUDA, MemoryScope.Workgroup can be used when targeting Vulkan, allowing cooperation among threads within the entire workgroup. Whenever Workgroup scope is supported, it is recommended to use it instead of Subgroup scope for simplicity and performance.
When using MemoryScope.Workgroup, Slang emits SPIR-V that uses the SPV_NV_cooperative_matrix2 extension. A workgroup-scope cooperative matrix can use larger matrix shapes, typically multiples of 16 or 32 depending on the target device. Workgroup-scope cooperative matrices require a specific workgroup size (specified via [numthreads]). Use the Vulkan API to query the supported combinations of element type, matrix shape, and workgroup size.
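A workgroup-scope sketch, per the remarks above. The 64×64 shape and the [numthreads] setting here are illustrative assumptions; query the device for the supported combinations before relying on them.

```slang
using namespace linalg;

[numthreads(128, 1, 1)]   // the workgroup size must match a device-supported configuration
void main()
{
    // A larger accumulator held cooperatively by the entire workgroup
    // (Vulkan-only; requires SPV_NV_cooperative_matrix2).
    var acc = CoopMat<float, MemoryScope.Workgroup, 64, 64,
                      CoopMatMatrixUse.MatrixAccumulator>(0.0f);
    // ... multiply-accumulate proceeds as with subgroup scope ...
}
```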