struct linalg::CoopMat<T, MemoryScope S, int M, int N, linalg.CoopMatMatrixUse R>
Conforms to: IArray<T>, IArithmetic
Description
Represents a cooperative matrix for efficient warp/subgroup-level matrix operations on GPU hardware. CoopMat enables high-performance matrix multiply-accumulate operations by distributing matrix fragments across threads within a warp or subgroup. This type leverages specialized hardware instructions such as CUDA’s WMMA (Warp Matrix Multiply-Accumulate) or Vulkan cooperative matrix extensions.
Generic Parameters
T: __BuiltinArithmeticType
The element type of the matrix. Must be a built-in arithmetic type.
S : MemoryScope
The memory scope defining the cooperative group (e.g., device, workgroup, subgroup).
M : int
The number of rows in the matrix fragment.
N : int
The number of columns in the matrix fragment.
R : linalg.CoopMatMatrixUse
The matrix use specifier indicating whether this matrix serves as Matrix A, Matrix B, or the accumulator in a multiply-accumulate operation.
Methods
- init
- fill
- copyFrom
- getCount
- subscript
- GetLength
- GetRowCount
- GetColumnCount
- Transpose
- ReduceRow
- ReduceColumn
- ReduceRowAndColumn
- Reduce2x2
- MapElement
- StoreCoherent
- LoadCoherent
- add
- sub
- mul
- div
- neg
- mod
- equals
- lessThan
- lessThanOrEquals
- Load
- Store
Remarks
The dimensions M and N must match hardware-supported fragment shapes. For CUDA WMMA, valid shape combinations are (where k is always 16):
- Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16)
- Shape m8n32k16: Matrix A (8×16), Matrix B (16×32), Accumulator (8×32)
- Shape m32n8k16: Matrix A (32×16), Matrix B (16×8), Accumulator (32×8)
Matrix A dimensions are (m×k), Matrix B dimensions are (k×n), and Accumulator dimensions are (m×n). For all CUDA WMMA shapes listed above:
- Matrix A and B support: half, uint8, int8
- Accumulator (Matrix C) supports: float, half, int
All matrices involved in a multiply-accumulate operation must use the same shape combination. The actual physical layout and distribution of elements across threads is hardware-specific.
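The relationship between a shape name and the three fragment sizes can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; `fragment_dims`, `is_cuda_wmma_shape`, and `CUDA_WMMA_SHAPES` are hypothetical names and are not part of Slang or CUDA.

```python
# Hypothetical helpers for illustration; not part of the Slang or CUDA API.

# (m, n, k) triples for the CUDA WMMA shapes listed above.
CUDA_WMMA_SHAPES = [(16, 16, 16), (8, 32, 16), (32, 8, 16)]


def fragment_dims(m, n, k):
    """Return (rows, columns) of each fragment for an m x n x k multiply-accumulate."""
    return {
        "A": (m, k),            # Matrix A is m x k
        "B": (k, n),            # Matrix B is k x n
        "accumulator": (m, n),  # Accumulator is m x n
    }


def is_cuda_wmma_shape(m, n, k):
    """Return True if (m, n, k) is one of the CUDA WMMA shapes listed above."""
    return (m, n, k) in CUDA_WMMA_SHAPES


for m, n, k in CUDA_WMMA_SHAPES:
    print(f"m{m}n{n}k{k}:", fragment_dims(m, n, k))
```

The (rows, columns) pair for each role corresponds to the M and N generic parameters of the CoopMat declared with that matrix use.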
When targeting Vulkan/SPIR-V, this type uses the SPV_KHR_cooperative_matrix extension (and optionally SPV_NV_cooperative_matrix2 for advanced features like transpose, reductions, and per-element operations). Valid shape combinations for Vulkan cooperative matrices (example device properties):
With float16 elements (A/B/C element types):
- Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16) - half/half/half
- Shape m16n8k16: Matrix A (16×16), Matrix B (16×8), Accumulator (16×8) - half/half/half
- Shape m16n8k8: Matrix A (16×8), Matrix B (8×8), Accumulator (16×8) - half/half/half
- Shape m16n16k16: Matrix A (16×16), Matrix B (16×16), Accumulator (16×16) - half/half/float
- Shape m16n8k16: Matrix A (16×16), Matrix B (16×8), Accumulator (16×8) - half/half/float
- Shape m16n8k8: Matrix A (16×8), Matrix B (8×8), Accumulator (16×8) - half/half/float
With 8-bit integer elements (A/B/C element types):
- Shape m16n16k32: Matrix A (16×32), Matrix B (32×16), Accumulator (16×16) - uint8/uint8/uint32
- Shape m16n16k32: Matrix A (16×32), Matrix B (32×16), Accumulator (16×16) - int8/int8/int32
- Shape m16n8k32: Matrix A (16×32), Matrix B (32×8), Accumulator (16×8) - uint8/uint8/uint32
- Shape m16n8k32: Matrix A (16×32), Matrix B (32×8), Accumulator (16×8) - int8/int8/int32
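One way to work with such a list is to treat it as a lookup table keyed by element types. The Python sketch below copies the example configurations above into a table and filters it; `EXAMPLE_CONFIGS` and `find_shapes` are hypothetical names, and a real application would presumably build the table from what the device reports rather than hard-coding it.

```python
# Example configurations copied from the lists above. These are illustrative
# device properties, not a guarantee for any particular GPU.
# Each entry: (m, n, k, a_type, b_type, c_type)
EXAMPLE_CONFIGS = [
    (16, 16, 16, "half", "half", "half"),
    (16, 8, 16, "half", "half", "half"),
    (16, 8, 8, "half", "half", "half"),
    (16, 16, 16, "half", "half", "float"),
    (16, 8, 16, "half", "half", "float"),
    (16, 8, 8, "half", "half", "float"),
    (16, 16, 32, "uint8", "uint8", "uint32"),
    (16, 16, 32, "int8", "int8", "int32"),
    (16, 8, 32, "uint8", "uint8", "uint32"),
    (16, 8, 32, "int8", "int8", "int32"),
]


def find_shapes(a_type, b_type, c_type, configs=EXAMPLE_CONFIGS):
    """Return the (m, n, k) shapes available for the given A/B/C element types."""
    return [
        (m, n, k)
        for (m, n, k, a, b, c) in configs
        if (a, b, c) == (a_type, b_type, c_type)
    ]


# Shapes usable with half inputs and a float accumulator in the example table.
print(find_shapes("half", "half", "float"))
```

An empty result means the combination is absent from the table, in which case the caller would need to fall back to a different element type or a non-cooperative code path.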
Note: Vulkan’s supported shapes are device-specific and can be queried at runtime using vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR, which returns a list of VkCooperativeMatrixPropertiesKHR entries. The above list represents common configurations but may vary by GPU vendor and driver. The element distribution across threads in a subgroup may differ between CUDA and Vulkan implementations, so for portability, code using the subscript operator should only perform uniform operations, that is, operations applied identically to every element regardless of which thread holds it.
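The portability concern can be illustrated with a toy model in Python. The two layouts below are invented for the example and do not describe how any real GPU distributes elements; the point is only that a uniform operation gives the same matrix under both layouts, while an operation that depends on a thread-local index does not.

```python
# Toy model: a 4x4 matrix shared by 4 "threads" under two made-up layouts.
ROWS, COLS, THREADS = 4, 4, 4


def row_layout(thread):
    """Hypothetical layout 1: thread t owns row t."""
    return [(thread, c) for c in range(COLS)]


def column_layout(thread):
    """Hypothetical layout 2: thread t owns column t."""
    return [(r, thread) for r in range(ROWS)]


def apply_per_thread(matrix, layout, op):
    """Each thread applies op(local_index, value) to the elements it owns."""
    result = [row[:] for row in matrix]
    for thread in range(THREADS):
        for local_index, (r, c) in enumerate(layout(thread)):
            result[r][c] = op(local_index, matrix[r][c])
    return result


def uniform(local_index, value):
    """Uniform operation: ignores where the element sits within the thread."""
    return value * 2


def index_dependent(local_index, value):
    """Non-uniform operation: the result depends on the thread-local index."""
    return value + local_index


matrix = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]

uniform_matches = (
    apply_per_thread(matrix, row_layout, uniform)
    == apply_per_thread(matrix, column_layout, uniform)
)
index_dependent_matches = (
    apply_per_thread(matrix, row_layout, index_dependent)
    == apply_per_thread(matrix, column_layout, index_dependent)
)

print("uniform op agrees across layouts:", uniform_matches)
print("index-dependent op agrees across layouts:", index_dependent_matches)
```

In this model the uniform operation agrees across layouts and the index-dependent one does not, which mirrors why per-element code written against one backend's element ordering may not carry over to another.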