Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates

GIXnews

5 months ago

The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance…

The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates on cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…

Source

Source:: NVIDIA