Gemma 2, the next generation of Google Gemma models, is now optimized with TensorRT-LLM and packaged as NVIDIA NIM inference microservice.
Gemma 2, the next generation of Google Gemma models, is now optimized with TensorRT-LLM and packaged as NVIDIA NIM inference microservice.
Source
Source:: NVIDIA