NVIDIA NIM microservices are model inference containers that can be deployed on Kubernetes. In a production environment, it’s important to understand the…
NVIDIA NIM microservices are model inference containers that can be deployed on Kubernetes. In a production environment, it’s important to understand the compute and memory profile of these microservices to set up a successful autoscaling plan. In this post, we describe how to set up and use Kubernetes Horizontal Pod Autoscaling (HPA) with an NVIDIA NIM for LLMs model to automatically scale…
Source
Source:: NVIDIA