Today, Amazon SageMaker launched a new version (0.25.0) of the Large Model Inference (LMI) Deep Learning Container (DLC) with support for NVIDIA’s TensorRT-LLM library. With this release, customers can use state-of-the-art tooling to optimize Large Language Models (LLMs) on SageMaker. Compared to the previous version, the SageMaker LMI TensorRT-LLM DLC reduces latency by 33% on average and improves throughput by 60% on average for the Llama2-70B, Falcon-40B, and CodeLlama-34B models.
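The headline figures are relative changes, so they are easiest to interpret against a baseline. The short sketch below applies them to made-up numbers: `baseline_latency_s`, `baseline_throughput`, and the helper `apply_relative_change` are hypothetical and for illustration only. They are not AWS measurements. It shows that a 33% latency reduction leaves 0.67× the original latency, which is roughly a 1.49× speedup rather than a 1.33× one, and that a 60% throughput gain means 1.6× the original throughput.

```python
def apply_relative_change(baseline: float, percent_change: float) -> float:
    """Scale a baseline value by (1 + percent_change / 100)."""
    return baseline * (1 + percent_change / 100)


# Hypothetical baseline figures for illustration only; NOT measured values.
baseline_latency_s = 3.0      # seconds per request (hypothetical)
baseline_throughput = 100.0   # tokens per second (hypothetical)

# Apply the announced average changes: -33% latency, +60% throughput.
new_latency_s = apply_relative_change(baseline_latency_s, -33)
new_throughput = apply_relative_change(baseline_throughput, 60)

# A 33% latency cut corresponds to a speedup factor of 1 / 0.67.
latency_speedup = baseline_latency_s / new_latency_s

print(f"latency: {baseline_latency_s:.2f}s -> {new_latency_s:.2f}s "
      f"({latency_speedup:.2f}x faster)")
print(f"throughput: {baseline_throughput:.1f} -> {new_throughput:.1f} tokens/s")
```

Because the speedup factor is the ratio of old to new latency, the same calculation works for any baseline you substitute.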
Source: Amazon AWS