This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference…
This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference with TensorRT-LLM. See LLM Inference Benchmarking: Fundamental Concepts for background knowledge on common metrics for benchmarking and parameters. And read LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM for tips on using GenAI…
Source
Source:: NVIDIA