Amazon SageMaker launches new inference capabilities to reduce costs and latency

We are excited to announce new capabilities on Amazon SageMaker that help customers reduce model deployment costs by 50% on average and lower inference latency by 20% on average. Customers can deploy multiple models to the same instance to make better use of the underlying accelerators. SageMaker also actively monitors the instances that are processing inference requests and routes each request based on which instances are available.
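
The availability-based routing described above can be illustrated with a small sketch. This is not SageMaker's implementation: the `Instance` and `AvailabilityRouter` names, the per-instance `capacity` limit, and the choice of "fewest in-flight requests" as the tie-breaking policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical record for one instance behind an endpoint."""
    name: str
    in_flight: int = 0   # requests currently being processed
    capacity: int = 4    # assumed limit on concurrent requests


class AvailabilityRouter:
    """Toy router: sends each request to an instance that has spare capacity."""

    def __init__(self, instances):
        self.instances = list(instances)

    def route(self):
        """Return the available instance with the fewest in-flight requests.

        Returns None when every instance is at capacity.
        """
        available = [i for i in self.instances if i.in_flight < i.capacity]
        if not available:
            return None
        chosen = min(available, key=lambda i: i.in_flight)
        chosen.in_flight += 1  # the request is now being processed
        return chosen

    def complete(self, instance):
        """Record that a request on this instance has finished."""
        instance.in_flight = max(0, instance.in_flight - 1)
```

In this sketch, a caller would invoke `route()` before dispatching a request and `complete()` once the response returns, so the router's view of each instance's load stays current.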

Source: Amazon AWS