Announcing support for multiple containers on Amazon SageMaker Inference endpoints, leading to cost savings of up to 80%

Amazon SageMaker now supports deploying multiple containers on a single real-time endpoint for low-latency inference and invoking each container independently on a per-request basis. This new capability enables you to run up to five different machine learning (ML) models and frameworks on a single endpoint and save up to 80% in costs. This option is ideal when you have multiple ML models with similar resource needs and when individual models don’t have sufficient traffic to utilize the full capacity of the endpoint instances. Typical cases include a set of ML models that are invoked infrequently or at different times, and dev/test endpoints.
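
As a rough illustration of what this looks like from the API side, the sketch below uses Python with boto3. It is a minimal sketch under stated assumptions, not an official sample: the helper names (build_create_model_request, invoke_container), the model name, role ARN, image URIs, and container hostnames are all hypothetical placeholders, and the five-container cap is taken from the announcement above. The first helper only builds the arguments for the CreateModel call, with the inference execution mode set to "Direct" so that each container can be targeted individually; the second shows how a request might be routed to one container by its hostname.

```python
from typing import Any

# Limit taken from the announcement text; treat it as an assumption here.
MAX_CONTAINERS = 5


def build_create_model_request(
    model_name: str,
    role_arn: str,
    containers: dict[str, dict[str, str]],
) -> dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateModel API.

    `containers` maps a container hostname (used later to target that
    container) to a dict holding "Image" and, optionally, "ModelDataUrl".
    """
    if not 1 <= len(containers) <= MAX_CONTAINERS:
        raise ValueError(
            f"expected 1 to {MAX_CONTAINERS} containers, got {len(containers)}"
        )
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"ContainerHostname": hostname, **spec}
            for hostname, spec in containers.items()
        ],
        # "Direct" mode lets each request name the container it wants.
        "InferenceExecutionConfig": {"Mode": "Direct"},
    }


def invoke_container(
    endpoint_name: str,
    hostname: str,
    payload: bytes,
    content_type: str = "text/csv",
) -> bytes:
    """Send one request to a single container behind the endpoint."""
    import boto3  # imported lazily so the builder above works without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetContainerHostname=hostname,
        ContentType=content_type,
        Body=payload,
    )
    return response["Body"].read()
```

In practice, the dictionary returned by build_create_model_request would be passed to boto3.client("sagemaker").create_model(**request), followed by the usual create_endpoint_config and create_endpoint calls, before invoke_container can reach the endpoint.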

Source: Amazon AWS