Today, we are announcing the availability of sticky session routing on Amazon SageMaker Inference, which helps customers improve the performance and user experience of their generative AI applications by reusing previously processed information. Amazon SageMaker makes it easier to deploy ML models, including foundation models (FMs), and to serve inference requests at the best price performance for any use case.
When sticky sessions are enabled, all requests for the same session are routed to the same instance, so your ML application can reuse previously processed information to reduce latency and improve the user experience. This is particularly valuable when customers send large data payloads or need seamless interactive experiences. By building on their previous inference requests, customers can use this feature to build state-aware AI applications on SageMaker. To do so, customers create a session ID with their first request and then pass that session ID with each subsequent request, which tells SageMaker to route those requests to the same instance. Sessions can also be deleted when they are no longer needed, freeing resources for new sessions.
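The create, reuse, and delete flow described above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the `StickySession` class and its method names are hypothetical, and the request payloads are placeholders because the body that opens or closes a session is defined by the model container. The sketch assumes a client that exposes the `invoke_endpoint` call of the SageMaker Runtime API, with its `SessionId` request parameter, the `NEW_SESSION` value, and the `NewSessionId` response field. It also assumes the returned value may carry an expiry suffix after a semicolon.

```python
import json


class StickySession:
    """Hypothetical helper that keeps follow-up requests on one instance."""

    def __init__(self, client, endpoint_name):
        # `client` is any object exposing invoke_endpoint(**kwargs),
        # for example boto3.client("sagemaker-runtime").
        self.client = client
        self.endpoint_name = endpoint_name
        self.session_id = None

    def _invoke(self, session_id, payload):
        # Every call carries a SessionId so SageMaker can route on it.
        return self.client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            SessionId=session_id,
            Body=json.dumps(payload),
        )

    def open(self, payload):
        # "NEW_SESSION" asks SageMaker to create a session with this request.
        response = self._invoke("NEW_SESSION", payload)
        # Assumption: the value may look like "<id>; Expires=<timestamp>",
        # so keep only the part before the first semicolon.
        self.session_id = response["NewSessionId"].split(";")[0].strip()
        return response

    def send(self, payload):
        # Reusing the stored ID routes the request to the same instance.
        if self.session_id is None:
            raise RuntimeError("open() must be called before send()")
        return self._invoke(self.session_id, payload)

    def close(self, payload):
        # Assumption: the container treats `payload` as a close request.
        response = self.send(payload)
        self.session_id = None
        return response
```

With boto3 installed and credentials configured, a session might be driven as `StickySession(boto3.client("sagemaker-runtime"), "my-endpoint")`, where `my-endpoint` is a placeholder endpoint name, followed by `open(...)`, any number of `send(...)` calls, and a final `close(...)`.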
This feature is available in all regions where SageMaker is available. You can learn more about deploying models on SageMaker here and more about this feature in our documentation.
Source: Amazon AWS