Amazon Bedrock Knowledge Bases announces support for cross-region inference, an optional feature that enables developers to seamlessly manage traffic bursts by utilizing compute across different AWS Regions.
With cross-region inference, Amazon Bedrock Knowledge Bases customers who use the RetrieveAndGenerate API get higher throughput limits and better resilience during periods of peak demand. Developers who opt in no longer have to spend time and effort predicting demand fluctuations. Instead, cross-region inference dynamically routes traffic across multiple Regions, which improves availability for each request and smooths performance during high-usage periods. To use cross-region inference, specify an inference profile as the "modelArn" in the RetrieveAndGenerate API request. There is no additional routing cost for using cross-region inference; you are charged based on the Region you made the request in (the source Region).
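As a rough illustration of the opt-in step, the sketch below builds a RetrieveAndGenerate request whose `modelArn` field carries an inference profile ARN instead of a foundation model ARN, then sends it with the boto3 `bedrock-agent-runtime` client. The helper names `build_request` and `ask`, the knowledge base ID, the account number, and the specific inference profile are placeholders chosen for this example and are not taken from the announcement; check the supported models list for the profiles available to your account.

```python
def build_request(query, knowledge_base_id, inference_profile_arn):
    """Build keyword arguments for RetrieveAndGenerate.

    Hypothetical helper: the only cross-region-specific part is that
    modelArn holds an inference profile ARN rather than a model ARN.
    """
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": inference_profile_arn,
            },
        },
    }


def ask(query, knowledge_base_id, inference_profile_arn, region="us-east-1"):
    """Send the request from the source Region and return the generated text."""
    import boto3  # imported here so build_request works without boto3 installed

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve_and_generate(
        **build_request(query, knowledge_base_id, inference_profile_arn)
    )
    return response["output"]["text"]


# Placeholder values for illustration only; substitute your own.
EXAMPLE_KB_ID = "KB12345678"
EXAMPLE_PROFILE_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
)

if __name__ == "__main__":
    # Requires AWS credentials and a real knowledge base to succeed.
    print(ask("What is our refund policy?", EXAMPLE_KB_ID, EXAMPLE_PROFILE_ARN))
```

The Region passed to the client is the source Region, which per the announcement is the one used for billing; the routing to other Regions happens on the service side and needs no extra client code.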
Please find the list of supported models and pre-defined regions here. To learn more about the feature and how to get started, refer to the Amazon Bedrock documentation or this blog.
Source: Amazon AWS