Today, Amazon Bedrock announces support for cross-region inference, an optional feature that enables developers to seamlessly manage traffic bursts by utilizing compute across different AWS Regions. By using cross-region inference, Bedrock customers using on-demand mode will be able to get higher throughput limits (up to 2x their allocated in-region quotas) and enhanced resilience during periods of peak demand. By opting in, developers no longer have to spend time and effort predicting demand fluctuations. Instead, cross-region inference dynamically routes traffic across multiple regions, ensuring optimal availability for each request and smoother performance during high-usage periods.
Customers can control where their inference data flows by selecting from a pre-defined set of regions, helping them comply with applicable data residency requirements and sovereignty laws. Moreover, this capability prioritizes the connected Bedrock API source region when possible, helping to minimize latency and improve responsiveness. As a result, customers can enhance their applications’ reliability, performance, and efficiency.
There’s no additional routing cost for using cross-region inference and you will be charged based on the region you made the request in (source region). Please find the list of supported models and pre-defined regions here. To learn more about the feature and how to get started, refer to the Amazon Bedrock documentation or this blog.
Source:: Amazon AWS