Amazon Bedrock Agents, Flows, and Knowledge Bases now support the recently announced latency-optimized models, currently in preview, via the SDK. This enhancement brings faster response times and improved responsiveness to AI applications built with Amazon Bedrock tooling. The optimization is currently available for Anthropic’s Claude 3.5 Haiku model and Meta’s Llama 3.1 405B and 70B models, delivering reduced latency compared with the standard models without compromising accuracy.
This update is particularly beneficial for customers developing latency-sensitive applications, such as real-time customer service chatbots and interactive coding assistants. By leveraging purpose-built AI chips like AWS Trainium2 and advanced software optimizations in Amazon Bedrock, customers gain more options for tuning inference to their specific use cases. Importantly, these capabilities can be integrated into existing applications immediately, without additional setup or model fine-tuning, resulting in enhanced performance and faster response times.
The latency-optimized inference support for Amazon Bedrock Agents, Flows, and Knowledge Bases is available in the US East (Ohio) Region via cross-region inference. Customers can access these new capabilities through the Amazon Bedrock SDK via a runtime configuration, enabling them to programmatically incorporate these optimized models into their workflows and applications.
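As a rough illustration of what that runtime configuration can look like, the following Python (boto3) sketch invokes a Bedrock agent while requesting latency-optimized inference through a performance configuration. The agent IDs, session ID, and prompt are placeholders, and the exact parameter shape should be confirmed against the current Amazon Bedrock Agents runtime API documentation.

```python
import boto3

# Bedrock Agents runtime client in US East (Ohio), where
# latency-optimized inference is available via cross-region inference.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-2")

# Agent identifiers below are hypothetical placeholders.
response = client.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="AGENT_ALIAS_ID",
    sessionId="session-001",
    inputText="Summarize the customer's latest support ticket.",
    # Runtime configuration requesting the latency-optimized
    # variant of the agent's underlying foundation model.
    bedrockModelConfigurations={
        "performanceConfig": {"latency": "optimized"}
    },
)

# InvokeAgent streams the completion back as a series of chunks.
completion = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        completion += chunk["bytes"].decode("utf-8")
print(completion)
```

Flows and Knowledge Bases expose analogous performance-configuration fields on their respective runtime calls; consult the SDK documentation for the exact request structure in each case.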
To learn more about Amazon Bedrock and its capabilities, including this new latency-optimized inference support, visit the Amazon Bedrock product page, pricing page, and documentation.
Source: Amazon AWS