Amazon Bedrock now supports compressed embeddings (int8 and binary) from the Cohere Embed model, enabling developers and businesses to build more efficient generative AI applications without compromising on performance. Cohere Embed is a leading text embedding model. It is most frequently used to power Retrieval-Augmented Generation (RAG) and semantic search systems.
The text embeddings output by the Cohere Embed model must be stored in a database with vector search capabilities, and storage costs scale directly with both the number of embedding dimensions and the numeric precision of each value. Cohere’s compression-aware model training techniques allow the model to output embeddings in int8 and binary precision formats, which are significantly smaller than the commonly used FP32 precision format, with minimal accuracy degradation. This lets you run your enterprise search applications faster, cheaper, and more efficiently. Compressed embeddings in int8 and binary format are especially useful for large, multi-tenant setups, where the ability to search millions of embeddings within milliseconds is a critical business advantage. Cohere’s compressed embeddings allow you to build applications that are efficient enough to put into production at scale, accelerating your AI strategy to support your employees and customers.
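To make the storage relationship concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of the announcement: the function name `embedding_storage_bytes`, the corpus size of one million vectors, and the 1024-dimension figure are illustrative assumptions, and the calculation covers raw vector bytes only, ignoring any index or metadata overhead your vector database adds.

```python
# Bits used per dimension for each precision format.
BITS_PER_DIMENSION = {"float32": 32, "int8": 8, "binary": 1}


def embedding_storage_bytes(num_vectors: int, dimensions: int, precision: str) -> int:
    """Raw vector storage in bytes, ignoring index and metadata overhead.

    Hypothetical helper for illustration; not an Amazon Bedrock or Cohere API.
    """
    total_bits = num_vectors * dimensions * BITS_PER_DIMENSION[precision]
    # Round up to whole bytes (only matters for binary with odd sizes).
    return (total_bits + 7) // 8


if __name__ == "__main__":
    NUM_VECTORS = 1_000_000  # assumed corpus size
    DIMENSIONS = 1024        # assumed embedding dimension

    baseline = embedding_storage_bytes(NUM_VECTORS, DIMENSIONS, "float32")
    for precision in BITS_PER_DIMENSION:
        size = embedding_storage_bytes(NUM_VECTORS, DIMENSIONS, precision)
        print(f"{precision:>8}: {size / 1e9:.3f} GB "
              f"({baseline // size}x smaller than float32)")
```

Under these assumptions, int8 needs a quarter of the FP32 storage and binary needs one thirty-second of it, which is where the cost and speed benefits described above come from.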
Cohere Embed int8 and binary embeddings are now available in Amazon Bedrock in all the AWS Regions where the Cohere Embed model is available. To learn more, read the Cohere in Amazon Bedrock product page, documentation, and Cohere launch blog. To get started with Cohere models in Amazon Bedrock, visit the Amazon Bedrock console.
Source: Amazon AWS