Model Evaluation on Amazon Bedrock allows you to evaluate, compare, and select the best foundation models for your use case. Amazon Bedrock offers a choice of automatic evaluation and human evaluation. You can use automatic evaluation with predefined algorithms for metrics such as accuracy, robustness, and toxicity. Model evaluation provides built-in curated datasets or you can bring your own datasets.
Amazon Bedrock’s interactive interface guides you through model evaluation. You simply choose automatic evaluation, select the task type and metrics, and upload your prompt dataset. Amazon Bedrock then runs the evaluations and generates a report, so you can easily understand how each model performed against the metrics you selected and choose the right one for your use case. Using this report in conjunction with the cost and latency metrics from Amazon Bedrock, you can select the model with the required quality, cost, and latency tradeoff.
Model Evaluation on Amazon Bedrock is now generally available in the AWS GovCloud (US-West) Region, in addition to many commercial AWS Regions.
To learn more about Model Evaluation on Amazon Bedrock, see the Amazon Bedrock developer experience web page. To get started, sign in to Amazon Bedrock on the AWS Management Console or use the Amazon Bedrock APIs.
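As a sketch of what starting an evaluation through the Amazon Bedrock APIs can look like, the snippet below builds a request for an automatic evaluation job using boto3's `create_evaluation_job` operation. The job name, IAM role ARN, model identifier, dataset, and S3 output location are placeholder assumptions you would replace with your own values.

```python
import json

# Request shape for bedrock.create_evaluation_job: pick a task type, the
# metrics to compute, and the model(s) to evaluate; results are written to S3.
request = {
    "jobName": "my-eval-job",  # placeholder job name
    "roleArn": "arn:aws:iam::111122223333:role/BedrockEvalRole",  # placeholder role
    "evaluationConfig": {
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    # A built-in curated dataset; you can instead point this
                    # at your own prompt dataset in S3.
                    "dataset": {"name": "Builtin.Gigaword"},
                    "metricNames": [
                        "Builtin.Accuracy",
                        "Builtin.Robustness",
                        "Builtin.Toxicity",
                    ],
                }
            ]
        }
    },
    "inferenceConfig": {
        "models": [
            # Placeholder model identifier for the model under evaluation.
            {"bedrockModel": {"modelIdentifier": "anthropic.claude-v2"}}
        ]
    },
    "outputDataConfig": {"s3Uri": "s3://my-bucket/eval-results/"},  # placeholder
}

print(json.dumps(request, indent=2))

# With AWS credentials and the required permissions in place, the job would
# be submitted with:
#   import boto3
#   bedrock = boto3.client("bedrock", region_name="us-gov-west-1")
#   response = bedrock.create_evaluation_job(**request)
```

Once submitted, the job's status and the resulting report can be tracked from the Amazon Bedrock console or retrieved from the S3 output location.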
Source: Amazon AWS