Amazon SageMaker HyperPod recipes help you get started training and fine-tuning publicly available foundation models (FMs) in minutes with state-of-the-art performance. SageMaker HyperPod helps customers scale generative AI model development across hundreds or thousands of AI accelerators with built-in resiliency and performance optimizations, decreasing model training time by up to 40%. However, as FM sizes continue to grow to hundreds of billions of parameters, customizing these models can take weeks of extensive experimenting and debugging. In addition, applying training optimizations to unlock better price performance is often infeasible for customers, because such optimizations require deep machine learning expertise and can cause further delays in time to market.
With SageMaker HyperPod recipes, customers of all skill levels can benefit from state-of-the-art performance while quickly getting started training and fine-tuning popular publicly available FMs, including Llama 3.1 405B, Mixtral 8x22B, and Mistral 7B. SageMaker HyperPod recipes include a training stack tested by AWS, removing weeks of tedious work experimenting with different model configurations. You can also switch between GPU-based and AWS Trainium-based instances with a one-line recipe change and enable automated model checkpointing for improved training resiliency. Finally, you can run workloads in production on the SageMaker AI training service of your choice.
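As a rough illustration of how a recipe might be launched as a SageMaker training job, the sketch below assumes the SageMaker Python SDK's PyTorch estimator accepts a training_recipe name and a recipe_overrides dictionary; the recipe path, instance types, IAM role, S3 paths, and override keys shown are placeholders for illustration, not values taken from this announcement.

```python
# Minimal sketch: launching a HyperPod recipe as a SageMaker training job.
# Assumption: the SageMaker Python SDK's PyTorch estimator accepts
# `training_recipe` and `recipe_overrides`; all names below are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    base_job_name="llama-fine-tune",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder IAM role
    instance_type="ml.p5.48xlarge",  # e.g. swap to a Trainium instance such as ml.trn1.32xlarge
    instance_count=4,
    training_recipe="fine-tuning/llama/hf_llama3_8b_seq8k_gpu_lora",  # illustrative recipe name
    recipe_overrides={
        "run": {"results_dir": "/opt/ml/model"},  # assumed override key for the output directory
    },
)

# Placeholder S3 location for the fine-tuning dataset.
estimator.fit(inputs={"train": "s3://amzn-s3-demo-bucket/datasets/finetune-data/"})
```

The same recipe-driven configuration is what makes the instance switch above a one-line change: the accelerator target lives in the recipe or launch configuration rather than in training code.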
SageMaker HyperPod recipes are available in all AWS Regions where SageMaker HyperPod and SageMaker training jobs are supported. To learn more and get started, visit the SageMaker HyperPod page and blog.
Source: Amazon AWS