Amazon SageMaker HyperPod now provides flexible training plans

Amazon SageMaker HyperPod announces flexible training plans, a new capability that allows you to train generative AI models within your timelines and budgets. Gain predictable model training timelines and run training workloads within your budget requirements, while continuing to benefit from features of SageMaker HyperPod such as resiliency, performance-optimized distributed training, and enhanced observability and monitoring. 

In a few quick steps, you can specify your preferred compute instances, desired amount of compute resources, duration of your workload, and preferred start date for your generative AI model training. SageMaker then helps you create the most cost-efficient training plans, reducing time to train your model by weeks. Once you create and purchase your training plans, SageMaker automatically provisions the infrastructure and runs the training workloads on these compute resources without requiring any manual intervention. SageMaker also automatically takes care of pausing and resuming training between gaps in compute availability, as the plan switches from one capacity block to another. If you wish to remove all the heavy lifting of infrastructure management, you can also create and run training plans using SageMaker fully managed training jobs.  

SageMaker HyperPod flexible training plans are available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions. To learn more, visit: SageMaker HyperPod, documentation, and the announcement blog

Source:: Amazon AWS