Task governance is now generally available for Amazon SageMaker HyperPod

Amazon SageMaker HyperPod now provides you with centralized governance across all generative AI development tasks, such as training and inference. You have full visibility and control over compute resource allocation, ensuring the most critical tasks are prioritized and maximizing compute resource utilization, reducing model development costs by up to 40%.

With HyperPod task governance, administrators can more easily define priorities for different tasks and set up limits for how many compute resources each team can use. At any given time, administrators can also monitor and audit the tasks that are running or waiting for compute resources through a visual dashboard. When data scientists create their tasks, HyperPod automatically runs them, adhering to the defined compute resource limits and priorities. For example, when training for a high-priority model needs to be completed as soon as possible but all compute resources are in use, HyperPod frees up resources from lower-priority tasks to support the training. HyperPod pauses the low-priority task, saves the checkpoint, and reallocates the freed-up compute resources. The preempted low-priority task will resume from the last saved checkpoint as resources become available again. And when a team is not fully using the resource limits the administrator has set up, HyperPod use those idle resources to accelerate another team’s tasks. Additionally, HyperPod is now integrated with Amazon SageMaker Studio, bringing task governance and other HyperPod capabilities into the Studio environment. Data scientists can now seamlessly interact with HyperPod clusters directly from Studio, allowing them to develop, submit, and monitor machine learning (ML) jobs on powerful accelerator-backed clusters.

Task governance for HyperPod is available in all AWS Regions where HyperPod is available: US East (N. Virginia), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Stockholm), and South America (São Paulo).

To learn more, visit SageMaker HyperPod webpage, AWS News Blog, and SageMaker AI documentation.

Source:: Amazon AWS