Today, AWS announces the general availability of Neuron SDK 2.25.0, delivering improvements for inference workloads and performance monitoring on AWS Inferentia and Trainium instances. This release adds context and data parallelism support, as well as chunked attention for long-sequence processing, in inference, and updates the neuron-ls and neuron-monitor APIs with more information on node affinities and device utilization, respectively.
It also introduces automatic aliasing (Beta) for fast tensor operations and adds improvements for disaggregated serving (Beta). Finally, it provides upgraded AMIs and Deep Learning Containers for inference and training workloads on Neuron.
Neuron 2.25.0 is available in all AWS Regions where Inferentia and Trainium instances are offered.
To learn more and for a full list of new features and enhancements, see:
AWS Neuron 2.25.0 release notes
Trn2 Instances
Trn1 Instances
Inf2 Instances
Source: Amazon AWS