Today, AWS announces the general availability of Neuron 2.24, delivering new features and performance improvements for customers building and deploying deep learning models on AWS Inferentia and Trainium-based instances. Neuron 2.24 introduces support for PyTorch 2.7, enhanced inference capabilities, and expanded compatibility with popular machine learning frameworks. These updates help developers and data scientists accelerate model training and inference, improve efficiency, and simplify the deployment of large language models and other AI workloads.
With Neuron 2.24, customers can take advantage of advanced inference features such as prefix caching for faster Time-To-First-Token (TTFT), disaggregated inference to reduce prefill-decode interference, and context parallelism for improved performance on long sequences. The release also adds support for Qwen 2.5 text models and improved integration with Hugging Face Optimum Neuron and the PyTorch-based NxD Core backend.
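To see why prefix caching improves TTFT, consider that requests sharing a common prefix (for example, a long system prompt) can reuse the already-computed attention state instead of re-running prefill over those tokens. The sketch below is a simplified model of that idea, not the Neuron or vLLM API; the class and token counts are illustrative only, and prefill work is modeled as one unit per uncached token.

```python
# Illustrative model of prefix caching (not the Neuron SDK API).
# Prefill cost = number of tokens NOT covered by a previously cached prefix.

class PrefixCache:
    def __init__(self):
        self._cached = set()  # token tuples whose KV state is "cached"

    def prefill_cost(self, tokens):
        """Return simulated prefill work for a request."""
        tokens = tuple(tokens)
        # Find the longest previously cached prefix of this request.
        best = 0
        for i in range(len(tokens), 0, -1):
            if tokens[:i] in self._cached:
                best = i
                break
        # Cache every prefix of this request for future reuse.
        for i in range(1, len(tokens) + 1):
            self._cached.add(tokens[:i])
        return len(tokens) - best

cache = PrefixCache()
system_prompt = list(range(100))  # a shared 100-token system prompt

# First request: nothing cached, full prefill over 102 tokens.
cold = cache.prefill_cost(system_prompt + [200, 201])   # 102 units of work

# Second request shares the prefix: only the 2 new tokens need prefill.
warm = cache.prefill_cost(system_prompt + [300, 301])   # 2 units of work
```

In a real deployment the cached state is the KV cache on the accelerator, so the "warm" request reaches its first output token far sooner, which is the TTFT gain the release highlights.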
Neuron 2.24 is available in all AWS Regions where Inferentia and Trainium instances are offered.
To learn more and for a full list of new features and enhancements, see:
AWS Neuron 2.24 release notes
Trn2 Instances
Trn1 Instances
Inf2 Instances
Source: Amazon AWS