Today, AWS announces a major version release of the Amazon SageMaker model parallel library (SMP), which is now compatible with the PyTorch Fully Sharded Data Parallel (FSDP) APIs and can accelerate deep learning model training by up to 20%. SMP accelerates training of large models with billions of parameters by automatically partitioning and distributing the model across multiple accelerators and compute instances. You can get started with SMP in minutes and speed up your existing PyTorch FSDP training scripts with just a few lines of code.
Source: Amazon AWS