Introducing distributed training on Amazon SageMaker

Today we are introducing Amazon SageMaker distributed training, the fastest and easiest way to train large deep learning models on large datasets. Using partitioning algorithms, SageMaker distributed training automatically splits large deep learning models and training datasets across AWS GPU instances in a fraction of the time it would take to do so manually. SageMaker achieves these efficiencies through two techniques: model parallelism and data parallelism. Model parallelism splits models that are too large to fit on a single GPU into smaller parts and distributes them across multiple GPUs for training, while data parallelism splits large datasets across GPUs so they can be trained on concurrently, improving training speed.
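
As a rough illustration, the sketch below shows how a distributed training job might be launched with the SageMaker Python SDK's PyTorch estimator, enabling the data parallelism library through the distribution parameter. The entry-point script, IAM role, instance choices, and S3 path are placeholders, not values from the announcement.

```python
# Minimal sketch: launching a data-parallel training job with the SageMaker
# Python SDK. Script name, role ARN, instance settings, and S3 URI are
# illustrative placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",              # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=2,                    # number of GPU instances to train across
    instance_type="ml.p3.16xlarge",      # multi-GPU instance type
    framework_version="1.8.1",
    py_version="py36",
    # Enable SageMaker's data parallelism library; a "modelparallel" entry
    # (together with an "mpi" section) is used instead for model parallelism.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit("s3://my-bucket/training-data/")  # placeholder dataset location
```

With this configuration, SageMaker shards each training batch across the GPUs of the provisioned instances and synchronizes gradients between them, rather than requiring the training script to manage the split by hand.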

Source: Amazon AWS