AWS Neuron introduces speculative decoding

Today, AWS announces the release of Neuron 2.18, which moves PyTorch 2.1 support out of beta to stable and adds support for speculative decoding, demonstrated with a Llama-2-70B sample in the Transformers NeuronX library.
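
For readers new to the technique, the following is a minimal, framework-free sketch of greedy speculative decoding. It is not the Neuron or Transformers NeuronX implementation, and it uses no Neuron API. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the two "models" are toy functions that map a token prefix to the next token. A small draft model proposes a few tokens, the large target model checks them, and the longest agreeing prefix is kept.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: call the target model once per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target verifies them."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position. A real system
        #    scores all positions in one batched forward pass; this loop is
        #    written out sequentially only for clarity.
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != guess:
                # Keep the agreeing prefix, then the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every proposed token was accepted; add one more target token if room remains.
            tokens.extend(proposal)
            if len(tokens) < goal:
                tokens.append(target(tokens))
    return tokens


# Toy stand-ins for a large target model and a small draft model (assumptions for illustration).
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] * 3 + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    # Agrees with the target except when the last token is a multiple of 5.
    return 0 if prefix[-1] % 5 == 0 else toy_target(prefix)


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [2], max_new_tokens=12))
```

In this greedy variant the output matches what the target model would produce on its own. Any speedup comes from verifying several draft tokens per target pass, so it depends on how often the draft agrees with the target.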

Source: Amazon AWS