Published in Nature Machine Intelligence, a panel of experts shares a vision for the future of biopharma featuring collaboration between ML and drug discovery powered by GPUs.
The field of drug discovery is at a fascinating inflection point. The physics of the problem is understood and calculable, yet quantum mechanical calculations are far too expensive and time-consuming. Eroom’s Law observes that drug discovery is becoming slower and more expensive over time, despite improvements in technology.
A recent article examining the transformational role of GPU computing and deep learning in drug discovery offers hope that this trend may soon reverse.
Published in Nature Machine Intelligence, the review details numerous advances, from molecular simulation and protein structure determination to generative drug design, that are accelerating the computer-aided drug discovery workflow. These advances, driven by developments in highly parallelizable GPUs and GPU-enabled algorithms, are bringing new possibilities to computational chemistry and structural biology for the development of novel medicines.
The collaboration between drug discovery and machine learning researchers to identify GPU-accelerated deep learning tools is creating new possibilities for these challenges, which, if solved, hold the key to faster, less expensive drug development.
“We expect that the growing availability of increasingly powerful GPU architectures, together with the development of advanced DL strategies, and GPU-accelerated algorithms, will help to make drug discovery affordable and accessible to the broader scientific community worldwide,” the study authors write.
Molecular simulation and free energy calculations
Molecular simulation powers many calculations important in drug discovery and is the computational microscope that can be used to perform virtual experiments using the laws of physics. GPU-powered molecular dynamics frameworks can simulate the cell’s machinery, lending insight into fundamental mechanisms, and can calculate how strongly a candidate drug will bind to its intended protein target using methods like free energy perturbation. Of central importance to molecular simulation is the calculation of potential energy surfaces.
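To make the idea of free energy perturbation more concrete, here is a minimal sketch of the textbook Zwanzig (exponential averaging) estimator, which turns sampled energy differences between two states into a free energy difference. It is an illustration only, not the method used by any particular framework named in the review. The function name `fep_free_energy`, the choice of kcal/mol units, and the sample values are assumptions made for this example.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
K_B = 0.0019872041


def fep_free_energy(delta_u, temperature=300.0):
    """Zwanzig estimator: dF = -kT * ln( mean( exp(-dU / kT) ) ).

    delta_u: energy differences U_B - U_A (kcal/mol), evaluated on
             configurations sampled from state A.
    """
    if not delta_u:
        raise ValueError("need at least one energy difference")
    beta = 1.0 / (K_B * temperature)
    exponents = [-beta * du for du in delta_u]
    # Log-sum-exp trick keeps the exponential average numerically stable.
    largest = max(exponents)
    log_mean = largest + math.log(
        sum(math.exp(x - largest) for x in exponents) / len(exponents)
    )
    return -log_mean / beta


# Hypothetical energy differences, for illustration only.
samples = [1.2, 0.8, 1.5, 1.1, 0.9]
estimate = fep_free_energy(samples)
```

Because the average is exponential, low-energy samples dominate the estimate, which is one reason these calculations need extensive sampling and benefit from GPU-accelerated simulation.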
In the highlighted review, the authors cover how machine-learned potentials are fundamentally changing molecular simulation. Machine-learned potentials, also called neural network potentials, are models that learn energies and forces for molecular simulation with the accuracy of quantum mechanics.
The authors report that free energy simulations benefit greatly from GPUs. Neural network-based force fields such as ANI and AIMNet reduce absolute binding free-energy errors and the human effort required for force field development. Other deep learning frameworks like reweighted autoencoder variational Bayes (RAVE) are pushing the boundaries of molecular simulation, employing an enhanced sampling scheme for estimating protein-ligand binding free energies. Methods like Deep Docking now employ DL models to estimate molecular docking scores and accelerate virtual screening.
Advances in protein structure determination
Over the last 10 years, there has been a 2.13x increase in the number of protein structures publicly available. An increasing rate of CryoEM structure deposition and the proliferation of proteomics have further contributed to an abundance of structure and sequence data.
CryoEM is projected to dominate high-resolution macromolecular structural determination in the coming years with its simplicity, robustness, and ability to image large macromolecules. It is also less destructive to samples as it does not require crystallization.
However, the data storage demands and computational requirements are sizable. The study’s authors detail how deep learning-based approaches like DEFMap and DeepPicker are powering high-throughput automation of CryoEM for protein structure determination with the help of GPUs. DEFMap combines molecular dynamics simulations with deep learning algorithms that learn relationships in local density data, extracting the dynamics associated with hidden atomic fluctuations.
The groundbreaking development of AlphaFold-2 and RoseTTAFold, models that predict protein structure with atomic accuracy, is ushering in a new era of structure determination. A recent study by Mosalaganti et al. highlights the predictive power of these models. It also demonstrates how protein structure prediction models can be combined with cryoelectron tomography (CryoET) to determine the structure of the nuclear pore complex, a massive cellular structure composed of over 1,000 proteins. Mosalaganti et al. go on to perform coarse-grained molecular dynamics simulations of the nuclear pore complex. This gives a glimpse of the kinds of simulations made possible by combining AI-based protein structure prediction models, CryoEM, and CryoET.
Generative models and deep learning architectures
One of the central challenges of drug discovery is the overwhelming size of the chemical space. There are an estimated 10^60 drug-like molecules to consider, so researchers need a representation of the chemical space that is organized and searchable. By training on a large base of existing molecules, generative models learn the rules of chemistry and learn to represent chemical space in the latent space of the model.
Generative models, by implicitly learning the rules of chemistry, produce molecules that they have never seen before. This results in exponentially more unique, valid molecules than in the original training database. Researchers can also construct numerical optimization algorithms that operate in the latent space of the model to search for optimal molecules. These algorithms act like gradients in the latent space, which computational chemists can use to steer molecule generation toward desirable properties.
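The latent-space search described above can be sketched as simple gradient ascent on a property score. Everything in this example is hypothetical: `toy_property` is a stand-in for a learned property predictor, the two-dimensional latent vector is far smaller than a real model's, and the step where a trained decoder maps the optimized vector back to a molecule is omitted.

```python
def toy_property(z):
    """Hypothetical property score over a latent vector.

    Stands in for a learned predictor; it peaks at the arbitrary
    point (1.0, -2.0) chosen for this illustration.
    """
    target = (1.0, -2.0)
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def numerical_gradient(f, z, eps=1e-5):
    """Central finite-difference estimate of the gradient of f at z."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        down = list(z)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def optimize_latent(f, z0, step=0.1, n_steps=200):
    """Gradient ascent: repeatedly move z in the direction that raises f."""
    z = list(z0)
    for _ in range(n_steps):
        grad = numerical_gradient(f, z)
        z = [zi + step * gi for zi, gi in zip(z, grad)]
    return z


z_start = [0.0, 0.0]
z_opt = optimize_latent(toy_property, z_start)
```

In practice the gradient would more likely come from automatic differentiation through the property model than from finite differences, and each candidate vector would be decoded and checked for chemical validity.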
The authors report that numerous state-of-the-art deep learning architectures are driving more robust generative models. Graph neural networks, generative adversarial networks, variational autoencoders, and transformers are producing generative models that are transforming molecular representation and de novo drug design.
Convolutional neural networks, like Chemception, have been trained to predict chemical properties such as toxicity, activity, and solvation. Recurrent neural networks have the capacity to learn latent representations of chemical spaces to make predictions for several datasets and tasks.
MegaMolBART is a transformer-based generative model that achieves 98.7% unique molecule generation at AI-supercomputing scale. With support for model-parallel training, MegaMolBART can train models with more than 1 billion parameters on large chemical databases and is tunable for a wide range of tasks.
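For readers unfamiliar with the metric, uniqueness is commonly reported as the fraction of generated molecules that are distinct. The sketch below shows that calculation on made-up SMILES strings. It is not the evaluation code behind the 98.7% figure, and the function name `uniqueness` is an assumption of this example. It compares strings exactly, whereas a real evaluation would typically canonicalize and validate each SMILES string with a cheminformatics toolkit first.

```python
def uniqueness(smiles_list):
    """Fraction of distinct entries among generated SMILES strings.

    Assumes the strings are already canonical, so identical molecules
    share identical text.
    """
    if not smiles_list:
        return 0.0
    return len(set(smiles_list)) / len(smiles_list)


# Made-up sample output, for illustration only.
generated = ["CCO", "CCO", "c1ccccc1", "CC(=O)O"]
score = uniqueness(generated)  # 3 distinct strings out of 4
```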
The Million-X leap in scientific computing
Today, GPUs are accelerating every step of the computer-aided drug discovery workflow, showing effectiveness in everything from target elucidation to FDA approval. With accelerated computing, scientific calculations are being massively parallelized on GPUs.
Supercomputers help these calculations to be scaled up and out to multiple nodes and GPUs, leveraging fast communication fabrics to tie GPUs and nodes together.
At GTC, NVIDIA CEO Jensen Huang shared how NVIDIA has accelerated computing by a million-x over the past decade. The future is bright for digital biology, where these speed-ups are being applied to accelerate drug discovery and deliver therapeutics to market faster.