Tennessee team screens 1B virtual molecules in 20 hours to accelerate drug discovery for COVID-19

By combining the compute power of the Summit supercomputer and Google Cloud, researchers at Oak Ridge National Laboratory (ORNL) have developed a faster pipeline for identifying potential COVID-19 drug therapies.

In February 2020, as the novel coronavirus was spreading, research facilities across the country began rapidly shifting their daily workloads to address the urgent demands of a global pandemic. Developing a treatment with traditional drug discovery methods takes more than ten years and over one billion dollars, and many candidates fail in clinical trials. COVID-19 demanded a much faster and more efficient response. With that goal in mind, ORNL in Tennessee, Google, and others became founding members of the COVID-19 High Performance Computing (HPC) Consortium, an international alliance of leading tech companies, government agencies, and scientific research institutions that agreed to collaborate on using technology to advance innovative solutions to the disease.

For ORNL researchers like David Rogers and Jens Glaser, that meant being suddenly reassigned to a new multidisciplinary team called the COVID-19 Computational Drug Discovery Collaborative. “We saw the impact of COVID-19 on our daily lives and embraced the opportunity to contribute,” Glaser says. Computational scientists by training, Rogers and Glaser began working alongside colleagues from biology, biophysics, and materials science to develop a more efficient pipeline for discovering COVID-19 treatments. By virtually screening huge numbers of potential chemical compounds, the team aimed to narrow a vast universe of possible drug therapies down to a smaller set of viable candidates and share those candidates with experimentalists to consider for testing in a wet lab.

In addition to its highly trained researchers in complementary fields, ORNL had a major advantage in pursuing this project: the IBM AC922 Summit supercomputer. Housed at the Oak Ridge Leadership Computing Facility at ORNL, Summit is the fastest supercomputer in the nation, with over 27,000 GPUs, 2M cores, and over 2.8M gigabytes of memory. ORNL biophysicist Josh Vermaas, now an assistant professor at Michigan State University, says, “The COVID crisis has been anything but orderly, so the HPC Consortium was a focusing event to reprioritize the science that Summit would be tackling over the short term.”

Leveraging the power of HPC to scale and accelerate drug discovery in less than a day

Starting with a grant of Google research credits, the team first built a prototype that used high-throughput virtual screening to model how candidate molecules dock to molecular targets in a variety of configurations, or poses. They used a program called AutoDock-GPU to simulate how these virtual molecules bind to different target sites on proteins of SARS-CoV-2, the virus that causes COVID-19. They then designed an algorithm to score each potential candidate based on the strength of its binding to the target.
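
The score-and-rank step can be illustrated with a small sketch. It assumes each docking run yields one score per pose, in AutoDock-style units where a more negative value indicates stronger predicted binding, and that each compound is ranked by its best pose. The `Pose` record, the function names, and the compound IDs are hypothetical; this is not ORNL's scoring algorithm or AutoDock-GPU's output format.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One docked pose of one compound against one target (hypothetical record)."""
    compound_id: str
    score_kcal_mol: float  # assumed convention: more negative = stronger predicted binding


def best_score_per_compound(poses):
    """Keep only each compound's most favorable (lowest) pose score."""
    best = {}
    for pose in poses:
        current = best.get(pose.compound_id)
        if current is None or pose.score_kcal_mol < current:
            best[pose.compound_id] = pose.score_kcal_mol
    return best


def top_candidates(poses, k):
    """Return the k compounds with the lowest best-pose scores as (id, score) pairs.

    heapq.nsmallest selects the top k without fully sorting every compound.
    """
    best = best_score_per_compound(poses)
    # Ties on score are broken by compound ID so the ranking is deterministic.
    return heapq.nsmallest(k, best.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    poses = [
        Pose("cmpd-A", -7.2), Pose("cmpd-A", -8.1),
        Pose("cmpd-B", -6.5),
        Pose("cmpd-C", -9.0), Pose("cmpd-C", -5.4),
    ]
    print(top_candidates(poses, k=2))  # [('cmpd-C', -9.0), ('cmpd-A', -8.1)]
```

Ranking by the best pose is only one reasonable choice; a real pipeline might also weigh how consistently a compound scores across poses.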

By running the pipeline on the Summit supercomputer, the team was able to virtually dock a billion chemical compounds fifty times faster than was possible with existing AutoDock software. Jeremy Smith, project lead and director of the Center for Molecular Biophysics at ORNL, was impressed by the results: “We broke a world record on the supercomputer. We screened 1.3 billion compounds in less than a single day.” By running the workload in parallel on Summit and on Google Cloud GPUs, the team could carry out computational screens against multiple COVID-19 drug targets at once. “We didn’t just dock 1.3 billion compounds,” Rogers adds. “We docked 1.3 billion compounds against two separate drug targets on Summit and two more drug targets on Google Cloud.”
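
Screening at this scale depends on splitting the compound library into independent pieces that many GPUs can dock in parallel. The sketch below shows one simple way to do that under stated assumptions: each target is screened against the whole library, which is cut into contiguous index ranges, one per worker. The site names, target names, and worker counts are hypothetical and far smaller than the thousands of GPUs on Summit; this is not the team's actual scheduler. It also works through the arithmetic behind the record: 1.3 billion compounds in 24 hours averages roughly 15,000 compounds per second, so finishing in less than a day implies a higher rate.

```python
def shard_ranges(n_compounds, n_workers):
    """Split compound indices [0, n_compounds) into n_workers contiguous, near-equal ranges."""
    base, extra = divmod(n_compounds, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)  # spread the remainder over the first workers
        ranges.append((start, start + size))
        start += size
    return ranges


def plan_tasks(n_compounds, sites):
    """Expand {site: (targets, n_workers)} into (site, target, start, end) docking tasks.

    Every target is screened against the full library, so the library is
    sharded once per target at each site.
    """
    tasks = []
    for site, (targets, n_workers) in sites.items():
        for target in targets:
            for start, end in shard_ranges(n_compounds, n_workers):
                tasks.append((site, target, start, end))
    return tasks


def min_throughput_per_second(n_compounds, hours):
    """Average docking rate needed to finish n_compounds within the given wall-clock time."""
    return n_compounds / (hours * 3600)


if __name__ == "__main__":
    # Hypothetical layout mirroring the article: two targets per site, four workers each.
    sites = {
        "summit": (["target-1", "target-2"], 4),
        "google-cloud": (["target-3", "target-4"], 4),
    }
    tasks = plan_tasks(1_300_000_000, sites)
    print(len(tasks))  # 16 tasks: 2 sites x 2 targets x 4 shards
    print(round(min_throughput_per_second(1_300_000_000, 24)))  # 15046 compounds per second
```

Contiguous index ranges keep each task description tiny (a start and an end), which is convenient when the library itself is stored as one large, ordered file.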

Collaborating to advance breakthroughs during a crisis

Rogers reports that by dynamically modeling more poses and using machine learning to cross-check screening scores, the ORNL team has also made its results more robust. The team’s first batch of 323 top-scoring compounds has been sent for experimental testing, and they are already working on a second batch. “By drawing on the expertise of biologists, docking specialists, and computational experts, we can optimize the pipeline itself,” Rogers says. “The biologists have given us great targets to go after. It’s not just about the code but about how you apply it. We hope this is the beginning of even more collaboration.”
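
One common way to use machine learning as a cross-check is a consensus filter: keep only compounds that rank well under both the docking calculation and an independently trained model. The article does not describe the team's ML method, so this sketch is an illustrative assumption, not their approach; the score conventions, the `fraction` cutoff, and the function names are hypothetical.

```python
def top_fraction_ids(scores, fraction, lower_is_better):
    """Return the IDs ranked in the best `fraction` of `scores` (always at least one ID)."""
    n_keep = max(1, int(len(scores) * fraction))
    sign = 1 if lower_is_better else -1
    # Sort by signed score, then by ID so ties are resolved deterministically.
    ordered = sorted(scores, key=lambda cid: (sign * scores[cid], cid))
    return set(ordered[:n_keep])


def consensus_hits(docking_scores, ml_scores, fraction=0.1):
    """Keep compounds that rank highly under both the docking score and an ML model.

    Assumed conventions (hypothetical): docking scores are more negative for
    stronger binding; ML scores are higher for compounds predicted more likely to bind.
    """
    shared = docking_scores.keys() & ml_scores.keys()
    by_docking = top_fraction_ids({c: docking_scores[c] for c in shared}, fraction, lower_is_better=True)
    by_model = top_fraction_ids({c: ml_scores[c] for c in shared}, fraction, lower_is_better=False)
    return sorted(by_docking & by_model)


if __name__ == "__main__":
    docking = {"A": -9.1, "B": -8.7, "C": -6.0, "D": -8.9}
    ml = {"A": 0.92, "B": 0.40, "C": 0.85, "D": 0.77}
    print(consensus_hits(docking, ml, fraction=0.5))  # ['A']
```

Requiring agreement between two independent scoring methods trades some recall for fewer false positives, which suits a setting where every candidate sent to the wet lab costs real experimental effort.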

The team has further plans for its optimized pipeline, such as re-testing compounds as the virus that causes COVID-19 evolves over time. ORNL computational biophysicist Ada Sedova notes that “it could be possible that as a protein mutates then the drugs we’re using may not bind as well any more. We want to look ahead and incorporate that into a drug discovery pipeline that predicts possible drug resistance.” ORNL is also gearing up for the arrival of exascale computing with Frontier, which is poised to be the world’s largest supercomputer when it launches in 2021.
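
In its simplest form, the resistance check Sedova describes could compare each compound's docking score against the original target with its score against mutated versions of that target, flagging compounds that lose too much predicted binding strength. This is a hedged sketch of that idea, not ORNL's pipeline: the 1.0 kcal/mol threshold, the variant names, and the assumption that mutant structures are available to dock against are all hypothetical.

```python
def flag_possible_resistance(wild_type, variants, max_loss_kcal_mol=1.0):
    """Flag compounds whose docking score weakens by more than a threshold on a mutated target.

    wild_type: {compound_id: score} against the original protein structure.
    variants:  {variant_name: {compound_id: score}} against mutated structures.
    Scores follow an assumed AutoDock-style convention (more negative = stronger
    binding), so a positive change (mutant minus wild type) means weaker binding.
    """
    flagged = {}
    for variant, scores in variants.items():
        weakened = sorted(
            cid for cid, wt_score in wild_type.items()
            if cid in scores and scores[cid] - wt_score > max_loss_kcal_mol
        )
        if weakened:
            flagged[variant] = weakened
    return flagged


if __name__ == "__main__":
    wild_type = {"A": -9.1, "D": -8.9}
    variants = {
        "variant-1": {"A": -7.5, "D": -8.6},  # hypothetical mutant structures
        "variant-2": {"A": -9.0, "D": -8.8},
    }
    print(flag_possible_resistance(wild_type, variants))  # {'variant-1': ['A']}
```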

The project, like the COVID-19 HPC Consortium, is part of a broader effort to bring scientific breakthroughs to the public faster. “This pandemic has actually changed how science is done,” says Thomas Zacharia, Director of ORNL. “It’s become much more collaborative both internal to the laboratory, within the laboratory system, but also across agencies and industry. So it’s truly a testament to mobilizing the best capabilities, people, program resources and capabilities together to defeat this virus.”

To find out more about ORNL’s opportunities for collaboration and training, visit the ORNL website. To learn more about how Google is supporting the global fight against COVID-19, visit Google’s COVID-19 resources and browse the Public Dataset, which includes a copy of the widely referenced Johns Hopkins University CSSE COVID-19 data.