Harvard team conducts ultra-large virtual drug screenings on Google Cloud to speed up COVID-19 research

By building an open-source tool to screen billions of chemical compounds, researchers at Harvard Medical School aim to narrow down promising targets for COVID-19 treatments and share their results on Google’s Public Datasets.

With the COVID-19 pandemic creating an urgent need for new ways to treat the virus, one team at Harvard Medical School is screening ultra-large libraries of chemical compounds to accelerate the drug discovery process. In 2015, as a doctoral student in Germany, Christoph Gorgulla developed an open-source, scalable virtual drug screening platform called VirtualFlow. Now a postdoctoral research fellow at Harvard Medical School, Gorgulla is collaborating with Haribabu Arthanari, Assistant Professor in the Department of Biological Chemistry and Molecular Pharmacology, along with Arthanari’s molecular pharmacology lab and other partners, including the pharmaceutical company Enamine. Together they are using VirtualFlow on Google Cloud to quickly and accurately find the best therapeutic targets for developing new treatments for COVID-19.

Computational approaches can make a big difference for drug discovery. It is our vision that VirtualFlow will become the standard virtual screening platform in the world.

Christoph Gorgulla, Postdoctoral Fellow, Harvard Medical School, Dana-Farber Cancer Institute, and the Department of Physics of Harvard.

Using cloud computing to scale and accelerate drug discovery

Developing drug therapies to prevent the spread of viruses in the body begins with the arduous process of fitting a chemical molecule to a surface on a protein, like pieces in a jigsaw puzzle: the exact shape of the molecule, called a ligand, must complement the corresponding protein receptor to block the virus from attaching there and replicating. This process is called docking, and Arthanari likens it to a unique handshake that connects the two molecular entities.

In an experimental wet lab setting, matching one potential compound after another is the tedious and time-consuming first step before any drug candidates can be developed for clinical trials. The four million patients around the world already diagnosed with COVID-19 desperately need new treatments, but the average new drug in the United States costs two to three billion dollars and takes about ten years to develop. By scaling up to screen huge numbers of virtual ligands, VirtualFlow can narrow down promising drug targets to accelerate that process. Gorgulla says, “Curing diseases quickly is most important. And if we can find a molecule that specifically blocks and unblocks the virus without disabling the human protein, that gives us even better control over treatment.”

In the case of COVID-19, other scientists have already sequenced the virus, determined its three-dimensional structure, and identified the human proteins that must be blocked or inhibited. “We have some advantages right now,” Arthanari says. “There’s been a flurry of structural research. We know exactly what this virus looks like and we know how its proteins operate. So we know where to target. Each protein targets multiple sites, similar to attacking each head of the Lernaean Hydra. This gives us a chance even if the virus mutates one of its heads.” VirtualFlow comes pre-loaded with the REAL database, Enamine’s library of over one billion commercially available, on-demand molecules. Arthanari says, “Bigger is better in virtual screening. We’ve taken this to the next magnitude with 1.4 billion compounds now. Hopefully in the near future we’ll go to about twenty billion compounds. This will revolutionize drug discovery.”

Testing one billion chemical compounds in five days

Focusing on 16 protein targets, the team used VirtualFlow to simulate dockings in 3D to search for matching blockers across the huge REAL database as well as twelve million in-stock compounds from the ZINC15 library. With an initial grant of Google research credits for the first target, the team built on the scalable processing of Google Cloud and Slurm to run 80,000 cores at the same time, analyzing the more than one billion compounds in five days, even while screening them multiple times to improve accuracy. At the fastest setting it would take only two or three days to process. “The Google credits provided us with amazing support to scale up the computations on Google Cloud,” says Gorgulla. “This allowed us to carry out ultra-large virtual screenings way faster than we ever were able before.” As of May 2020, the team has targeted 16 proteins (15 viral and one human) with 40 target sites and run 35 billion docking instances using 75 million CPU hours.
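To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes all 80,000 cores were busy for the full five days and treats "more than one billion" as exactly one billion, neither of which the team states. The function and variable names are illustrative and are not part of VirtualFlow.

```python
# Figures quoted in the article.
CORES = 80_000
RUN_DAYS = 5
COMPOUNDS = 1_000_000_000          # "more than one billion" (lower bound)
TOTAL_DOCKINGS = 35_000_000_000    # as of May 2020
TOTAL_CPU_HOURS = 75_000_000       # as of May 2020


def core_hours(cores: int, days: float) -> float:
    """Core-hours consumed if every core runs for the whole period (assumption)."""
    return cores * days * 24


def seconds_per_docking(cpu_hours: float, dockings: int) -> float:
    """Average CPU time spent on one docking instance, in seconds."""
    return cpu_hours * 3600 / dockings


# Upper bound for the five-day run, assuming full utilization.
run_core_hours = core_hours(CORES, RUN_DAYS)

# Share of the library handled by each core in that run.
compounds_per_core = COMPOUNDS / CORES

# Average over everything the team reported as of May 2020.
avg_seconds = seconds_per_docking(TOTAL_CPU_HOURS, TOTAL_DOCKINGS)

print(f"Five-day run: up to {run_core_hours:,.0f} core-hours")
print(f"Compounds per core: {compounds_per_core:,.0f}")
print(f"Average CPU time per docking: {avg_seconds:.1f} s")
```

Under these assumptions the five-day run works out to at most 9.6 million core-hours, and the cumulative totals imply an average of roughly eight CPU-seconds per docking instance.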

Each virtual molecule tested gets a score indicating its potential as a blocker for COVID-19. Once the team identifies the hundred or so highest-scoring compounds, they will post them on Google’s Public Dataset site, which hosts large public datasets for educational and research uses and provides BigQuery searches for COVID-19-related research. These open-source resources advance drug discovery because the pre-processing of billions of molecules is already completed and easily accessible.
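Picking the hundred or so best-scoring molecules out of a billion does not require sorting the whole list. The sketch below shows one common way to do it with Python's standard `heapq` module. It is not VirtualFlow's own code: the `top_hits` function, the compound IDs, and the scores are hypothetical. Many docking programs report binding energies where lower is better, so the direction of "best" is left as a parameter.

```python
import heapq
from typing import Iterable, List, Tuple

ScoredCompound = Tuple[str, float]  # (compound ID, docking score)


def top_hits(
    scored: Iterable[ScoredCompound],
    n: int = 100,
    higher_is_better: bool = True,
) -> List[ScoredCompound]:
    """Return the n best-scoring compounds, best first.

    heapq keeps only n candidates in memory at a time, so the input
    can be a generator streaming over a very large result file.
    """
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    return pick(n, scored, key=lambda pair: pair[1])


# Hypothetical scores, for illustration only.
sample = [("cmpd-a", 0.2), ("cmpd-b", 0.9), ("cmpd-c", 0.5)]
best_two = top_hits(sample, n=2)
lowest_one = top_hits(sample, n=1, higher_is_better=False)
print(best_two)
print(lowest_one)
```

Because only the current top `n` entries are held at once, the same function works whether the scores come from a small list or are streamed from disk.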

This Harvard research team is already at work on their next steps. They are applying for more grants and forging partnerships in the U.S. and Europe to experimentally test the best compounds for each target. For Gorgulla, “computational approaches can make a big difference for drug discovery. It is our vision that VirtualFlow will become the standard virtual screening platform in the world. Our efforts might help drug development transition more quickly to virtual approaches, which in the end will hopefully lead to more and faster cures of diseases via effective drugs with fewer side effects.”

To learn more about how Google is supporting the global fight against COVID-19, check out this central website and browse the Public Dataset, which includes a copy of the widely referenced Johns Hopkins University CSSE COVID-19 data. If you’re working on COVID-19 research, you can apply for Google Cloud research credits.

Bigger is better in virtual screening. We’ve taken this to the next magnitude with 1.4 billion compounds now. Hopefully in the near future we’ll go to about twenty billion compounds. This will revolutionize drug discovery.

Haribabu Arthanari, Assistant Professor, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School and Dana-Farber Cancer Institute
