Researchers at Northeastern use Google Cloud to model Zika’s spread

Northeastern University’s Modeling of Biological and Socio-technical Systems (MoBS) lab needed a way to quickly model the Zika virus. With Google Compute Engine (GCE) and Preemptible Virtual Machines, MoBS has run more than 10 million simulations and drastically reduced the time needed to analyze data.

In 2015, as the mosquito-borne Zika virus spread quickly through the Americas, travel bans and quarantines were issued, and there were calls to cancel the 2016 Olympics in Brazil. When the World Health Organization declared an international public health emergency, governments in affected countries needed a way to accurately predict the rates and locations of new infections. Because only 20 percent of Zika cases are symptomatic, the spread of the virus is particularly hard to predict.

In January 2016, the team at Northeastern University’s MoBS lab, with the support of the Center for Inference and Dynamics of Infectious Diseases, started the Zika Modeling Project to help public authorities and researchers better understand the evolution and spread of the virus.

"With the use of big data and massive computing power, we hope to help researchers and public health officials."

Matteo Chinazzi, Associate Research Scientist, Northeastern University

Google Cloud: providing essential prediction tools, analytic tools, and more

Using a mathematical and computational approach powered by Google Cloud, the team has studied different scenarios under which Zika could spread, projecting its impact on affected populations. The model is based on the initial spread of Zika in Brazil, where the virus broke out in 2015. The researchers are now able to predict the impact of new infections in other locations by introducing additional data layers, including temperature, number of mosquitoes, population size and people’s travel patterns.
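
To give a sense of what a single stochastic simulation involves, here is a deliberately simplified sketch in Python. It is not the MoBS model, which layers in temperature, mosquito, population and travel data. It is a toy chain-binomial SIR model, and the function name and the parameters `beta` and `gamma` are assumptions chosen only for illustration.

```python
import random


def simulate_outbreak(population, initial_infected, beta, gamma, days, seed):
    """Run one stochastic toy SIR simulation and return daily new infections.

    Hypothetical parameters: beta is the daily transmission rate and gamma is
    the daily recovery probability. Neither value comes from the article.
    """
    rng = random.Random(seed)  # a fixed seed makes each run reproducible
    susceptible = population - initial_infected
    infected = initial_infected
    daily_new_cases = []
    for _ in range(days):
        # Probability that one susceptible person is infected today.
        p_infection = 1.0 - (1.0 - beta / population) ** infected
        infections = sum(1 for _ in range(susceptible) if rng.random() < p_infection)
        recoveries = sum(1 for _ in range(infected) if rng.random() < gamma)
        susceptible -= infections
        infected += infections - recoveries
        daily_new_cases.append(infections)
    return daily_new_cases


# Each seed gives an independent run; a scenario is summarized over many runs.
runs = [simulate_outbreak(1000, 5, 0.4, 0.1, 60, seed=k) for k in range(3)]
for k, run in enumerate(runs):
    print(f"seed {k}: {sum(run)} total infections over {len(run)} days")
```

Because every run depends only on its inputs and its seed, runs like these can be executed independently and in any order, which is what makes the workload easy to spread across many machines.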

Google Cloud allows the team to run several parallel simulations, and to analyze the terabytes of data generated by the scenarios modelled. “We use several Google Cloud products,” says Matteo Chinazzi, Associate Research Scientist at Northeastern University. “Google Cloud Storage stores all of our modeling data and hosts the website. Google Compute Engine (GCE) and Preemptible Virtual Machines run the simulations of the disease’s spread. Google BigQuery examines the simulated scenarios, each of which involves variables such as dates and infection numbers. So far, we’ve churned through a tremendous amount of data, hundreds of terabytes in all. Google Cloud Storage stores all of it.”

Getting results to move quickly at scale

With GCE and Preemptible Virtual Machines, MoBS has run more than 10 million simulations. GCE and BigQuery have drastically reduced the time needed to perform simulations and analyze data: both processes now take hours rather than weeks. “We have the flexibility to scale up to several thousand independent virtual instances in parallel,” Chinazzi says, “so we can generate a full analysis for a single epidemic scenario, which may consist of up to 250,000 independent simulations, in less than a day.”
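
One way to picture how 250,000 independent simulations might be spread over many instances is to split them into small, repeatable batches. The Python sketch below is an assumption about how such work could be organized, not a description of the MoBS pipeline. The `Batch` type, the function names, and the batch size of 100 are all hypothetical. Small batches suit preemptible instances because Compute Engine can stop those instances at any time, so any unfinished batch has to be run again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Batch:
    """A hypothetical unit of work: `count` runs with consecutive seeds."""
    batch_id: int
    first_seed: int
    count: int


def make_batches(total_runs, batch_size):
    """Split total_runs independent simulations into batches."""
    batches = []
    for batch_id, start in enumerate(range(0, total_runs, batch_size)):
        count = min(batch_size, total_runs - start)  # last batch may be smaller
        batches.append(Batch(batch_id, start, count))
    return batches


def pending_batches(batches, completed_ids):
    """Return batches not yet finished, e.g. after an instance was preempted."""
    return [b for b in batches if b.batch_id not in completed_ids]


# 250,000 runs (the figure quoted above) in batches of 100 (an assumed size).
batches = make_batches(total_runs=250_000, batch_size=100)
remaining = pending_batches(batches, completed_ids={0, 1})
print(len(batches), "batches in total,", len(remaining), "still to run")
```

Since each batch is fully described by its first seed and its count, rerunning a batch after a preemption reproduces the same simulations rather than adding new ones.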

In addition to enabling researchers to understand the spread of Zika, this model may become a template for analyzing other epidemics, such as dengue. Although the World Health Organization no longer classifies Zika as an international emergency, there is still work to be done in preventing outbreaks of mosquito-borne diseases. With the use of big data and massive computing power, the team at MoBS hopes to help researchers and public health officials achieve that.

“Time is vital when confronting disease outbreaks,” says Chinazzi, “and Google Cloud gives us the tools we need to move quickly at scale.”

To read more about the Zika research and analysis conducted by the MoBS lab, see “Spread of Zika virus in the Americas,” published in the Proceedings of the National Academy of Sciences of the United States of America.
