Northeastern University runs large-scale COVID-19 epidemiological models on Google Cloud

By analyzing over 5.5 petabytes of data on Compute Engine, a team of researchers at Northeastern University ran over nine million simulations to examine the impact of travel restrictions and social distancing on the spread of the virus.

With the COVID-19 virus causing medical crises and economic devastation across the world since January 2020, the Laboratory for the Modelling of Biological + Sociotechnical Systems (MoBS Lab) at Northeastern University is running complex epidemiological models to project how travel restrictions and social distancing measures would impact efforts to curtail the pandemic. Part of the Network Science Institute at Northeastern, MoBS uses large-scale data-driven models to study the spatial spread of infectious diseases.

Under the direction of Alessandro Vespignani, MoBS researchers Matteo Chinazzi and Jessica T. Davis, together with the other members of the team, worked with an international consortium of researchers from Italy, Shanghai, and the U.S. and published their results in an article in Science on March 6, 2020. Beginning with the outbreak in Wuhan, China, they studied the spread of COVID-19 cases after travel restrictions were imposed on the region on January 23, 2020. They found that limiting travel in and out of China by as much as 90% would delay the onset of international cases, but that travel restrictions are most effective in curbing the spread of disease when combined with social distancing measures aimed at reducing the local transmissibility of the virus.

Running millions of simulations on Google Cloud makes epidemiological models faster and more accurate

Using the Global Epidemic and Mobility Model (GLEAM), a metapopulation model that combines real-world data on populations and human mobility with elaborate stochastic models of disease transmission, the team ran simulations to describe possible scenarios for the outbreak as it spreads and as governments enact interventions. Running simulations with so many epidemiological and transportation variables requires huge amounts of compute power, data processing, and storage. The team therefore took advantage of the integrated, scalable tools built into Google Cloud: they set up their pipeline with Google Cloud Life Sciences, loaded each model into Google Cloud Storage, ran thousands of parallel simulations on preemptible virtual machines (VMs) in Compute Engine, and then analyzed the output in BigQuery. With help from Google Cloud research credits, they have simulated over nine million different scenarios to date and have analyzed over 5.5 PB of simulation data in BigQuery. They also assessed the relative risk of importing cases from and within China, estimated the risk of sustained community transmission outside Mainland China, and published their results online with Google’s visualization tool, Data Studio.
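To make the idea of a stochastic metapopulation simulation more concrete, here is a minimal Python sketch. It is not GLEAM and does not reflect the team's code or data: it is a toy two-patch SIR model in which every name and parameter value (`simulate`, `median_arrival`, `beta`, `gamma`, `travel_rate`, `travel_reduction`, the population size) is a hypothetical choice made for illustration. Each run draws random infections, recoveries, and trips between the two patches, and records the first day an infected traveler is present in the second patch.

```python
import math
import random
import statistics


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))


def simulate(seed, travel_reduction, days=60, pop=500, beta=0.5,
             gamma=0.2, travel_rate=0.01, initial_infected=5):
    """One stochastic run of a toy two-patch SIR model (illustrative only).

    Returns (arrival_day, patches): the first day an infected individual is
    present in patch 1 (None if that never happens) and the final
    [S, I, R] counts of both patches.
    """
    rng = random.Random(seed)
    # Patch 0 seeds the outbreak; patch 1 starts fully susceptible.
    patches = [[pop - initial_infected, initial_infected, 0], [pop, 0, 0]]
    # Hypothetical intervention: scale the daily travel probability down.
    p_move = travel_rate * (1.0 - travel_reduction)
    arrival_day = None
    for day in range(1, days + 1):
        # Mobility step: each individual independently moves to the other patch.
        moves = [[binomial(rng, count, p_move) for count in patch]
                 for patch in patches]
        for src, dst in ((0, 1), (1, 0)):
            for c in range(3):
                patches[src][c] -= moves[src][c]
                patches[dst][c] += moves[src][c]
        if arrival_day is None and patches[1][1] > 0:
            arrival_day = day
        # Transmission step: random infections and recoveries within each patch.
        for patch in patches:
            s, i, r = patch
            n = s + i + r
            if n == 0:
                continue
            p_inf = 1.0 - math.exp(-beta * i / n)
            new_inf = binomial(rng, s, p_inf)
            new_rec = binomial(rng, i, gamma)
            patch[0] = s - new_inf
            patch[1] = i + new_inf - new_rec
            patch[2] = r + new_rec
    return arrival_day, patches


def median_arrival(travel_reduction, runs=20):
    """Median arrival day over an ensemble of independent seeded runs."""
    arrivals = [simulate(seed, travel_reduction)[0] for seed in range(runs)]
    observed = [day for day in arrivals if day is not None]
    return statistics.median(observed) if observed else None


if __name__ == "__main__":
    for reduction in (0.0, 0.9):
        print(f"travel reduction {reduction:.0%}: "
              f"median arrival day = {median_arrival(reduction)}")
```

Because every run depends only on its seed and parameters, runs like these are independent of one another, which is what makes it practical to spread many of them across parallel, interruptible workers and aggregate the results afterwards.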

Dr. Chinazzi, Associate Research Scientist at MoBS, had prior experience with a similar Google Cloud pipeline from studying the spread of the Zika virus in 2016. In that project he ran ten million simulations, sometimes as many as 250,000 a day, to analyze how variables such as location, temperature, population, and travel patterns affected possible rates of infection through mosquitoes. This time his team drew on public datasets of case importations from China, private airline transportation data from the Official Aviation Guide (OAG) and the International Air Transport Association (IATA), and ground mobility and commuting data from statistical offices in more than thirty countries and 80,000 administrative regions.

Based on internationally reported cases, their models show that at the start of the travel ban from Wuhan, most Chinese cities had already received many infected travelers. The travel quarantine of Wuhan delayed the overall epidemic progression by only three to five days in Mainland China, but had a more marked effect at the international scale, where case importations were reduced by nearly 80% until mid-February. “Developing data-driven models for the spread of this infectious disease is critical,” says Dr. Chinazzi. “Our team is working around-the-clock to model the effects that different containment and mitigation strategies have in affecting the spread and severity of the COVID-19 outbreak.”

“With the help of BigQuery, we are able to accelerate insights from our epidemic models and better study evolution of an ongoing outbreak.”

Dr. Matteo Chinazzi, Associate Research Scientist, Laboratory for the Modelling of Biological + Sociotechnical Systems, Northeastern