When disaster strikes, emergency agencies must be able to mobilize resources and protect communities at a moment’s notice. Roads need to remain clear for first responders and evacuation routes, regardless of extreme weather or changing conditions. Stationary cameras already in place can monitor traffic patterns and send updated information to a data center, but analyzing that data on demand requires reliable, high-speed, high-capacity computing resources—especially in life-threatening situations.
Since high-performance computing (HPC) in the cloud manages scalable workloads efficiently, it could be an ideal solution to the urgent computing needs of disaster management, but it had never been tested for that purpose. Brandon Posey, then a Ph.D. student in Computer Science at Clemson University, set out to design an experiment exploring the use of virtual machine instances (VMs) to deploy very large-scale emergency computations in the interest of public safety.
“We want to leverage the instant scalability of the commercial cloud to run workloads that would normally require a massive supercomputer.”
Amy Apon, Professor and C. Tycho Howle Director of the School of Computing, Clemson University
Setting a new record for the scale of cloud computation on a single problem: 2.1 million cores
Posey and his team set up a ten-day experiment that defined an evacuation zone on interstate highways in the southeastern United States and collected anonymous streaming traffic data from 8,500 camera feeds throughout the area. Over the course of the experiment, that added up to 2 million hours of video. Using commercial traffic analytics software designed by TrafficVision, they processed the video in segments on Google Cloud Platform (GCP), optimizing the configuration of each virtual node with 16 vCPU cores and 16 GB of memory.
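Because each video segment can be analyzed independently, the workload is embarrassingly parallel: the segments just need to be grouped into batches sized to each 16-vCPU node. A minimal sketch of that partitioning step is below; the function and segment names are illustrative, not the team’s actual code.

```python
# Illustrative sketch: grouping independent video segments into
# fixed-size batches, one batch per 16-vCPU worker node.

def make_batches(segment_ids, batch_size):
    """Group segment IDs into fixed-size batches for parallel workers."""
    return [segment_ids[i:i + batch_size]
            for i in range(0, len(segment_ids), batch_size)]

# Hypothetical naming: 8,500 feeds, each split into hourly segments,
# 16 segments per node so each vCPU core handles one segment at a time.
segments = [f"feed{f}_hour{h}" for f in range(8500) for h in range(2)]
batches = make_batches(segments, 16)
```

Because no batch depends on any other, batches can be dispatched to nodes in any order, which also matters for the preemptible-VM strategy described below.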
To provision the workflow, Clemson used their own platform-agnostic tool PAW together with CloudyCluster’s GCP interface to schedule the job with SchedMD’s open-source Slurm workload manager. They distributed the job in parallel, scaling from 93,000 vCPUs to more than 2 million cores, which meant managing capacity limits by combining quotas across six regional GCP data centers and then batching their API requests to regulate the flow of demand. The experiment showed that they could successfully scale up to 2.14 million vCPUs executing in parallel: 967,000 vCPUs were running at the end of the first hour, 1.5 million after 1.5 hours, and 2.13 million after 3 hours, with a peak of 133,573 concurrent VM instances.
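The request-batching idea mentioned above can be sketched as a generic throttling pattern: send provisioning requests in fixed-size batches with a pause between batches so demand stays under the provider’s rate limits. This is an assumption-laden illustration, not the team’s PAW/CloudyCluster/Slurm tooling, and `submit` stands in for whatever API call actually creates an instance.

```python
import time

def submit_in_batches(requests, submit, batch_size=100, pause_s=1.0):
    """Send API requests in fixed-size batches, sleeping between
    batches to regulate the flow of demand against rate limits.

    `submit` is a placeholder for a real provisioning call."""
    results = []
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        results.extend(submit(r) for r in batch)
        if i + batch_size < len(requests):
            time.sleep(pause_s)
    return results
```

Spreading the same pattern across quotas in multiple regions is what let the experiment exceed any single data center’s capacity limit.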
210 TB of video data processed in under eight hours
Because the processing could be done in small batches in any order, the team could also take advantage of Google Cloud’s preemptible VMs. These are offered at a discount of up to 80% but may have service interruptions, making them a cost-effective choice for this kind of workload. The team was able to process 210 TB of video data in about eight hours, at an average cost lower than that of many on-premises HPC systems. Amy Apon, Professor and C. Tycho Howle Director of the School of Computing at Clemson, supervised Posey’s project and explains that “we want to leverage the instant scalability of the commercial cloud to run workloads that would normally require a massive supercomputer. The ability to scale up is important, but scaling down makes it cost effective. We envision that this type of system can be utilized to aid in planning evacuation processes by allowing first responders or Departments of Transportation to observe and simulate traffic in real time during a major event.”
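The economics of the preemptible-VM choice can be made concrete with simple arithmetic: applying a discount factor to an on-demand rate over the vCPU-hours consumed. The rates below are placeholders, not actual GCP pricing; only the up-to-80% discount figure comes from the text above.

```python
def discounted_cost(on_demand_hourly, vcpu_hours, discount=0.80):
    """Estimated spend for a batch run on discounted (preemptible)
    capacity. Rates are illustrative, not real GCP prices."""
    return on_demand_hourly * vcpu_hours * (1.0 - discount)

# At an 80% discount, the same vCPU-hours cost one fifth of the
# on-demand price, which is what makes occasional preemptions (and
# re-running the affected small batches) an acceptable trade-off.
```

Restarting a preempted batch only wastes that one small unit of work, so the discount dominates the retry overhead for a workload like this.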
The team learned that a GCP-based parallel-processing system can be launched and then deprovisioned in just a few hours, and that it can efficiently process data at a lower cost than many on-premises HPC systems. “Most hurricane evacuations won’t require the full 2-million-plus virtual CPUs used here to process evacuation data in real time, but it’s reassuring to know that it’s possible,” says Kevin Kissell, Technical Director for HPC in the Office of Google’s CTO. “And it’s a source of some pride to me that it’s possible on Google Cloud.” For Apon, this experiment also confirms how cloud computing is democratizing access to technology: “It’s the advent of a new age in parallel supercomputing. Now even a researcher at a small college or business can run a parallel application in the cloud.”
For more technical details about Clemson’s experiment, see this blog post.
“Most hurricane evacuations won’t require the full 2-million-plus virtual CPUs used here to process evacuation data in real time, but it’s reassuring to know that it’s possible.”
Kevin Kissell, Technical Director for HPC, Google Office of the CTO