Using Google’s machine learning tools, UNC team screens for dental disease in young children

By incorporating a wide range of variables, dental researchers are developing a screening tool for Early Childhood Caries (ECC).

One in three preschool-age children in the United States has experienced ECC, a microbiome- and diet-driven disease causing tooth decay or “cavities” in children under the age of 6. In 2015, Dr. Kimon Divaris, Professor at the University of North Carolina’s Adams School of Dentistry, launched a five-year NIH-funded study (NIDCR U01-DE025046) called Zero Out Early Childhood Tooth Decay (ZOE 2.0) to investigate the biological and social factors causing the disease. “We don’t know yet the genomic basis of individual susceptibility to ECC,” Divaris says. “But we do know ECC is sugar-driven and affects underserved children disproportionately.” His study collected a wide range of data from over 8,000 North Carolina children aged 3 to 5, including over 6,400 clinical examinations, and combined them with data from a parent questionnaire, publicly available data on geographic area-level variables, and human genomics and oral microbiome data. By tracking up to 2,000 variables, from fluoride in the local water to dietary habits and detailed characterizations of individual tooth surfaces, the ZOE 2.0 study is a major assessment of the state of children’s oral health.

Pioneering machine learning for dental research and clinical innovation

When it came time to analyze the data, Dr. Deepti Karhade, a second-year pediatric dentistry resident working with Dr. Divaris, drew on her experience with machine learning to propose using algorithms to classify and analyze the dataset. Which variables and which models, she wondered, would best predict ECC in young children? To store and analyze all these data, she and Divaris chose Google Cloud and AutoML, Google Cloud’s cutting-edge technology for building custom machine learning models. “We chose AutoML Tables primarily because of its automated architecture search, statistical learning features, and speed,” Karhade reports. “It’s a great, easy-to-use platform that enabled us to build a foundation for future studies.” Divaris adds, “It’s a complete package. It can host our data and models on the cloud, deploy them, and make them shareable.” Although their data were all de-identified, by using Google Cloud the team also leveraged Google’s rigorous privacy and security protocols. Once the work is finished, the team can easily make its analyses publicly available so other researchers can pursue their own investigations using the same data and models.

Karhade designed her study by modeling scenarios with a range of demographic, behavioral, and clinical variables, from children’s age and parents’ education to domestic water fluoride content and frequency of consuming sugary between-meal snacks and beverages. She uploaded ZOE 2.0’s massive dataset into AutoML Tables and quickly produced statistics for each model. Surprisingly, her results showed that the best classification model was also the simplest. Just two inputs, a child’s age and a parent’s report on the child’s oral health on a five-step scale ranging from Poor to Excellent, generated the most accurate predictions of whether the child had ECC. Karhade externally replicated her results using the National Health and Nutrition Examination Survey (NHANES) clinical dataset.
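To make the idea of a two-input screening classifier concrete, here is a minimal sketch in Python. It is not the AutoML Tables model the team built: it swaps in a plain logistic regression written from scratch, and it is fit to a handful of made-up toy records rather than ZOE 2.0 or NHANES data. The article names only the endpoints of the parent rating scale (Poor and Excellent), so the three middle labels, the numeric encoding, and all function names below are assumptions for illustration.

```python
import math

# Assumed ordinal encoding of the five-step parent rating. Only "Poor" and
# "Excellent" come from the article; the middle labels are hypothetical.
RATING_SCALE = {"Poor": 0, "Fair": 1, "Good": 2, "Very good": 3, "Excellent": 4}


def encode(age_years, parent_rating):
    """Turn the two screening inputs into a numeric feature vector."""
    # The leading 1.0 is the intercept term.
    return [1.0, float(age_years), float(RATING_SCALE[parent_rating])]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def predict_proba(weights, age_years, parent_rating):
    """Estimated probability that a child currently has ECC (toy model only)."""
    x = encode(age_years, parent_rating)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Made-up toy records: (age in years, parent rating, has_ecc). NOT study data.
TOY_RECORDS = [
    (3, "Excellent", 0), (3, "Very good", 0), (4, "Good", 0),
    (4, "Excellent", 0), (5, "Very good", 0), (3, "Good", 0),
    (4, "Poor", 1), (5, "Fair", 1), (5, "Poor", 1),
    (3, "Poor", 1), (4, "Fair", 1), (5, "Good", 1),
]

weights = fit_logistic(
    [encode(age, rating) for age, rating, _ in TOY_RECORDS],
    [label for _, _, label in TOY_RECORDS],
)
```

Calling `predict_proba(weights, 5, "Poor")` and `predict_proba(weights, 3, "Excellent")` returns two scores between 0 and 1 that can be compared. Because the toy records were invented, the numbers illustrate only the mechanics of a two-feature classifier and say nothing about real ECC prevalence.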

Developing easy-to-use tools for oral health

Her colleagues took notice. Karhade’s research won a Graduate Student Research Award and then the Ralph E. McDonald Award, recognitions for the best research project presented by a graduate student at the American Academy of Pediatric Dentistry’s annual conference. “I’m honored to receive these awards,” Karhade says. Divaris notes that “these results have practical implications: parental assessments are relatively inexpensive to obtain, and although they do not replace clinical visits, they offer a convenient screening alternative.” In fact, Divaris and Karhade are developing a web-based screening tool in which users can provide the algorithm with a child’s oral health information and generate ECC status predictions. According to Karhade, they plan to refine the current screening algorithm into a risk assessment tool in the future. Such risk assessment algorithms will consider behavioral, biological, genomic, and other -omics data to estimate risk (i.e., the probability of developing disease in the future). They are also exploring the use of other Google Cloud tools, like AutoML Vision, which could help them easily build custom models for images. The screening and risk assessment algorithms could eventually be augmented by photographs of children’s dentitions that parents take and upload to a web-based screening tool for processing.

For Karhade and Divaris, this approach shows enormous promise. According to Karhade, “While not ready for clinical use, the model can be used to immediately impute ECC status in existing public health or administrative datasets and enable analyses that would otherwise be impossible to accomplish. Importantly, it will be improved with the addition of new data.” Divaris emphasizes the importance of translating data-informed insights into clinical recommendations: “What we are most excited to introduce is the consideration of genomics and microbiome information jointly with clinical observations, behaviors, and social context. We envision that the next generation of oral health professionals will use these tools and that this will ultimately lead to better oral health for all.” Together, Karhade’s machine learning project and Divaris’s ZOE 2.0 study are bringing big data analysis of large-scale cohorts to the emerging field of precision dentistry.

"Caries risk assessment and other classification tools exist, but this is the first using machine learning. We believe it will enhance the use of public data and the public good."

Dr. Deepti Karhade, second-year resident, Adams School of Dentistry, University of North Carolina