In machine learning, a tradeoff often must be made between accuracy and intelligibility: the most accurate models usually are not very intelligible, and the most intelligible models usually are less accurate. This can limit the accuracy of models that can safely be deployed in mission-critical applications, where being able to understand, validate, edit, and ultimately trust a model is important. We have been working on a learning method based on generalized additive models (GAMs) that escapes this tradeoff: it is as accurate as full-complexity models such as boosted trees and random forests, yet more intelligible than linear models. This makes it easy to understand what the model has learned and to edit the model when it learns inappropriate things. Making it possible for humans to understand and repair a model is critical because most training data has unexpected problems. I’ll present several case studies where these high-accuracy GAMs discovered surprising patterns in the data that would have made deploying a black-box model inappropriate. I’ll also show how these models can be used to detect and correct bias. And if there’s time, I’ll briefly discuss using intelligible GAM models to predict COVID-19 mortality.
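For those who want to try these models before the talk, below is a minimal sketch of training one. It assumes Microsoft's open-source InterpretML library (pip install interpret), whose Explainable Boosting Machine is one implementation of the high-accuracy GAMs described above; the dataset and parameters here are illustrative, not taken from the talk itself.

    # Minimal sketch: training an intelligible, high-accuracy GAM.
    # Assumes the open-source InterpretML library (pip install interpret).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from interpret.glassbox import ExplainableBoostingClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each feature gets its own learned shape function, and the prediction
    # is the sum of those terms, so every feature's contribution can be
    # plotted, inspected, and edited directly.
    ebm = ExplainableBoostingClassifier(random_state=0)
    ebm.fit(X_train, y_train)
    print("test accuracy:", ebm.score(X_test, y_test))

    # Global explanation: one plot per feature showing its learned term.
    # Inspecting these terms is how surprising patterns in the training
    # data can be discovered before a model is deployed.
    ebm_global = ebm.explain_global()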
Rich Caruana is a Senior Principal Researcher at Microsoft. His focus is on intelligible/transparent modeling, machine learning for medical decision making, deep learning, and computational ecology. Before joining Microsoft, Rich was on the faculty in Computer Science at Cornell, at UCLA's Medical School, and at CMU's Center for Learning and Discovery. Rich's Ph.D. is from CMU. His work on Multitask Learning helped create interest in a subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 (for Meta Clustering), best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles), and co-chaired KDD in 2007 with Xindong Wu.