Datacast Episode 32: Economics, Data For Good, and AI Research with Sara Hooker
The 32nd episode of Datacast is my conversation with Sara Hooker, a researcher at Google doing deep learning research on reliable explanations of model predictions for black-box models. Give it a listen to hear about her background in economics, her work at Udemy, her data for good initiatives with Delta Analytics, her research on model interpretability and model compression, the AI community in Africa, and much more.
Listen to the show on: (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) iHeart Radio, (5) Stitcher, and (6) RadioPublic
Sara Hooker is a researcher at Google AI doing deep learning research on reliable explanations of model predictions for black-box models. Her main research interests gravitate towards interpretability, model compression, and security. In 2014, she founded Delta Analytics, a non-profit dedicated to bringing technical capacity to help non-profits across the world use machine learning for good. She grew up in Africa, in Mozambique, Lesotho, Swaziland, South Africa, and Kenya. Her family now lives in Monrovia, Liberia.
Show Notes
(2:20) Sara shared her childhood growing up in Africa.
(4:05) Sara talked about her undergraduate experience at Carleton College, studying Economics and International Relations.
(9:07) Sara discussed her first job working as an Economics Analyst at Compass Lexecon in the Bay Area.
(12:20) Sara then joined Udemy as a data analyst, then transitioned to the engineering team to work on spam detection and recommendation algorithms.
(14:58) Sara dug deep into the “hustling period” of her career and how she brute-forced her way to grow as an engineer.
(17:24) In 2014, Sara founded Delta Analytics, a Bay Area non-profit community of data scientists, engineers, and economists that believes in using data for good.
(20:53) Sara shared Delta’s collaboration with Eneza Education to give students in Kenya access to quizzes via mobile text messaging (check out her presentation at ODSC West 2016).
(25:16) Sara shared Delta’s partnership with Rainforest Connection to identify illegal deforestation using streamed audio from the rainforest (check out her presentation at MLconf Seattle 2017).
(28:22) Sara unpacked her blog post Why “data for good” lacks precision, in which she described four key criteria frequently used to qualify an initiative as “data for good” and discussed some open challenges associated with each.
(36:34) Sara unpacked her blog post, Slow learning, in which she revealed her journey to get accepted into the AI Residency program at Google AI.
(41:03) Sara discussed her initial research interest in model interpretability for deep neural networks and her work at Google, The (Un)reliability of Saliency Methods, which argues that saliency methods are not reliable enough to explain model predictions.
(45:55) Sara pushed the research above further with A Benchmark for Interpretability Methods in Deep Neural Networks, which proposes RemOve And Retrain (ROAR), an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
(48:46) Sara explained why model interpretability is not always required (check out her talks at PyBay 2018, REWORK Toronto 2018, and REWORK San Francisco 2019).
(52:10) Sara explained the typical measurements of model reliability, such as localization methods and points of failure, and their limitations.
(59:04) Sara explained why model compression is an interesting research direction and her work The State of Sparsity in Deep Neural Networks — which highlights the need for large-scale benchmarks in the field of model compression.
(01:02:49) Sara discussed her paper Selective Brain Damage: Measuring the Disparate Impact of Model Pruning — which explores the impact of pruning techniques for neural networks trained for computer vision tasks. Check out the paper website!
(01:05:08) Sara shared her future research directions on efficient pruning, sparse network training, and local gradient updates.
(01:06:56) Sara explained the premise behind her talk Gradual Learning at the Future of Finance Summit in 2019, in which she shared the three fundamental approaches to machine learning impact.
(01:12:20) Sara described the AI community in Africa as well as the issues the community is currently facing, in terms of both the investment landscape and the infrastructure ecosystem.
(01:18:00) Sara and her brother recently started a podcast called Underrated ML, which pitches underrated ideas in machine learning.
(01:20:15) Sara reflected on how her background in economics influences her career outlook in machine learning.
(01:25:42) Sara reflected on the differences between applied ML and research ML and shared her advice for people choosing between these career paths.
(01:29:49) Closing segment.
Sara’s Contact Information
Sara’s Recommended Resources
Why “data for good” lacks precision (Sara’s take on “Data for Good” initiatives)
Slow learning (Sara’s journey to Google AI)
Sanity Checks for Saliency Maps by Julius Adebayo et al.
Focal Loss for Dense Object Detection by Tsung-Yi Lin et al.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew Howard et al.
Underrated ML (Sara’s new podcast)
Dumitru Erhan (Research Scientist at Google AI)
Samy Bengio (Research Scientist at Google AI)
Andrea Frome (Ex-Research Engineer at Google AI)
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman