Datacast Episode 70: Machine Learning Testing with Mohamed Elgendy
The 70th episode of Datacast is my conversation with Mohamed Elgendy — a seasoned AI expert who has built and managed AI organizations at Amazon, Rakuten, Twilio, and Synapse.
Our wide-ranging conversation touches on his biomedical engineering background in Egypt, his transition from software engineering to ML, teaching computer vision, Amazon leadership principles, ML deployment on hardware devices, the state of ML testing, his current journey with Kolena, and much more.
Please enjoy my conversation with Mohamed!
Listen to the show on (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) TuneIn, (5) RadioPublic, (6) Stitcher, and (7) Breaker.
Key Takeaways
Here are highlights from my conversation with Mohamed:
On Studying Biomedical Engineering in Egypt
Originally from Egypt, I joined an engineering-oriented university after high school. After the first year, I was deciding between computer engineering and biomedical engineering as my major. I picked biomedical because it focuses more on applications: it is basically the combination of software and hardware engineering in the medical field.
I had a great time learning how to think like an engineer. I went into the operating room with physicians, observing how they did things and figuring out ways to help them better diagnose and treat problems. In retrospect, this experience helped me a lot when I later came back to hardware work.
On Moving To The US
After college, I worked for about 3 years in Egypt. As an engineer who likes to build things, I was not content with the medical equipment field in the Middle East. There was much less innovation on the building and manufacturing side, and much more on the selling, support, and maintenance side. I realized that if I wanted to be on the innovation side, I needed to leave the region, so I moved to the US.
During my early years in the US, I focused on getting into the “system” (immigration, logistics, employment, etc.). Instead of pursuing a Master’s in Engineering, I decided to jump into the business side by getting an MBA. This aligned with my goal of looking at the bigger picture of things, not just heads-down building things. After finishing my MBA, I worked a few jobs with the mentality of maximizing how much I was learning and how fast I was moving. At the time, the fastest way to get into the employment system was via software engineering roles. I wasn’t picky about the industry, but the roles that I got happened to be in the medical field.
On Becoming A First-Time Author
My process of learning new things entails building and sharing. Building can mean building a product, writing an article, or creating a YouTube video. This encourages me to (a) have a goal for the learning experience and (b) go back to the learning process and fill in the missing details. The implementation makes me learn things on a much deeper level.
The topic I picked was Business Analysis. An IT business analyst is a product manager sitting between the engineers and the customers. This person understands what needs to be built, writes requirements, and liaises between engineering and business. Even with a day job, I carved out 3–4 hours every night for my side projects. At the time, I worked as a software engineer by day and learned business analysis/project management by night. I found writing books about business analysis was the most straightforward way to capture my learnings.
On Being an Engineering Manager at Twilio
Before jumping into a management role, I was a technical program manager at Yale University. I wasn’t sure whether I wanted to be a people manager, but I went for an engineering manager opportunity at Twilio anyway. Around 2013, Twilio had become a well-known company. This was a big transition for me for various reasons: (a) moving to the Bay Area, (b) jumping into people management, and (c) moving into Machine Learning from software engineering.
Initially, I joined Twilio as a manager for an infrastructure team that built tools. Then, I was on a team that built ML tooling. The flagship ML product that I worked on was Twilio Understand, an NLP product that understands text sentiment and creates structured data from text. Through that work, I realized that NLP wasn’t something I was excited about.
On Amazon Culture and Leadership Principles
I would rank Amazon as the best school to work at, especially on the management side. As a manager, building trust and getting the team's buy-in takes a lot of work. Amazon prefers writing memos over presenting PowerPoint slides. Putting your thoughts in a document forces you to have real details and think more deeply about the problem. Overall, Amazon was a second college for me.
Every company has its own cultural values and puts them everywhere. Amazon makes their teams eat, drink, and breathe their cultural values.
Customer obsession is huge. While sitting in meetings and discussing problems, we would raise the question right away: are we trying to optimize for some business leaders’ happiness or for customers’ happiness?
Working backward is another big one. We start with the customer and work backward from there. Working backward from a goal helps us eliminate the waste between us and that goal (a frugality mindset).
Additionally, Amazon has the backbone to disagree and commit. This opens the door for anyone, regardless of rank, to voice disagreements. While we are still discussing, we can disagree. Once the decision (not necessarily a democratic one) has been made, we all commit to it.
These values have stayed with me as I continued down the management path and set team culture. I hold them dear to my heart and implement them everywhere I go.
On The Benefits of Teaching
While working on the Kindle team, I wanted to move to the Computer Vision side. I talked to Amazon leadership and pitched the idea of having a Computer Vision think tank — a team of Computer Vision experts floating around several organizations and solving their problems. The idea received a positive signal, leading to the initial step of building a team of 4–5 people.
At the time, even outside of Amazon, it was hard to build an ML engineering team. I reached out to Amazon’s internal Machine Learning University and started a 3-month course that teaches computer vision concepts to Amazon engineers. The goal was to have about 25–30 students go through the program and recruit the best people interested in joining the new team.
Personally, teaching helps me understand exactly what I am doing. If I say model architecture A is better than model architecture B, I have to explain why A is better than B. In front of people, you have to know why you say what you’re saying and whether your opinions hold up. Having to put my thoughts into course materials pushed me to structure my thinking so I could discuss the topics intelligently.
On its own, teaching (or writing) is not an exercise that I enjoy (ironically). But the value of teaching is tremendous: my depth of knowledge increases as a result.
On Building Computer Vision System at Synapse
I joined Synapse right after their seed round. We built computer vision algorithms that analyze images from the X-ray machines at airport security checkpoints and draw bounding boxes around prohibited items (guns, knives, bottles, toothpaste, etc.). There were various challenges across hardware, software, and computer vision — coupled with a high level of security and the lowest level of available infrastructure. Our products deployed in airports were not connected to the Internet, so we faced challenges around the initial deployment, maintenance, and upgrades. At the time, I had to roll up my sleeves and work with hardware components. This is where what I learned in college came in handy!
It wasn’t easy to transform an MVP prototype into a manufactured product that is repeatable with the same accuracy and the same quality. Instead of having 1 or 2 products every month, we needed 20 or 30 products every week. We partnered with (1) hardware vendors to collect the necessary hardware components and (2) X-ray vendors to deploy the products/provide maintenance at the customer side.
On Data Labeling Challenges
Initially, at Synapse, we had a warehouse with 6 to 7 X-ray machines. We bought the actual objects we were trying to detect (knives, guns, etc.) to teach our neural network, manually scanned them, and collected a few thousand images. Next, we trained and deployed our model. From the moment the model was deployed, we also stored the incoming data. Within 6 months, we had millions of images. By the time I left, that number had grown to 25 million.
To build our model, we needed the data to be labeled. We partnered with an offshore team — who had (secured) access to our data storage, labeled our data, and brought the labeled data back to us. I always collaborated closely with the labeling team: giving them the labels, showing them examples, teaching them how to label. This process repeated time and again. The lesson I learned is this: while the labeling process sounds simple, the more you work to improve your model and learn about your problem, the more you will change your labeling strategy and re-label your data all over again.
On Incubating Kolena
After Synapse got acquired, I was thinking about a product that could solve the ML testing challenge. I started cooking something up and sketching on the whiteboard; that was going to be the startup I would build. When the pandemic hit, I decided to pause and took a job as the VP of Engineering for the AI Platform at Rakuten. Jumping in there, I wanted to test my hypothesis on ML testing against Rakuten’s ML initiatives. The big goal was to enable AI in the organization via process, people, and infrastructure. Within the infrastructure part, I focused on building an end-to-end ML platform.
We decided to use open-source components on top of our in-house backend infrastructure. We tested several approaches to ML testing. By the end of the year, we found that we could save more than 50% of the experimentation time if ML testing is done right. Testing gives ML engineers the specific failure modes of their model(s) and enables them to create a roadmap to fix those bugs.
Alongside my lead engineer at Rakuten (who had joined from Synapse), I decided to take a leap of faith and, in January 2021, build Kolena — a QA platform for ML.
On ML Testing Infrastructure
When you build a model, you always want to understand the instances where your model fails. Your test set is never going to be fully representative of the real world. In the ideal case, you build and ship linearly. But in reality, there has to be some test to make sure your product passes a certain bar. As a product builder, what is your bar to say that this product is of high quality? You need metrics.
Common evaluation metrics (accuracy, precision, recall, AUC, etc.) are not descriptive of the failure modes. They don’t tell you what you need to do next (acquire more data, develop more complex models, buy more powerful GPUs, etc.). You basically have to shoot in the dark because you don’t know what your model failed on.
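To make the point concrete, here is a minimal sketch in pure Python (the labels are hypothetical, not from any real model) showing how aggregate precision and recall summarize a model without pointing at which inputs fail:

```python
# Hypothetical binary-classification results: 1 = prohibited item present.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Count true positives, false positives, and false negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # 3 / 4 = 0.75
recall = tp / (tp + fn)     # 3 / 4 = 0.75

# The scores alone say nothing about *which* item was missed,
# e.g. whether the false negative was a knife at an unusual angle.
print(precision, recall)
```

Two scalar scores compress away exactly the information (which inputs failed, and why) that an engineer needs to plan the next iteration.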
There are over 200 tools in the ML tooling landscape, bucketed into 3 main categories: data management, model development, and model deployment. The most visible effort to understand how models perform in production has gone toward model explainability. That is not a bad solution, but there are now model testing solutions that help you understand model behavior. I think the testing category is very under-served.
After talking with hundreds of ML practitioners, I noticed that ML teams fall into two categories:
(1) Teams for whom evaluation metrics are fine and no one pushes the engineers to test their models. I think soon enough they will feel the need to test their models.
(2) Teams whose testing process is mature. In particular, they break down their test set into small, granular slices based on specific model behaviors. This is the practice I adopted at Amazon, Synapse, and Rakuten.
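The slicing practice described above can be sketched roughly as follows (the slice names and data layout are hypothetical, purely for illustration):

```python
from collections import defaultdict

# Each test case carries metadata naming the scenario it exercises.
test_set = [
    {"slice": "knife/occluded",   "correct": False},
    {"slice": "knife/occluded",   "correct": True},
    {"slice": "knife/clear",      "correct": True},
    {"slice": "gun/low_contrast", "correct": False},
    {"slice": "gun/clear",        "correct": True},
]

# Group results by slice so per-scenario accuracy is visible.
by_slice = defaultdict(list)
for case in test_set:
    by_slice[case["slice"]].append(case["correct"])

overall = sum(c["correct"] for c in test_set) / len(test_set)
print(f"overall accuracy: {overall:.2f}")
for name, results in sorted(by_slice.items()):
    acc = sum(results) / len(results)
    # A weak slice (e.g. occluded knives) tells you exactly what to fix next.
    print(f"  {name}: {acc:.2f}")
```

The aggregate number looks like one model; the per-slice numbers reveal which scenarios are failing, which is what turns a metric into a roadmap.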
I believe testing is a new category that has to emerge eventually. QA tooling for ML (like Kolena) should sit right after model training and right before the model gets deployed to production.
On Writing “Deep Learning For Vision Systems”
In 2018, I was thinking about getting a graduate degree to study Computer Vision in more depth. When Manning reached out to me with a book-writing opportunity, I figured this might be a more practical approach than going back to school. The entire experience included 2 years of writing and 6 months of editing. That’s a humongous amount of effort, and I learned so much about the topic.
My favorite chapter is chapter 5, which covers the evolution of Convolutional Neural Networks — from LeNet to ResNet and Inception. I want my readers to acquire the skill of reading research papers and distilling the most relevant bits. Chapter 5 is my way of doing that. I picked 6 ConvNets and shared my takeaways + implementations for each. I also discussed how each network improves on a previous one. This chapter will help you get over the nerves of reading and implementing research papers.
Timestamps
(01:44) Mohamed described his interest growing up in Egypt and studying Biomedical Engineering at Cairo University in the early 2000s.
(04:22) Mohamed commented on his experience moving to the US to pursue an MBA degree and working in various software engineering roles.
(07:35) Mohamed shared his experience authoring two books: (1) 3D Business Analyst: The Ultimate Hands-On Guide to Mastering Business Analysis and (2) Business Analysis for Beginners: Jump-Start Your BA Career in 4 Weeks.
(13:19) Mohamed discussed his move to the Bay Area for a Senior Engineering Manager role at Twilio, managing and shipping a series of communication API products using Machine and Deep Learning.
(17:39) Mohamed dissected engineering challenges building ML systems at Amazon, alongside key leadership lessons he acquired from managing Amazon’s Kindle mobile and ML engineering teams.
(20:50) Mohamed shared his insider perspective on Amazon’s practices of customer obsession, working backward, and disagree-to-commit.
(24:52) Mohamed mentioned the benefits of teaching a computer vision course for engineers at Amazon’s internal Machine Learning university.
(28:33) Mohamed went over the engineering (hardware + software) and ML challenges associated with building a proprietary threat detection platform at Synapse Tech Corporation (where he was the Head of Engineering).
(32:03) Mohamed shared concrete technical challenges with building an ML system that performs inference on edge devices.
(37:03) Mohamed revealed specific data labeling challenges while building the ML system at Synapse.
(39:57) Mohamed went over his one year as the VP of Engineering for the AI Platform at Rakuten, when he incubated the idea for Kolena.
(42:52) Mohamed explained the current state of ML testing infrastructure and unpacked his current project Kolena, a rigorous ML QA platform that lets users take control of their ML testing.
(49:07) Mohamed has been collaborating with a few institutions, podcasters, and ML influencers to raise awareness of the importance of ML testing and different approaches to tackle the problem.
(50:12) Mohamed touched on his side hustles working with Intel in autonomous drones and teaching content with Udacity’s AI Nanodegree programs.
(53:07) Mohamed dissected his project Mowgly, an educational platform with tracks curated by industry experts to guide users to master specific topics.
(54:58) Mohamed described his experience authoring a book with Manning in 2020 called “Deep Learning For Vision Systems.”
(58:51) Closing segment.
Mentioned Content
People
Andrew Trask (Leader at OpenMined, Senior Research Scientist at DeepMind, Ph.D. Student at the University of Oxford)
Francois Chollet (Senior Software Engineer at Google, Creator of Keras)
Lex Fridman (Host of the popular Lex Fridman Podcast, AI Researcher working on autonomous vehicles and human-robot interaction at MIT)
Notes
My conversation with Mohamed was recorded back in March 2021. Here are some updates that Mohamed shared with me since then:
Kolena is an ML testing and validation platform that enables teams to implement testing best practices to rigorously test their models’ behavior and ship high-quality ML products much faster.
Mohamed and his team have signed a couple of big enterprise customers and raised a large seed round from top-tier investors and almost every industry leader in the AI space. These were strong signals that Kolena is solving a very important problem!
Mohamed’s first impression of the market: the ML market is hungry for a reliable testing platform for models. Kolena has quite a waitlist and plans to launch early next year.
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts.
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.