Datacast Episode 131: Data Infrastructure for Consumer Platforms, Algorithmic Governance, and Responsible AI with Krishna Gade

The 131st episode of Datacast is my conversation with Krishna Gade, the founder and CEO of Fiddler AI, an AI Observability startup that helps AI-forward organizations build trusted AI solutions and connect model outcomes to business KPIs.

Our wide-ranging conversation touches on his research on document clustering in grad school, his early career working on Bing's search engine at Microsoft, his time as an engineering leader scaling data engineering at Twitter and Pinterest, his experience at Facebook building the News Feed ranking platform, the founding story of Fiddler AI and the Model Performance Management framework, model governance for modern enterprises, lessons learned from hiring, finding design partners, and fundraising, and much more.

Please enjoy my conversation with Krishna!

Listen to the show on (1) Spotify, (2) Google, (3) Deezer, (4) TuneIn, and (5) iHeartRadio

Key Takeaways

Here are the highlights from my conversation with Krishna:

On His Educational Background

Like many Indian students who come to America, I completed my undergraduate studies in the late nineties and then pursued graduate school in the U.S. One area that particularly interested me was data mining.

During the late nineties and early 2000s, significant research was conducted in association rule mining. One of the seminal papers in this field focused on analyzing transaction data from Walmart and uncovering exciting patterns.

I vividly remember one of the most intriguing discoveries from that research, which involved a humorous example. The researchers found a strong correlation between purchases of beer and diapers, which initially puzzled them. They wondered why such a high correlation existed between these two items. It turned out that many fathers, on their way back home, would stop by Walmart to buy beer for themselves and diapers for their children. This finding was quite fascinating.

As a result, Walmart rearranged its aisles based on this insight to help customers find items more efficiently. These techniques have many applications, especially in web and text mining. I focused extensively on document clustering, which was crucial in the early stages of search engines like Google, Yahoo, and Bing. Teams were crawling the web and striving to make sense of the vast amount of textual data available. I dedicated a significant amount of time to developing clustering algorithms that could effectively analyze these large text datasets.

The University of Minnesota has an excellent computer science program. I was fortunate to work with two professors who were pioneers in data mining and graph partitioning: Dr. Vipin Kumar, who is still at the university, and George Karypis, who is currently on leave from the university and working at Amazon. They conducted groundbreaking research in bioinformatics, data mining, text mining, and climate modeling. Working with them gave me valuable opportunities to collaborate, publish papers, and gain experience in these fields. Overall, it was an incredible experience.

On Working on Bing's Search Engine at Microsoft

After working on data mining-related areas in grad school, I wanted to work on building a web search system. Microsoft had a small team of 25 people dedicated to this project, which was like a startup within the company. The goal was to create a search engine to compete with Google.

Joining this team was an exciting opportunity for me as we started from scratch. We were all new and figuring things out together. At that time, everything was new: Microsoft's data centers, including Hotmail, still ran on Linux servers, and Bing was one of the first software systems built on clusters of Windows Server 2003. Eventually, this infrastructure was repurposed for Azure.

Working on Bing, I was part of the search quality team, which allowed me to contribute to groundbreaking work. We focused on building PageRank-style and graph-based algorithms to determine the importance of web pages. Additionally, I had the opportunity to work on Bing's first autocomplete feature, which provided quick suggestions for users when they used the search box.

We also worked on innovative projects, such as driving search traffic back to Bing. We had many MSN pages and properties and wanted to run contextual ads similar to AdSense. We called it SearchSense, where we displayed relevant search-query ads on those pages. For example, if you were on an MSN page related to Michael Jordan, you might see a related query that would bring you back to Bing search.

Working on search engines allowed me to explore various aspects of computer science, including distributed systems, machine learning, and algorithm development. It was a valuable experience that laid the foundation for my career. I spent around five to six years at Bing and cherish the knowledge and skills I gained.

On Competing Against Google

Google's biggest advantage over Bing was the amount of data they had. This was because they already had a toolbar launched on various browsers, allowing them to collect a large amount of long-tail query data.

In the case of search engines, the more you use them, the better they can become because they can gather more data. At Bing, we developed highly sophisticated algorithms, including one of the first neural networks for a large-scale use case.

We also productized an algorithm called RankNet, a two-layer neural network used to score web search results. We had a team of Microsoft researchers working on these problems. However, Google simply had more long-tail query data, which made it challenging for Bing to match or surpass Google on search quality.

We constantly compared our performance to Google using a search quality metric called NDCG (Normalized Discounted Cumulative Gain). We aimed to see how Bing performed compared to Google for the same queries.
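For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@k can be computed for a single query, assuming graded relevance judgments (0 = bad to 3 = perfect) collected for the top results of each engine; the sample judgments are made up for illustration:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance discounted by log2 of rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k=10):
    """Normalize DCG@k by the DCG of the ideal (descending-relevance) ordering."""
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments for the same query on two engines.
engine_a = [3, 2, 3, 0, 1]
engine_b = [3, 3, 2, 1, 0]
print(ndcg(engine_a, k=5), ndcg(engine_b, k=5))
```

Averaging this number over a large sample of queries gives the kind of side-by-side quality comparison described above.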

On Scaling Twitter Search

After spending five to six years on Bing search, I pursued my dream of moving to the Valley and working in startups. I had the opportunity to join Twitter when it was still a small company of about a hundred people.

The team was focused on building and scaling a search engine, as the existing search engine was limited and lacked the necessary features and functionality. Joining the search team meant being part of an opportunity to rebuild the search engine from the ground up.

At the time, Twitter's search functionality was very basic. All the tweets were stored in a MySQL database, and a Ruby on Rails app would query the database using simple text queries. However, this approach was not scalable as Twitter grew in terms of users and traffic.

Source: https://blog.twitter.com/engineering/en_us/a/2011/the-engineering-behind-twitter-s-new-search-experience

I was hired, along with a few others, to build a real-time search engine. To achieve this, we introduced Lucene, an open-source search library, within Twitter and built Earlybird, a real-time implementation on top of Lucene. Lucene was great for batch-indexing data, but we needed to make it work for real-time ingestion, which was crucial for Twitter as a real-time product.

Luckily, we had a Lucene committer on our team who made significant progress in getting the core parts of Lucene to work. Once we had the foundation in place, we focused on scaling the system to handle a variety of search queries. We developed different ways of blending search results, such as filtering tweets from friends or scoring documents based on relevance and time order.

I took on the tech lead role for Blender, a project that aimed to blend search results seamlessly. At the time, the Ruby on Rails front end was experiencing scaling issues, especially during the period of high traffic after the tsunami in Japan. To address this, we built Blender as a JVM-based application that provided high scalability and the ability to blend search results using customizable workflows.

Developers could easily compose queries by calling the Earlybird service, passing the results to the ranker, and filtering the results based on the user's network. This simplified workflow made it easier for developers to write search queries. These were some of the projects I worked on at Twitter.
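To make that workflow idea concrete, here is a hedged Python sketch of a blend step. The real Blender was a JVM service; the function names, the candidate/scoring services passed in, and the in-network boost factor here are hypothetical stand-ins, not Twitter's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Result:
    tweet_id: int
    author_id: int
    score: float

def blend(query, fetch_candidates, score_candidates, following, limit=20):
    """Illustrative blend workflow: fetch candidates, score them, boost in-network tweets."""
    # Step 1: fetch candidate tweets from the real-time index (an Earlybird-like service).
    candidates = fetch_candidates(query, 200)

    # Step 2: score candidates for relevance (a ranker service in the real system).
    results = score_candidates(candidates, query)

    # Step 3: boost tweets from accounts the user follows (social-graph filtering).
    for r in results:
        if r.author_id in following:
            r.score *= 1.5  # hypothetical in-network boost factor

    # Step 4: blend by final score and return the top results.
    return sorted(results, key=lambda r: r.score, reverse=True)[:limit]
```

The point of the workflow abstraction was exactly this kind of composition: each step is a pluggable service call rather than hand-written glue.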

Source: https://blog.twitter.com/official/en_us/a/2012/simpler-search.html

Furthermore, I had the opportunity to work on building a Typeahead feature similar to Bing's, which provided auto-completion functionality on Twitter.
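As a toy illustration of the typeahead idea, here is a minimal prefix-completion sketch over a sorted list of past queries. Real systems rank suggestions by popularity, freshness, and personalization, which this omits; the sample queries are made up:

```python
import bisect

class Typeahead:
    """Minimal prefix completion over a sorted list of past queries."""
    def __init__(self, queries):
        self.queries = sorted(set(q.lower() for q in queries))

    def suggest(self, prefix, k=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(self.queries, prefix)  # first query >= prefix
        out = []
        for q in self.queries[start:]:
            if not q.startswith(prefix):
                break
            out.append(q)
            if len(out) == k:
                break
        return out

ta = Typeahead(["world cup", "world series", "weather seattle", "wordle"])
print(ta.suggest("wor"))   # -> ['wordle', 'world cup', 'world series']
```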

On Processing Streaming Data At Scale at Twitter

One of the major differentiators of Twitter was its real-time nature. It had a unique selling point compared to other media networks at the time, as it processed streaming data through tweets.

To handle the massive amount of data, we needed special infrastructure. After spending a few years in search, I had the opportunity to build a data infrastructure team focused on the backend side. We aimed to create a streaming infrastructure to process these real-time tweets and enable various use cases.

These use cases included trending topics, creating features for ad scoring, and real-time analytics. We had machine learning models running to score ads and wanted to ensure we provided the freshest data for them. We also wanted to measure the performance of our ads and tweets for analysis purposes.
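To make the streaming use case concrete, here is a toy Python sketch of the kind of sliding-window aggregation a real-time topology performs for trending topics. It is illustrative only and does not use the Storm API; the window size and hashtag parsing are assumptions:

```python
import time
from collections import Counter, deque

class TrendingTopics:
    """Toy sliding-window hashtag counter, standing in for a streaming topology."""
    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()          # (timestamp, hashtag)
        self.counts = Counter()

    def process(self, tweet_text, now=None):
        now = now if now is not None else time.time()
        for token in tweet_text.split():
            if token.startswith("#"):
                self.events.append((now, token.lower()))
                self.counts[token.lower()] += 1
        self._expire(now)

    def _expire(self, now):
        # Drop events older than the window so counts reflect only recent activity.
        while self.events and now - self.events[0][0] > self.window:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, k=10):
        return self.counts.most_common(k)

tt = TrendingTopics(window_seconds=300)
tt.process("Breaking: strong quake reported #earthquake #japan")
print(tt.top(3))
```

A production topology distributes this same counting logic across many workers and partitions the stream by key.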

During my time at Twitter, we acquired a small startup that was working on an open-source project called Apache Storm. We integrated this project into our infrastructure and built a team around it. We successfully implemented it for various use cases within Twitter. Towards the end of my time there, we were even working on rebuilding the next version of Storm, rewriting it in a different language.

We also worked on key-value stores and distributed caching systems for the online data infrastructure. These systems helped our product teams quickly serve tweets, users, and other data types.

Twitter greatly benefited from open-source technologies like Lucene, Storm, and Memcache. We also actively contributed to the open-source community with projects like Mesos, which served as an orchestration system before Kubernetes became widely used. Finagle, an RPC library, was also developed to orchestrate requests between JVM services. Twitter strongly focused on open source and actively contributed to its development.

On Leading Data Engineering at Pinterest

During that time, the concept of big data became extremely popular. Every company was trying to get involved in big data. But how do we make sense of big data? How do we create infrastructure to process and analyze this data?

After gaining experience building data infrastructure at Twitter, I had the opportunity to start from scratch again with a data infra team at Pinterest. Pinterest was a unique product that combined elements of a social media network and a search engine. It was often called a "discovery engine" where users could find ideas and things they were interested in. Data played a central role in this product because to show users new pins and recommendations, we needed to mine the data and generate insights from it.

It was a great role for me and our team at Pinterest. Despite being a small team, we could partner with different startups and leverage AWS as a cloud provider, which gave us an advantage compared to my previous experience at Twitter. This gave me the opportunity to learn how to create data architecture and build data infrastructure from scratch. We were early adopters of technologies like Kafka and Spark, and we had interactions with the founding teams of Confluent and Databricks. Additionally, Snowflake came onto the scene during that time. We also worked on productizing Presto, Facebook's data analytics engine. It was exciting as we built the necessary data infrastructure to meet the company's needs.

On His Key Initiatives at Pinterest

Source: https://medium.com/pinterest-engineering/real-time-analytics-at-pinterest-1ef11fdb1099

For real-time analytics, we had to build everything from scratch. This included our logging agent, Singer, which ran on every web server at Pinterest. Since Pinterest was a Python shop, we had Python web servers that served user traffic and logged data to Singer. Singer was a JVM agent that published all the data into a centralized Kafka infrastructure. From there, we fed the data into real-time databases such as MemSQL, Redshift, and HBase to compute real-time insights.
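As a rough illustration of that logging hop, here is a hedged Python sketch of an agent publishing structured events to Kafka using the kafka-python client. Singer itself was a JVM agent, and the broker address, topic naming scheme, and event schema below are assumptions for illustration:

```python
import json
from kafka import KafkaProducer  # kafka-python package

# Hypothetical broker address; in production this would point at the Kafka cluster.
producer = KafkaProducer(
    bootstrap_servers=["kafka-broker:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def log_event(event_type, payload):
    """Publish one structured log event to a per-event-type Kafka topic."""
    producer.send(f"events.{event_type}", {"type": event_type, **payload})

log_event("pin_impression", {"user_id": 123, "pin_id": 456})
producer.flush()  # ensure buffered events reach the brokers before exit
```

Downstream consumers (real-time databases, batch jobs) then read these topics to compute the insights described above.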

Source: https://medium.com/pinterest-engineering/building-pinterests-a-b-testing-platform-ab4934ace9f4

One crucial insight we focused on was experimentation. From the ground up, we built an A/B testing framework. Because Pinterest acted as a mix of a search engine and a social network, every change we made, whether a UI or algorithm change, had to be thoroughly tested on a subset of users. Testing played a significant role, and our team supported the testing framework. We needed to quickly determine whether an experiment was configured correctly to avoid exposing a misconfigured experiment to a large population.
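To illustrate one common way an experimentation framework assigns users deterministically and limits exposure, here is a hedged hashing sketch. The exposure fraction, variant names, and experiment key are hypothetical; this is not Pinterest's actual implementation:

```python
import hashlib

def assign_bucket(user_id, experiment, variants=("control", "treatment"), exposure=0.05):
    """Deterministically assign a user to an experiment variant.

    Hashing (experiment, user_id) keeps assignments stable across requests and
    lets us expose only a small fraction of users while an experiment is new.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest, 16) / 16**32          # uniform in [0, 1)
    if point >= exposure:
        return None                            # user not in the experiment
    slot = int(point / exposure * len(variants))
    return variants[min(slot, len(variants) - 1)]

print(assign_bucket(42, "home_feed_ranker_v2"))
```

Keeping the assignment a pure function of the IDs also makes it easy to audit whether an experiment was configured as intended.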

Real-time analytics was crucial for us to promptly identify and roll back experiments that generated inaccurate data. Additionally, we provided ETL as a service to our analysts, who answered questions such as whether to enter new markets. They ran ad hoc queries, which then turned into ETL jobs that created reports. We developed an end-to-end analytics infrastructure called Pinalytics, where users could start with a simple query, perform ad hoc analysis, and use ETL as a service to generate reports.

Source: https://medium.com/pinterest-engineering/skyline-etl-as-a-service-a441efdeeb90

This analytics infrastructure included a front-end report that displayed the results beautifully. The experience gained from building this infrastructure at Pinterest has been valuable in developing Fiddler. At Fiddler, we focus on helping ML engineers and data scientists create and share reports on ML within the organization, leveraging the knowledge and expertise gained from our work at Pinterest.

On Building The Ranking Platform for Facebook's News Feed

After spending time at Pinterest, I wanted to work closer to where I live. I had been doing a long commute to the city for about seven years and didn't want to continue it. Therefore, I took up a really good opportunity at Facebook. It was a chance to build a team from scratch.

At Facebook, the newsfeed had a large ranking team of ML engineers who built models to score the newsfeed. However, they lacked good tools to understand how these models worked and to monitor and experiment with them.

I was responsible for building a ranking platform team to create tools and infrastructure for the newsfeed ranking engineers. When we started surveying the problems faced by the engineers, one of the major issues was the need to spend hours debugging whenever someone within Facebook complained about newsfeed quality. They had to figure out whether it was a data, model, or feature issue.

Thus, our first project was the development of a feed debugger. This debugger allowed developers to understand how their machine learning models were functioning. It helped them identify if a stale model was running, if a bad experiment was configured, if bad data was flowing into the system, or even provided insights on which features scored higher or lower for specific stories. This debugger greatly aided in understanding and improving feed quality.

What started as a project by an intern eventually grew into a big team dedicated to feed quality. We also had an operations perspective, which helped us handle the numerous complaints and requests for feed quality improvement that came from every Facebook employee who was also a user. We prioritized and fixed the real bugs, making a significant impact on feed quality.

Apart from that, there were other requirements, such as the ability to configure and set up experiments, as Facebook was a continuously developing company with a lot of velocity in feature deployment. We built infrastructure to quickly analyze these experiments and determine whether to ship an A/B test. We also created dashboards to share experiment results within launch reviews, allowing executives and the News Feed leadership to see which experiments were successful.

These were some of the projects my team was working on, which ultimately led me to become involved in explainability and model monitoring.

On Algorithmic Governance at Facebook

What happened with Facebook is that they developed numerous machine learning models. They started with a simple logistic regression model for the newsfeed in 2012. Within four years, this evolved into ensembles of large models. They began using sparse neural networks and complex deep learning models to score the newsfeed and generate new recommendations. This led to constant questions like "Why am I seeing this story?" and "Why is this story going viral?"

Before building these tools, debugging and answering these questions was challenging. One of the tools created was called "Why am I seeing this?" which was launched for every Facebook employee. It allowed users to debug their own newsfeeds by examining the reasons behind suggested content. If users disagreed with the suggestions, they could file a ticket with the feed quality team for review.

This process of algorithmic transparency empowered the feed quality team and Facebook employees to provide feedback, ultimately improving feed quality over time. Initially launched internally, the tool was eventually made available to end users as well. It provided human-readable explanations for why a user saw specific newsfeed content, such as a cat picture or another story.

This tool was the industry's first high-scale explainability tool. Although Facebook and Google could build such tools, not every company had the resources or ability to do so. This motivated the start of Fiddler. The situation Facebook was facing as a company, including data privacy issues, made these tools crucial. They had significant visibility, even reaching the chief product officer.

This is why we felt the need to bring similar capabilities to the enterprise and embarked on our journey with Fiddler. Given the context of Facebook at that time, there are many other implications to consider. But, these tools played a vital role and drove our commitment to creating transparency and explainability in AI systems.

On Leadership Lessons from Twitter, Pinterest, and Facebook

The common theme among all these companies is the scale of data. In some ways, all of these companies are building data products. For example, Twitter has its search feature, Pinterest has its home feed, and Facebook has its News Feed and ads. They are all considered data products.

One common theme is the need to invest heavily in infrastructure that can process and clean the data, ensuring it is available to a large audience in an efficient manner. There should also be guidelines in place for the proper use of the data. Ultimately, the goal is to make sense of the data and facilitate the development of tools for mining insights or creating new applications.

A leadership lesson here is the importance of establishing a strong foundation. Investing in infrastructure for data and machine learning is crucial in empowering business and product teams. Notably, these three companies achieved massive success, particularly Twitter and Pinterest, which I joined in their early stages.

On The Founding Story of Fiddler AI

Source: https://www.fiddler.ai/blog/introducing-fiddler-labs

As you may have noticed, I have a lot of experience building backend infrastructure and assisting internal developers, data scientists, and analysts to be more productive and efficient. That has been a common theme throughout my career. Naturally, I always wanted to start an enterprise startup where I could apply the lessons I've learned and build a platform that can be used by many developers, data scientists, and analysts.

However, I was waiting for the right problem that would motivate me to start my own company. Starting a startup is challenging; one must be prepared to take risks and go through a rollercoaster journey. While working on explainable AI at Facebook, I encountered a technically challenging problem. How can we explain complex machine learning models? Currently, our understanding is still quite limited.

Additionally, how can we turn these explanations into a product that has a social impact? In many enterprise companies, you may build impressive databases or networking infrastructure, but the social impact is often several steps removed. In this case, by making AI trustworthy, Fiddler can directly make a positive impact on society. I was excited about the social impact of this work.

Another crucial aspect is the ability to create a viable business. People should need and be willing to pay for the product. Based on my experience with explainable AI, this problem appeals not only to developers but also to executives. C-level executives often view advances in AI, such as ChatGPT, Stable Diffusion, or GPT-3 use cases, as black magic. They are unsure about AI's inner workings and potential risks. Fiddler saw an opportunity to address these challenges and create a business around them.

These three factors have motivated me to start Fiddler and embark on this journey for the past four years.

On His Co-Founder Amit

When I started the company, I wanted to find someone with a complementary background, someone I already had established trust with. Luckily, Amit and I have known each other for nearly 20 years.

We were classmates in grad school and worked together at Microsoft. I've known him for a long time, and he has previous startup experience. Since I lacked prior startup experience, especially in the early stages of founding a company, I wanted someone who had been through it before. Amit had successfully founded two startups, with good exits to PayPal and Samsung.

He was the perfect fit because he brought operational experience and expertise in building startups. It was also essential to have someone I could rely on throughout the journey, as startups can be chaotic. Trust and dependability are crucial qualities in a partner.

Amit's background in product development and operations made him an excellent fit. He served as the product leader for the first three years of Fiddler, building the initial versions of the product. Recently, he transitioned into an operational leadership role, scaling the product team and bringing in new talent.

This partnership has been successful so far.

On "AI needs a new developer stack"

Source: https://www.fiddler.ai/blog/ai-needs-a-new-developer-stack

Let's look at companies like Google, Facebook, and Uber, who have built their own ML platforms. They have solved various challenges, from data labeling to feature engineering, training, model evaluation, deployment, and monitoring.

This requires a significant amount of effort and a team of engineers. For example, Facebook's FBLearner team was a large team that focused on building end-to-end ML infrastructure, which significantly impacted various teams, such as the newsfeed team. The purpose of this article is to highlight that not every team has the resources to build such infrastructure.

Additionally, this new workflow requires a new developer stack. You need tools to manage and clean your training data. Fortunately, there are tools like Labelbox and Scale available for this purpose. You also need tools for feature engineering and tracking, such as feature stores. Numerous tools are available for training and evaluating models, including Sagemaker and open-source frameworks. Once trained, models need to be deployed, either in a batch-oriented manner using tools like Spark pipelines or as container-based endpoints. Tools like Seldon can assist with deployment.

Monitoring the performance of models is also crucial. This involves tracking different versions of models and analyzing their performance. This is where our expertise comes in. You need to connect various tools to create your MLOps workflow, which is similar to a LAMP stack but with additional tools.

Without these tools, it is challenging to build AI as the next generation of software, often referred to as Software 2.0. The most important input for this process is data, not just code. The article emphasizes the data-centric approach to creating, deploying, and maintaining AI systems. That's the main point of the article.

On The Evolution of MLOps

MLOps is truly underway, right? When we started writing the article, we would still say, "Hey, have you heard of FB Learner? Have you heard of Michelangelo? Have you heard of TFX? Do you have such a platform? If not, what are you going to do about it?"

People were still unaware of all these things. There was a lack of awareness, and the market was just getting started. But now, there's a lot of awareness, and teams are on the MLOps journey. When people see things like ChatGPT, they think, "If we're not on this, we will miss out. Let's make AI happen within our enterprise." That's one trend.

The second trend that has also started, which we see, is the aspect of not only needing to do MLOps but also the whole concept of responsible AI. This includes bringing multiple parties together, looking for biases in data, having continuous model reporting and monitoring, and ensuring explainability.

The interesting thing is that Fiddler tries to span both of these worlds. On the one hand, there's the core of building MLOps systems that make it work for you. But at the same time, we want to help companies do it right and avoid mistakes. We see the opportunity in riding these two waves: MLOps and responsible AI.

On The Model Performance Management (MPM) Framework

If you look at the big problem with machine learning models in terms of how MLOps works today, it's very linear. Most teams do not retrain their models; they just develop and deploy them and then forget about them. We have seen the effects of this during the pandemic, especially in the financial industry, where models can age for years.

For example, you might develop a great risk model and not touch it for more than a year. However, during the pandemic, everything changed. Supply chains changed, people lost jobs, unemployment rates changed, and interest rates changed. This disrupted every single model that people were developing.

People started realizing that there was no feedback loop. They didn't know when their models were decaying. One classic example was when Instacart or someone else wrote about how their inventory management models went from 90% accuracy to 60% accuracy.

These issues highlighted the lack of feedback. In control theory, there are open-loop systems and closed-loop systems. Open-loop systems take input and produce output without any feedback. Closed-loop systems, on the other hand, have a feedback loop that allows for adjustments.

Source: https://www.fiddler.ai/blog/introducing-ml-model-performance-management

For example, think of a car's cruise control. You press the gas pedal and see the feedback on the speedometer. You can create an automated cruise control that continuously monitors the speedometer and adjusts the speed accordingly.

This was the vision behind the whole model performance management framework. How can we build a closed-loop system for machine learning where we continuously receive feedback? This feedback includes how the model predicts, what features it considers, and how it performs against labeled data.

We also need to consider how the model affects business metrics associated with the model. For instance, if you're building a recommendation model, you need to analyze whether it actually improves likes, clicks, or additions to the cart. If you're building a credit risk model, does it reduce losses or improve the first paid installments?

Source: https://www.fiddler.ai/blog/introducing-ml-model-performance-management

We can provide actionable insights by looking at these metrics and quickly incorporating them into the feedback loop. We can identify if a model is drifting, if a data pipeline is broken, if a feature is malfunctioning, or if a feature's distribution has shifted and is impacting model performance.

Currently, closing the feedback loop still requires human intervention. Humans analyze the insights and make the necessary adjustments. However, our goal for the future is to automate this process as much as possible while still involving humans. We aim to provide more actionable insights to assist in fixing ML pipelines.
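As one concrete example of the kind of feedback signal such a loop can compute, here is a hedged sketch of the population stability index (PSI), a common drift metric, applied to a single scalar feature. The synthetic data and the threshold guidance are illustrative, and this is not necessarily the exact metric Fiddler uses:

```python
import numpy as np

def population_stability_index(baseline, production, bins=10):
    """PSI between a baseline (training) sample and a production window for one feature.

    A common rough reading: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 large shift.
    """
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Clip to avoid divide-by-zero / log(0) on empty bins.
    base_pct = np.clip(base_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - base_pct) * np.log(prod_pct / base_pct)))

rng = np.random.default_rng(0)
train_income = rng.normal(70_000, 15_000, 10_000)   # training-time distribution
live_income = rng.normal(55_000, 20_000, 2_000)     # hypothetical pandemic-era shift
print(population_stability_index(train_income, live_income))
```

Computed continuously per feature and per prediction, a metric like this is what turns the open loop into a closed one: it tells you when to investigate or retrain.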

On The Four Pillars of Fiddler's MPM Platform

Source: https://www.fiddler.ai/ai-observability

The foundation of the Fiddler MPM product is model monitoring. We connect to the models and collect their outputs. The model can run on SageMaker, Databricks, a container endpoint, or a custom inference server. It can log the predictions and inputs directly to Fiddler or a queue that Fiddler can access. Alternatively, it can log to a centralized data lake like S3 or HDFS that Fiddler can connect to.

This is how we begin. We gather all the data and help you monitor the model's performance. We examine shifts in distribution. For example, suppose you have a risk model with attributes such as loan amount, customer's previous income, date, and debt. In that case, we monitor if there are changes in these features over time. Are loan amounts increasing or decreasing? Is the income level of loan applicants changing compared to the training data? Tracking these distribution changes and alerting you are the foundations of model monitoring.

Additionally, we provide root cause analytics to help you understand what is happening for specific data slices. If you notice a high number of false positives, we help you identify where and why they are occurring. This analysis helps uncover if the model is underperforming in certain regions or for specific user types.

We also offer explainability, allowing you to understand the reasons behind model decisions. You can investigate why a loan was denied for a particular user or why a specific data segment is underperforming. This understanding enables you to ask counterfactual questions and explore different scenarios. For example, you can simulate a higher salary or a lower loan request to see if the outcome would change.
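A minimal what-if sketch of that idea follows, assuming a generic scikit-learn-style classifier and a dictionary of applicant features; both the model and the feature names are hypothetical, and this is not Fiddler's API:

```python
import copy

def what_if(model, applicant, feature, new_value):
    """Re-score a single applicant with one feature changed, all else held fixed."""
    counterfactual = copy.deepcopy(applicant)
    counterfactual[feature] = new_value
    original_score = model.predict_proba([list(applicant.values())])[0][1]
    new_score = model.predict_proba([list(counterfactual.values())])[0][1]
    return original_score, new_score

# Hypothetical usage: would a higher reported income flip a loan denial?
# applicant = {"income": 48_000, "loan_amount": 25_000, "debt": 12_000}
# before, after = what_if(credit_model, applicant, "income", 60_000)
```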

Lastly, we address fairness. While building and monitoring models, it is important to assess their performance across various protected attributes such as gender, race, ethnicity, and location. By examining different segments of these attributes, you can identify disparities and take appropriate action.

These four pillars - model monitoring, root cause analytics, explainability, and fairness - form the basis of the Fiddler MPM product.

On Explainability and Fairness Research

Calculating explainability, drift metrics, and fairness requires considerable research, as these are relatively new topics. We have a centralized data science team led by our Chief AI Officer and Chief Scientist, who is a pioneer in this field. Krishnaram joined us a year ago. He worked as a tech lead for ethical AI even before ethical AI became a prominent concept. He has also built some very interesting technology at Amazon. So, he is leading the charge here.

In terms of explainability research, we have developed our own explanation algorithms inspired by Shapley values. One of our algorithms received a Best Paper award in 2020. Regarding fairness, we have invested heavily in intersectional fairness.

Intersectional fairness refers to looking at fairness across multiple attributes simultaneously rather than focusing on a single attribute at a time. Even if a model appears fair when considering race alone, gender alone, or geography alone, combining these protected attributes and examining their intersections can reveal additional fairness issues. We have published a paper on intersectional fairness and the metrics required to measure it.
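A small pandas sketch of the idea follows, assuming a scored dataset with hypothetical protected-attribute and prediction columns. Real intersectional fairness metrics (as in the published paper) are richer than a simple positive-rate gap, so treat this only as an illustration of why crossing attributes matters:

```python
import pandas as pd

def intersectional_positive_rates(df, protected=("gender", "race"), prediction="approved"):
    """Approval rate per intersection of protected attributes.

    A model can look balanced on each attribute alone yet show large gaps
    once the attributes are crossed; this surfaces those gaps.
    """
    rates = df.groupby(list(protected))[prediction].mean().rename("positive_rate")
    gap = rates.max() - rates.min()
    return rates.sort_values(), gap

# Hypothetical scored data: one row per applicant, 1 = approved.
scored = pd.DataFrame({
    "gender":   ["F", "F", "M", "M", "F", "M"],
    "race":     ["A", "B", "A", "B", "A", "B"],
    "approved": [1,    0,   1,   1,   1,   0],
})
rates, gap = intersectional_positive_rates(scored)
print(rates, "\nmax gap:", gap)
```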

When it comes to monitoring, we have made significant investments in measuring drift in complex models. By examining distribution shifts, it is relatively simple to measure drift for models that use scalar features such as age, income, or transaction amount. However, measuring drift becomes more challenging when dealing with vector features or embeddings, such as text or image embeddings. We have developed novel techniques to compute drift in NLP and computer vision models, allowing us to create a single metric for monitoring. By computing drift on a vector of embeddings, we can alert you when changes occur and investigate further.

These are some of our data science team's ongoing research and development efforts.

On Monitoring Unstructured Data

Traditional MLOps is mainly focused on operationalizing traditional machine learning models such as logistic regression, decision trees, boosted trees, and random forests. These models analyze scalar variables to predict fraud, credit risk, churn, and lifetime value. These are the common use cases in MLOps.

However, there is an increasing trend toward applying advanced models to conversational bots, image processing, and object identification within images. For example, deep learning models are preferred when dealing with compliance documentation or customer sentiment analysis. Models based on transformers, such as BERT, have gained popularity in these cases. The recent emergence of GPT-3 has further pushed the boundaries of these advanced models.

These models operate in a higher-dimensional space using embedding-based features. They take raw inputs and convert them into embeddings to make predictions. However, this poses challenges in terms of model explanation and monitoring. Explaining complex models becomes a necessity, and monitoring them becomes crucial.

Source: https://www.fiddler.ai/natural-language-processing-and-computer-vision-model-monitoring

While these models offer high accuracy, explaining them and monitoring drift in embeddings is not straightforward. Embeddings themselves are not interpretable features. To address this, we have developed explainability features to explain complex models. Additionally, we have introduced a clustering-based technique to monitor drift in embeddings. By clustering vectors in high-dimensional space and comparing the distribution changes between training data and production data clusters, we can identify shifts in distribution. This enables us to compute drift metrics and detect issues such as a sudden shift from cat images to dog images.

We also overlay example images to visualize the cluster shifts. This way, we can observe the change in cluster composition and understand the reasons behind the shift.
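Here is a hedged sketch of that clustering-based approach, assuming embeddings are already available as NumPy arrays. The cluster count, the use of Jensen-Shannon distance as the summary number, and the alerting threshold are illustrative choices rather than Fiddler's exact method:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def embedding_drift(train_embeddings, prod_embeddings, n_clusters=20, seed=0):
    """Cluster training embeddings, bin both datasets by nearest cluster,
    and compare the two cluster-frequency distributions with one scalar."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(train_embeddings)

    def cluster_histogram(embs):
        labels = km.predict(embs)
        counts = np.bincount(labels, minlength=n_clusters).astype(float)
        return counts / counts.sum()

    return jensenshannon(cluster_histogram(train_embeddings),
                         cluster_histogram(prod_embeddings))

# Hypothetical usage: embeddings from a vision or text encoder, shape (n, d).
# drift = embedding_drift(train_vectors, this_week_vectors)
# Alert if drift exceeds a threshold tuned on historical windows.
```

Because each production vector lands in a training-time cluster, a surge of "cat" images drifting toward "dog" images shows up as mass moving between clusters, which is also what the overlaid example images visualize.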

On Model Governance for The Modern Enterprise

Model governance fits at the intersection of MLOps and responsible AI, right? It's a way to implement responsible AI within an organization, especially in regulated industries like financial services, healthcare, or recruiting.

Model governance is mandated by regulations, such as the OCC's SR 11-7 model risk management guidance in the financial services industry. It involves putting guardrails around models so multiple organizational stakeholders can observe their performance.

Source: https://www.fiddler.ai/blog/the-new-5-step-approach-to-model-governance-for-the-modern-enterprise

Regulated companies often have dedicated governance, risk management, or compliance teams whose job is to minimize risk for the organization. These teams report to the chief risk officer and require a report from the modeling team for every deployed model.

The report includes information about the model, the chosen algorithm, inputs, input characteristics, model outputs, and examples of predictions. The governance team reviews the report and provides feedback or approval before launching the model. Model governance aims to create visibility for non-technical teams and ensure compliance with regulatory policies.

In an HR company, the data science team should collaborate with the legal team; in a bank, it should follow OCC guidelines to address risk management. In a bank, the modeler shares the report with the model risk management team.

We propose building a model monitoring infrastructure and creating a user-friendly layer for compliance and legal stakeholders to review these reports. This makes sharing reports with external regulators easier during visits and ensures compliance with regulations.

Standardization and efficiency are key benefits of model governance. Without it, the modeling team spends months creating reports, leading to reduced model velocity. Implementing governance can improve velocity and reduce risks.

Source: https://www.fiddler.ai/blog/the-new-5-step-approach-to-model-governance-for-the-modern-enterprise

In the blog post, I discuss the importance of creating an inventory of models, validating models, understanding the model's purpose and important features, documenting approval and denial conversations, and having monitoring reports in one place. This five-step process can help build responsible AI or regulated AI in any industry.
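As a small illustration of the first step, here is a hedged sketch of what one entry in a model inventory might capture. The fields are assumptions for illustration, not a standard schema or a Fiddler-specific one:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelInventoryEntry:
    """One record in a model inventory; the fields here are illustrative."""
    name: str
    owner: str
    business_purpose: str
    algorithm: str                           # e.g. "gradient-boosted trees"
    important_features: list[str]
    protected_attributes_reviewed: list[str]
    approved_by: str | None = None           # risk / compliance sign-off
    approval_date: date | None = None
    monitoring_dashboard: str = ""           # link to drift / performance reports
    notes: list[str] = field(default_factory=list)

entry = ModelInventoryEntry(
    name="credit_risk_v3",
    owner="risk-ml-team",
    business_purpose="Score personal-loan applications",
    algorithm="gradient-boosted trees",
    important_features=["income", "debt_to_income", "loan_amount"],
    protected_attributes_reviewed=["gender", "race", "age"],
)
```

Keeping records like this in one place is what lets validation, approvals, and monitoring reports hang off a single inventory rather than scattered documents.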

Overall, model governance extends the core product we are building and makes it accessible for non-technical teams. This idea aligns with making MPM work for these teams as well.

On Hiring

I have been building teams for almost 10 years, and while building a team for a startup, there are some differences from big tech. Generally, when looking for candidates, I consider competence, attitude, and passion.

For competence, I assess their ability to do the job effectively. I also consider their attitude and whether they are easy to work with, both for me and the team. Additionally, I evaluate their passion for the field and the problems they will be working on.

However, when it comes to startups, hunger and belief in the mission become even more important. This can outweigh other factors. As an early-stage startup, attracting top talent in various roles may be challenging. As a small entity, it is difficult to convince highly qualified individuals to take on the risks.

In this context, it is crucial to find a balance. If you manage to attract such candidates, it is amazing. To do so, you need to focus on your mission, sell them on the vision, and find people who are genuinely excited and hungry for it. With the right drive, individuals can acquire the necessary abilities over time. Of course, they also need the right attitude to navigate the challenges of a startup journey.

Source: https://www.fiddler.ai/careers

We have made hiring mistakes in the past. When we focused too much on competency and neglected the other two attributes, it led to issues. Similarly, ignoring cultural fit caused problems, even if the person had exceptional abilities.

Despite these failures, it is important to continuously improve and learn from past mistakes. These three attributes—competence, attitude, and passion—are crucial to consider when hiring. In the startup context, hunger and the drive to make things happen are additional qualities that are highly valued. You have to actively make things happen every day, as they won't happen on their own. Continuous effort and determination are essential.

On Shaping The Culture

We used to have culture sessions when we were tiny and could fit in a small office in Mountain View. During one of these sessions, we asked, "What is culture?" and received various answers.

One response was that culture is not what we preach or write on a wiki page or what we advertise on our website. It is the everyday actions and behaviors we exhibit towards each other. It encompasses who gets recognized, respected, and rewarded and which behaviors are encouraged or discouraged.

These small things are noticed by people and contribute to the company's culture. The culture can be serious and hardworking, fun and relaxed, or characterized by respect among team members. Ultimately, culture is shaped by the people you hire.

Source: https://www.fiddler.ai/careers

As a founder, it is important to be conscious of culture. Starting with a set of values as a foundation is a good starting point. However, culture evolves, and new elements need to be incorporated as the team grows. Some elements may need to be pruned or shaped.

Currently, our culture is described as extremely driven and passionate. We have gone through an interesting journey in the last four years, iterating on our product and learning what works for our customers. We have developed resilience and grown stronger as a team.

That summarizes what our team has learned about ourselves in the past four years.

On Finding Design Partners and Defining The Responsible AI Category

What sets us apart from other enterprise startups is that we never operated in stealth mode. From day one, we took on the task of evangelizing the category while simultaneously building the product. We didn't want to wait three years to unveil something if the category didn't yet exist. So, we took it upon ourselves to promote and educate others about the category.

We actively wrote blogs and had a dedicated team of data scientists who covered topics like explainable AI, responsible AI, and trusted AI. I also focused on thought leadership to spread awareness. These efforts were instrumental in getting our message out there.

Another critical aspect was hiring and finding our first product adopters. It's similar to hiring because it involves selling them on our vision and making them partners in that vision. As a founder, this was a significant part of my role. I used to frequently travel to New York, as our primary focus was the financial services industry. I would meet with people, discuss their challenges, and incorporate their ideas into our product. We would provide them with early releases, beta versions, and pre-beta releases for feedback. Many machine learning teams tried our product and offered valuable insights on what they wanted and what would work for them.

Evangelizing, building thought leadership, and finding product adopters are all part of selling your vision. During the early days, it's not about getting money but getting time from potential partners, which is equally important. They become design partners and help shape the product to meet their needs.

Having previously worked at big tech companies, some of my colleagues went on to lead machine learning teams or worked as ML engineers or data scientists in other companies. This network proved helpful as they could introduce me to other ML teams or companies. Additionally, having good investors also helped, as they could introduce us to their portfolio companies that were building machine learning systems and applications, as well as other companies in their network. Our extensive network gave us a significant advantage.

On Fundraising

When starting your startup, your early-stage investors must have a strong belief in you as an individual or a group of founders, as well as in your vision.

Early-stage investment is primarily about people. You need to establish a strong connection with your early-stage investors, and it should be a mutual fit. As you build your company and develop your product, you will require investors who can open doors for you, assist in hiring the rest of the team, and provide guidance on running and managing the company.

Source: https://www.fiddler.ai/about

At each funding round, investors continue to invest in you as a founder, and the people factor remains constant. However, the value they bring to the table and your expectations from them will evolve as the company grows. To raise funds effectively, you must first socialize yourself and your idea with a wide range of investors. Identify those who are genuinely excited about your vision. It is similar to the hiring process, where you must sell your vision and generate enthusiasm among potential investors.

Remember, raising funds is a partnership. While investors provide capital, they also invest their time, emotions, and feelings into the startup. They want the startup to succeed, as their success is closely tied to the success of the startup. Therefore, it is crucial to establish a genuine partnership with your investors.

Show Notes

Krishna's Contact Info

Fiddler's Resources

Mentioned Content

People

  1. Goku Mohandas (Made With ML and Anyscale)

  2. Krishnaram Kenthapadi (Chief AI Officer & Chief Scientist at Fiddler)

Books

  1. "The Hard Thing About Hard Things" (Ben Horowitz)

  2. "The Five Dysfunctions of A Team" (Patrick Lencioni)

Notes

My conversation with Krishna was recorded more than a year ago. Since then, I'd recommend checking out these Fiddler resources:

  1. Strategic investments in Fiddler by Alteryx Ventures, Mozilla Ventures, Dentsu Ventures, and Scale Asia Ventures.

  2. Fiddler introduced an end-to-end workflow for robust Generative AI back in May 2023.

  3. Krishna's thought leadership on LLMOps and the missing link in Generative AI.

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.