The 43rd episode of Datacast is my conversation with Francesca Lazzeri — a Lead Senior Machine Learning Scientist at Microsoft. Give it a listen to hear about her educational background in economics and operations management, her post-doc research in technological innovation at HBS, her ongoing career at Microsoft working on various features of Azure AI, her advice for students looking to get into the field, and much more.
Listen to the show on (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) Stitcher, (5) Overcast, and (6) Breaker!
Key Takeaways
Below are highlights from my conversation with Francesca:
On Studying In Italy and Doing a Post-Doc at HBS
Both of my Master’s and Ph.D. degrees back in Italy are in Economics. During my Ph.D., I explored the interaction between technology innovation and economic development. I conducted data-driven research with a strong focus on econometrics and statistics, using many languages such as SPSS, SQL, R, and Python.
I didn’t complete my study path in Italy, as I was lucky to receive a post-doc fellowship at Harvard Business School in the US to study Technology Management. This was a similar type of research in my graduate study in Italy.
One of the requirements for my Ph.D. at Sant’Anna University was to conduct a part of my thesis with a professor or a research unit outside Italy. Harvard is one of the partner schools within Sant’Anna’s university network. I was especially interested in the Technology Operations unit at Harvard, sent in my application, and got accepted into the program.
In general, the three areas that connected my academic work were data, economics, and statistics.
The American educational system focuses on practical knowledge, while the Italian counterpart has a stronger emphasis on the theory. Having exposure to practical experience helped me to transition into the industry easier later on.
On Joining Microsoft
Microsoft has an office in Cambridge, MA, which is very close to Harvard. I was fascinated by how Microsoft approached data science and made it accessible to everyone, not just the academic elitists. Microsoft also uses data science to improve processes and businesses, which closely aligns with my interests.
My first year at Microsoft included projects with external customers in the energy and finance sectors. These sectors have been using data for a long time, much longer before data science became popular. I learned a lot about working with raw customer data and understanding how the cloud can improve their processes.
On Working With Two Types of Customer at Microsoft
The first type of customer is very knowledgeable about the data and the machine learning models but lacks an understanding of cloud platforms.
The second type of customer knows how to use different cloud technologies, but have little knowledge of leveraging internal data to answer their business questions.
As a data scientist, you must always keep an open mindset. If you work with the first customer category, you can learn about new types of algorithms from the customers while playing the role of a cloud expert. If you work with the second customer category, you want to run workshops or webinars and show them how to use machine learning to tackle simple business problems while using their cloud technologies to deploy machine learning applications.
On The Healthy Data Science Organization Framework
The first principle is to understand the business and decision-making process. Having this business-oriented mindset is very important as a data scientist because every decision you will make has to be related to a business problem.
The second principle is to establish performance metrics. Every time you translate business questions to data science questions, you need to understand and define both the business metrics and the machine learning metrics to affirm that your solutions will be good enough for production.
The third principle is to think about the end-to-end pipelines. If you want to grow your career, you need a clear overview of different steps of the ML architecture, including data ingestion, data collection, feature engineering, model evaluation, model retraining, etc.
The fourth principle is to build your toolbox of data science tricks. This is something that you can acquire after a few years of experience: templates, algorithms, pre-trained models, etc. that you are comfortable with.
The fifth principle is to unify your data science organization’s vision. Make the data science unit critical to your business process, equivalent to the finance or marketing units.
The sixth principle is to keep humans in the loop. Humans are needed to solve challenging issues in data and model quality. Also, experts from other business units in your company should be involved to guide your data science journey.
On The Challenges of Building End-To-End ML Applications
The data preparation process is the most challenging step, which includes data cleaning and feature engineering.
Other important steps include the ML deployment and operationalization phases. This enables the models to be consumed by other people in your organization and your external customers. Your models, therefore, need to be retrained over time.
Nowadays, there are MLOps (DevOps for Machine Learning) tools that help you with the deployment and automation throughout the ML development lifecycle.
There are also different cloud-based tools and packages that allow you to create model instances easily via simple APIs.
On AutoML
Automated Machine Learning is a popular concept that selects the best model and tunes hyper-parameters for you based on the data and tasks at hand.
At the moment, Azure supports Automated ML for classification, regression, and forecasting tasks. You can configure this capability based on criteria such as evaluation metrics, number of iterations, GPU/CPU options, and more.
This shows the importance of ML research and its applicability to real-world products. It aligns well with Microsoft’s mission to democratize data science.
On Model Interpretability and Model Fairness
Trust and responsibility must be at the core of your ML solutions across different data-, model-, platform-, and process-levels. At Microsoft, there are other values in the context of Responsible ML: understanding models, protecting user data, and controlling the end-to-end process.
Interpretability helps you unpack your ML models and understand their behaviors (why individual decisions are made). InterpretML is Microsoft’s open-source package to interpret and explain your models better at both training and inference time.
Fairness helps you assess and mitigate the biases in your ML models. Fairlearn is Microsoft’s open-source package to tackle the harm of allocation and the harm of service quality.
On Advice for Students
There has been an increasing amount of undergraduate and graduate programs that prepare students for a career in data science. The latest generation of students, therefore, already possesses the necessary skill set to get into the field.
Students, most of the time, focus on the theory and modeling side but lack knowledge on the tooling and technology that allows them to build end-to-end solutions.
They should spend more time exploring different cloud platforms and frameworks.
Show Notes
(2:37) Francesca discussed her educational background in Italy, studying Economics and Institutional Studies at LUISS Guido Carli University for her Master’s and then Economics and Technology Innovation at Sant’Anna University for her Ph.D. She also mentioned her transition to studying in the US at Harvard Business School.
(7:43) Francesca shared the anecdote behind going to HBS to pursue a Postdoc Research Fellowship in Economics. She also revealed the differences in the educational approaches between Italy and the United States.
(15:15) During her Post-doc, Francesca worked on multiple patent data-driven projects to investigate and measure the impact of external knowledge networks on companies’ competitiveness and innovation. She discussed a specific project that analyzed biotech innovation in Boston, San Diego, and San Francisco clusters using social media and citation data.
(24:26) Francesca talked about her decision to join Microsoft as a data scientist in its Cloud and Enterprise division back in 2014, where she first worked on projects for clients from the energy and finance sectors.
(30:00) Francesca discussed the two types of customers who seek Microsoft’s cloud solutions to solve their data problems and explained the learning curves she went through while interacting with them.
(36:11) Francesca unpacked the Healthy Data Science Organization Framework — which is a portfolio of methodologies, technologies, resources that will assist organizations in becoming more data-driven (Read her InfoQ article “The Data Science Mindset: 6 Principles to Build Healthy Data-Driven Organizations”).
(45:31) Francesca shared the challenges of building end-to-end machine learning applications that she has observed from Microsoft Azure AI’s clients.
(49:56) Francesca walked through a typical day in her current leadership role at Microsoft’s Cloud AI Advocates team.
(53:44) Francesca discussed the different components in a typical Azure deployment workflow (Read her post “Azure Machine Learning Deployment Workflow”).
(58:44) Francesca explained Automated Machine Learning, a breakthrough from Microsoft Research division that is essentially a recommender system for machine learning pipelines.
(01:03:50) Francesca went over model interpretability features within Azure AI (as part of the InterpretML package) and touched on Microsoft’s Responsible AI principles.
(01:08:01) Francesca explained the differences between model fairness and model interpretability at both the training time and inference time (Check out the Fairlearn package).
(01:12:11) Francesca is currently writing a book with Wiley called “Machine Learning for Time Series Forecasting with Python.”
(01:14:39) Francesca shared her advice for undergraduate students looking to get into the field, judging from her experience being a mentor for Ph.D. and Post-doc students at institutions such as Harvard, MIT, and Columbia.
(01:17:27) Francesca reasoned how her educational backgrounds in economics and operations management contribute to her success in a data science career
(01:20:09) Closing segment.
Her Contact Info
Her Recommended Resources
People To Follow
Book To Read
An Introduction to Probability Theory and Its Applications (by William Feller)
A Developer’s Introduction to Data Science
Azure Machine Learning
Responsible Machine Learning
Automated Machine Learning