Datacast Episode 37: Machine Learning In Production with Luigi Patruno

The 37th episode of Datacast is my conversation with Luigi Patruno, the Director of Data Science at 2U. Give it a listen to hear about his educational background in Applied Math and Computer Science; his experience working across the ML stack in data engineering, ML engineering, and data science; his wide-ranging articles and newsletter that cover production-ready ML systems; his advice on hiring and running successful ML projects; and much more.

Luigi Patruno is a Data Scientist and the Founder of MLinProduction.com. He’s currently the Director of Data Science at 2U, where he leads a team of data scientists and ML engineers in developing machine learning models and infrastructure to predict student success outcomes. Luigi founded MLinProduction.com to educate data scientists, ML engineers, and ML product managers about best practices for running machine learning systems in production.

As a consultant for Fortune 500s and start-ups, Luigi helps companies use data science to create competitive advantages. He has taught graduate-level courses in Statistics and Big Data Engineering and holds a Master’s in Computer Science and a BS in Mathematics.

Show Notes

  • (2:19) Luigi got his Bachelor’s in Mathematics and Master’s in Computer Science from Fordham University, with a break working as a Data Analyst in between.

  • (5:41) Luigi worked as a Research Engineer at Fordham’s Wireless Sensor Data Mining Lab for a year during his Master’s program and got exposed to Machine Learning.

  • (9:13) Luigi’s first role out of graduate school was a Data Engineering position at Namely, a Human Resources platform for thousands of mid-sized companies.

  • (14:33) Luigi then worked as a Machine Learning Engineer at CTRL-Labs — a startup (acquired by Facebook) pioneering the development of non-invasive neural interfaces that reimagine how humans and machines collaborate.

  • (20:45) Luigi discusses the skills he picked up during his transition from Data Engineering to Machine Learning Engineering, such as data analysis, data visualization, dimensionality reduction, and domain expertise.

  • (25:38) Luigi went over his time teaching graduate courses in Applied Statistics & Probability and Big Data Programming at Fordham’s Department of Computer Science.

  • (28:37) Luigi talked about his next role working as a Data Scientist at 2U — an edTech SaaS platform providing schools with the comprehensive operating infrastructure they need to attract, enroll, educate, support, and graduate students globally.

  • (31:12) Luigi emphasized the importance of being good at data science and picking up skills from other functional domains for anyone looking into management roles.

  • (33:47) Luigi shared brief thoughts on the role of ed-tech in the current environment with remote education.

  • (35:47) Luigi unpacked his blog post called How I Hire Data Scientists that shares advice for both the hiring managers and the job applicants.

  • (42:05) Luigi shared his anecdotal journey of starting ML In Production — which provides content on the best practices of doing machine learning in production. Check out this article for more detail!

  • (47:16) Luigi discussed the nuts and bolts of setting up the weekly newsletter for his website.

  • (51:21) Luigi unpacked the 4-part series “Docker for Machine Learning” that discusses the benefits of using Docker with machine learning, how to build custom Docker images, how to perform batch inference using Docker containers, and how to perform online inference using Docker and a Flask REST API.

  • (54:26) Luigi’s next post, “Batch Inference vs. Online Inference,” discusses the differences between batch inference and online inference for model serving.

  • (56:51) Luigi’s next post, “Storing Metadata from ML Experiments,” reveals the importance of storing metadata during the machine learning process as well as the types of metadata to capture.

  • (01:00:49) Luigi’s following post “How Data Leakage Impacts ML Models” goes over the issues of data leakage, which occurs when data used at training time is unavailable at inference time.

  • (01:04:14) Luigi unpacked his 6-part series that first introduces Kubernetes and then goes deeper into its components, including Pods, Jobs, CronJobs, Deployments, and Services.

  • (01:07:06) Luigi reflected on his talk “Productionizing ML Models at scale with Kubernetes” at the TWIML conference last year.

  • (01:10:50) Luigi dug into his popular post “The Ultimate Guide to Model Retraining,” which covers the problem of model drift as well as the necessary steps to retrain models already in production.

  • (01:14:36) Luigi laid out the benefits of using AWS SageMaker for model deployment. Check out his concise description of SageMaker’s architecture as well as his video tutorial on how to train scikit-learn models on SageMaker.

  • (01:17:50) Luigi unpacked his multi-part series on model deployment. So far, he has covered deployment in the machine learning context, software interfaces, batch inference, online inference, model registries, test-driven development, and A/B testing.

  • (01:22:12) Luigi encouraged every software engineer to learn about running ML systems in production, given the gradual shift to Software 2.0, as indicated in his post “Machine Learning is Forcing Software Development to Evolve.”

  • (01:26:48) Luigi revealed what differentiates successful industry ML projects from unsuccessful ones, based on his interview series with other ML practitioners. Hint: (1) focus on business outcomes rather than hype, and (2) start small.

  • (01:30:10) Luigi distinguished the skills required for the three roles: data engineers, data scientists, and machine learning engineers.

  • (01:33:46) Luigi shared his opinions on the data science community in New York City.

  • (01:35:41) Closing segment.

His Contact Info

His Recommended Resources

A New Course From Luigi

Luigi just launched his first online course, Build, Deploy, and Monitor Machine Learning Models with Amazon SageMaker! I had a look at the course content, and I’m convinced that the course will be super valuable to any ML engineer or data scientist who wants to level up and learn how to productionize their machine learning models.

You can take the course on your own, but Luigi is also teaming up with TWiML to offer a version with virtual Study Group sessions for people who want a more interactive experience. Right now, Luigi is offering an early bird discount on the course until August 1st!

I know the course will be valuable for a lot of you within my community, so Luigi created the coupon code DATACAST, which takes an additional 10% off the course!

Head over to the Teachable course page to learn more about Amazon SageMaker and take advantage of the discount!