Datacast Episode 55: Making Apache Spark Developer-Friendly and Cost-Effective with Jean-Yves Stephan


Jean-Yves (or "J-Y") Stephan is the CEO & Co-Founder of Data Mechanics, a Y Combinator-backed startup building a data engineering platform that makes Apache Spark more developer-friendly and more cost-effective. Before Data Mechanics, he was a software engineer at Databricks, the unified analytics platform created by Apache Spark's founders. J-Y did his undergraduate studies in Computer Science & Applied Math at Ecole Polytechnique (Paris, France) before pursuing a Master's at Stanford in Management Science & Engineering.

Datacast Episode 54: Information Retrieval Research, Data Science For Space Missions, and Open-Source Software with Chris Mattmann


Chris Mattmann is the Chief Technology and Innovation Officer at NASA JPL. He is also JPL's first Principal Scientist in the area of Data Science. He has over 19 years of experience at JPL and has conceived, realized, and delivered the architecture for the next generation of reusable science data processing systems for NASA's space and earth science missions.

He contributes to open source and formerly served as a Director at the Apache Software Foundation (2013-18).

Finally, he is the Director of the Information Retrieval & Data Science (IRDS) group at USC and an Adjunct Associate Professor there.

Datacast Episode 53: Algorithms and Data Structures In Action with Marcello La Rocca


Marcello La Rocca is a research scientist and a full-stack engineer. He works as a consultant, creating large-scale web applications and machine learning infrastructure. He has gained invaluable experience at Twitter, Microsoft, and Apple, working on applied research in both academia and industry. His work and interests focus on graphs, optimization algorithms, genetic algorithms, machine learning, and quantum computing.

Recommendation System Series Part 8: The 14 Properties To Take Into Account When Evaluating Real-World Recommendation Systems


Various properties are commonly considered when choosing a recommendation approach, whether for offline or online scenarios. These properties involve trade-offs, so it is critical to understand and evaluate their effects on both overall performance and the user experience. This blog post is my attempt to summarize these properties succinctly.

Datacast Episode 52: Graph Databases In Action with Dave Bechberger


Dave Bechberger is known for his expertise in distributed data architecture and as a Graph Database SME. He takes a pragmatic approach to data architectures and has implemented large-scale distributed data architectures for big data analysis and data science workflows using various SQL and NoSQL data technologies. He is the author of "Graph Databases in Action" (Manning Publications) and has spoken both nationally and internationally at conferences on subjects related to distributed data and graph databases.

Dave has spent 20+ years developing, managing, and consulting on software projects and is currently a member of the Amazon Neptune service team. He works with both customers and engineering teams to simplify and speed up the adoption of graph technologies.

Datacast Episode 51: Research and Tooling for Computer Vision Systems with Jason Corso


Dr. Jason Corso is the new director of the Stevens Institute for AI. He is also the co-founder and CEO of Voxel51, an AI software company creating development tools for improving the performance of computer vision and machine learning systems. Previously, he was a professor of electrical engineering and computer science at the University of Michigan. A veteran in the field of computer vision, Jason has dedicated over 20 years to academic research and has authored nearly 150 academic papers and hundreds of thousands of lines of open-source code on video understanding, robotics, and data science. He received his Ph.D. and MSE degrees from Johns Hopkins University and his bachelor’s degree from Loyola University Maryland, all in computer science.

2020 Annual Review: The Year of Resilience


The review is a deeply personal report, letting me see myself for who I am and think about the type of person I want to become. I’ll start with the highlights, reflect on 2020 goals, set 2021 goals, celebrate milestones, review areas of improvement, and conclude with a list of open questions.

Datacast Episode 50: Reducing Data Downtime with Barr Moses


Barr Moses is the CEO & co-founder of Monte Carlo, a data reliability company committed to accelerating the world's data adoption by reducing Data Downtime. Monte Carlo is backed by Accel, GGV, and other top Silicon Valley investors, including the former Chief Data Scientist of the U.S., DJ Patil. Previously, Barr was VP of Customer Operations at Gainsight, a customer success company, where she helped scale the company 10x in revenue and, among other functions, built the data/analytics team. Prior to that, Barr was a management consultant at Bain & Company and a research assistant in the Statistics Department at Stanford. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit and graduated from Stanford University with a B.Sc. in Mathematical and Computational Science.

What I Learned From Attending Toronto Machine Learning Summit 2020


Last month, I had the opportunity to attend the Toronto Machine Learning Summit 2020, organized by the great people at the Toronto Machine Learning Society. I previously attended their MLOps event in the summer, for which I also wrote an in-depth recap.

The summit aims to promote and encourage the adoption of successful machine learning initiatives within Canada and abroad. It offered a variety of thought-provoking content tailored to business leaders, practitioners, and researchers. In this long-form post, I dissect the content from the talks I found most useful.