Data Warehouse

Datacast Episode 119: Experimentation Culture, Immutable Data Warehouse, The Data Collaboration Problem, and The Rise of Data Contracts with Chad Sanderson

Chad Sanderson was the Product Lead for Convoy's Data Platform team, which includes the data warehouse, streaming, BI & visualization, experimentation, machine learning, and data discovery.

Previously, he worked on Microsoft's AI Platform team and led Data initiatives at SEPHORA and Subway. He has built everything from feature stores, experimentation platforms, metrics layers, and streaming platforms to analytics tools, data discovery systems, and workflow development platforms.

His love of the data space has also allowed him to implement open-source and SaaS products (early and late-stage) and build cutting-edge technology from the ground up.

Datacast Episode 110: Wisdom in Building Data Infrastructure, Lessons from Open-Source Development, The Missing README, and The Future of Data Engineering with Chris Riccomini

Chris Riccomini is an engineer, author, investor, and advisor. He has worked on infrastructure as an engineer and manager for about 15 years at PayPal, LinkedIn, and WePay. He was involved in open source as the original author of Apache Samza and an early contributor to Apache Airflow. He has also written a book with Dmitriy Ryaboy called The Missing README, a guide for software engineers. Lately, he has been investing in startups in the data space.

Datacast Episode 90: Operational Analytics, Reverse ETL, and Finding Product-Market Fit with Kashish Gupta

Kashish Gupta is the founder and co-CEO of Hightouch, a data startup based in San Francisco. He grew up in Atlanta, loves playing racket sports, and always wanted to be an inventor. He studied Machine Learning in college, had a short stint at the VC firm Bessemer Venture Partners, and has been working on startups ever since graduating. He and his co-founders are on their 5th business idea and have finally found product-market fit.

What I Learned From The Modern Data Stack Conference 2021

Back in September 2021, I attended the second annual Modern Data Stack Conference, Fivetran’s community-focused event that brings together hundreds of data analysts, data engineers, and data leaders to share the impact and experiences of next-generation analytics. The presenters shared the transformations they experienced with their analytics teams, the new insights and tooling they enabled, and the best practices they employ to drive insights across their organizations.

In this long-form blog recap, I will dissect content from 14 sessions that I found most useful from the conference. These talks are broken down into 4 categories tailored to 4 personas: data engineers, data analysts, product managers, and data team leads. Let’s dive in!

Datacast Episode 72: Folding Data with Gleb Mezhanskiy

Gleb Mezhanskiy is the CEO & Co-founder of Datafold, a data observability platform that helps companies unlock growth through more effective and reliable use of their analytical data. As a founding member of Data teams at Autodesk and Lyft and the Head of Product at Phantom Auto, Gleb has built some of the world's largest and most sophisticated data platforms and has developed tooling to improve productivity and data quality in organizations with hundreds of data users.

Datacast Episode 62: Leading Organizations Through Analytics Transformations with Gordon Wong

As a data modeling fanatic, data warehouse architect, multi-hypergrowth startup veteran, and team builder, Gordon has built his career on helping people get their business questions answered. Over time, he's switched his focus from pure technology to complete solutions where people, process, and technology all play a role. At Fitbit, he established the data warehousing team and, as an early customer of Snowflake, used it to fuel petabyte-scale analytics. Later on, at both ezCater and Hubspot, he rebuilt the data warehousing teams to focus on enabling analysts, not loading more data. A constant focus on the customer and their problems has led him to realize that empathy is the most important trait a leader can have.

What I Learned From Attending #SparkAISummit 2020

One of the best virtual conferences that I attended over the summer was Spark + AI Summit 2020, a one-stop shop for developers, data scientists, and tech executives seeking to apply the best data and AI tools to build innovative products. I learned a ton of practical knowledge: new developments in Apache Spark, Delta Lake, and MLflow; best practices to manage the ML lifecycle; tips for building reliable data pipelines at scale; the latest advancements in popular frameworks; and real-world use cases for AI.

An Introduction to Big Data: Data Integration

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of large datasets being generated and used in the modern world. In an effort to open-source this knowledge to the wider data science community, I will recap the materials I learn from the class on Medium. Having a solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.