Jason Risch is an investor on the enterprise team at Greylock - investing in security, AI/ML, data, infrastructure, and developer tools. Before joining Greylock, he incubated ML companies at AI Fund and was a management consultant at McKinsey. Jason is a Bay Area native, graduated from Stanford, and when not working, can be found reading, hiking, playing Age of Empires, and cheering on Stanford Football.
Datacast Episode 116: Distributed Databases, Open-Source Standards, and Streaming Data Lakehouse with Vinoth Chandar
Vinoth Chandar is the creator and PMC chair of the Apache Hudi project, a seasoned distributed systems/database engineer, and a dedicated entrepreneur. He has deep experience with databases, distributed systems, and data systems at planet scale, strengthened through his work at Oracle, LinkedIn, Uber, and Confluent.
During his time at Uber, he created Hudi, which pioneered transactional data lakes as we know them today, to meet the unique speed and scale needs of Uber’s massive data platform. Most recently, Vinoth founded Onehouse - a cloud-native managed lakehouse that makes data lakes easier, faster, and cheaper.
What I Learned From Tecton's apply() 2022 Conference
Back in May, I attended apply(), Tecton’s second annual virtual event for data and ML teams to discuss the practical data engineering challenges faced when building ML for the real world. There were talks on best-practice development patterns, tools of choice, and emerging architectures for successfully building and managing production ML applications.
This long-form article dissects content from 14 sessions and lightning talks that I found most useful from attending apply(). These talks cover 3 major areas: industry trends, production use cases, and open-source libraries. Let’s dive in!
What I Learned From DataOps Unleashed 2022
Earlier this month, I attended the second iteration of DataOps Unleashed, a great event where DataOps, CloudOps, AIOps, and other professionals come together to share the latest trends and best practices for running, managing, and monitoring data pipelines and data-intensive analytics workloads.
In this long-form blog recap, I will dissect content from the session talks that I found most useful at the summit. These talks feature DataOps professionals at leading organizations detailing how they establish data predictability, increase reliability, and reduce costs with their data pipelines. If interested, you should also check out my recap of last year’s DataOps Unleashed 2021.