Open Source

Datacast Episode 126: Vector Search Engine, Building An Open-Source Business, and Digital Technology Through The Lens of Language with Bob Van Luijt

Bob Van Luijt is the CEO and co-founder of Weaviate, the company built around the open-source vector database of the same name. Beyond Weaviate, Bob frequently speaks on open source, digital technology, software business, and business philosophy. He has spoken on these topics at hundreds of events around the world, including a TEDx talk.

Datacast Episode 124: The Open-Source Cloud Playbook, The Modular Future of Data and AI Infrastructure, and Meta-Learning as a VC with Casber Wang

Casber Wang is a Partner at Sapphire Ventures. He focuses primarily on security, enterprise infrastructure, and data analytics.

He is on the boards of Huntress, JumpCloud, StarTree, Tetrate, Uptycs, and Zesty. For his work, Insider listed Casber as an Enterprise VC Rising Star Investor and as an emerging investor charting the industry’s future on the 2022 EVC List.

Prior to Sapphire, he was part of the technology investment banking group at Bank of America Merrill Lynch, where he worked on a number of high-profile IPO and M&A transactions. He also spent time at Wish, a leading mobile commerce platform in North America and Europe.

Datacast Episode 121: High-Performance Processing Engine, Modern Data Streaming, and Propelling Minority in Tech with Alex Gallego

Alexander Gallego is the founder and CEO of Redpanda Data, a high-performance, Apache Kafka-compatible data streaming platform for mission-critical workloads. He has spent his career immersed in deeply technical environments and is passionate about finding and building solutions to the challenges of modern data streaming.

Before Redpanda, Alex was a principal engineer at Akamai and the co-founder and CTO of Concord.io, a high-performance stream-processing engine acquired by Akamai in 2016. He has also engineered software at FactSet Research Systems, Forex Capital Markets, and Yieldmo, and holds a bachelor’s degree in computer science and cryptography from NYU.

Datacast Episode 118: Overcoming Hardships, Confident Learning, Dataset Improvement, and The Ph.D. Rapper with Curtis Northcutt

Curtis Northcutt is an American computer scientist and entrepreneur focusing on AI to empower people. He is the CEO and Co-Founder of Cleanlab, building next-generation data-centric AI and open-source technologies that enable AI to work with real-world, messy data.

He completed his Ph.D. at MIT, where he invented confident learning to automatically find label issues in any dataset. Curtis received the MIT thesis award, NSF Fellowship, and Goldwater Scholarship for his work. Before Cleanlab, he worked on AI research teams at Google, Oculus, Amazon, Facebook, Microsoft, and NASA.

Datacast Episode 117: Vector Databases, The Embeddings Revolution, and Working in China with Frank Liu

Frank Liu is the Director of Operations at Zilliz with nearly a decade of industry experience in machine learning and hardware engineering. Prior to joining Zilliz, Frank co-founded an IoT startup based in Shanghai and worked as an ML Software Engineer at Yahoo in San Francisco. He presents at major industry events such as Open Source Summit and writes tech content for leading publications such as Towards Data Science and DZone. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.

Datacast Episode 116: Distributed Databases, Open-Source Standards, and Streaming Data Lakehouse with Vinoth Chandar

Vinoth Chandar is the creator and PMC chair of the Apache Hudi project, a seasoned distributed systems and database engineer, and a dedicated entrepreneur. He has deep experience with databases, distributed systems, and data systems at planet scale, strengthened through his work at Oracle, LinkedIn, Uber, and Confluent.

During his time at Uber, he created Hudi, which pioneered transactional data lakes as we know them today, to meet the unique speed and scale needs of Uber’s massive data platform. Most recently, Vinoth founded Onehouse, a cloud-native managed lakehouse that makes data lakes easier, faster, and cheaper.

Datacast Episode 112: Distributed Systems Research, The Philosophy of Computational Complexity, and Modern Streaming Database with Arjun Narayan

Arjun Narayan is the co-founder and CEO of Materialize, a streaming database for real-time applications and analytics, built on top of a next-generation stream processor, Timely Dataflow. He was previously an engineer at Cockroach Labs and holds a Ph.D. in Computer Science from the University of Pennsylvania.

Datacast Episode 111: Astrophysics, Visualization Recommendation, and Scalable Data Science with Doris Lee

Doris Lee is the co-founder and CEO of Ponder, a startup delivering a scalable, enterprise-ready pandas that improves the productivity of data teams. She received her Ph.D. from UC Berkeley's RISE Lab in 2021, where she developed data science tools to accelerate insight discovery.

Datacast Episode 110: Wisdom in Building Data Infrastructure, Lessons from Open-Source Development, The Missing README, and The Future of Data Engineering with Chris Riccomini

Chris Riccomini is an engineer, author, investor, and advisor. He has worked on infrastructure as an engineer and manager for about 15 years at PayPal, LinkedIn, and WePay. He was involved in open source as the original author of Apache Samza and an early contributor to Apache Airflow. He has also written a book with Dmitriy Ryaboy called The Missing README, a guide for software engineers. Lately, he has been investing in startups in the data space.