Distributed System

Datacast Episode 121: High-Performance Processing Engine, Modern Data Streaming, and Propelling Minority in Tech with Alex Gallego

Alexander Gallego is the founder and CEO of Redpanda Data, a high-performance, Apache Kafka-compatible data streaming platform for mission-critical workloads. He has spent his career immersed in deeply technical environments and is passionate about finding and building solutions to the challenges of modern data streaming.

Before Redpanda, Alex was a principal engineer at Akamai and the co-founder and CTO of Concord.io, a high-performance stream-processing engine acquired by Akamai in 2016. He has also engineered software at FactSet Research Systems, Forex Capital Markets, and Yieldmo, and holds a bachelor’s degree in computer science and cryptography from NYU.

Datacast Episode 116: Distributed Databases, Open-Source Standards, and Streaming Data Lakehouse with Vinoth Chandar

Vinoth Chandar is the creator and PMC chair of the Apache Hudi project, a seasoned distributed systems/database engineer, and a dedicated entrepreneur. He has deep experience with databases, distributed systems, and data systems at planet scale, strengthened through his work at Oracle, LinkedIn, Uber, and Confluent.

During his time at Uber, he created Hudi, which pioneered transactional data lakes as we know them today, to meet the speed and scale needs of Uber’s massive data platform. Most recently, Vinoth founded Onehouse, a cloud-native managed lakehouse that makes data lakes easier, faster, and cheaper.

Datacast Episode 112: Distributed Systems Research, The Philosophy of Computational Complexity, and Modern Streaming Database with Arjun Narayan

Arjun Narayan is the co-founder and CEO of Materialize, a streaming database for real-time applications and analytics built on top of a next-generation stream processor, Timely Dataflow. He was previously an engineer at Cockroach Labs and holds a Ph.D. in Computer Science from the University of Pennsylvania.

Fugue - Reducing Spark Developer Friction

This is a guest article written by Han Wang and Kevin Kho, in collaboration with James Le. Han is a Staff Machine Learning Engineer at Lyft, where he serves as a Tech Lead of the ML Platform. He is also the founder of the Fugue Project. Kevin is an Open Source Engineer at Prefect, a workflow orchestration framework, and a contributor to Fugue. Opinions presented are their own and not the views of their employers.

Datacast Episode 68: Threat Intelligence, Venture Stamina, and Data Investing with Sarah Catanzaro

Sarah Catanzaro is a Partner at Amplify Partners, where she focuses on investing in and advising high-potential startups in machine intelligence, data management, and distributed systems. Her investments at Amplify include startups like RunwayML, Maze Design, OctoML, and Metaphor Data, among others. Sarah also has several years of experience defining data strategy and leading data science teams at startups and in the defense/intelligence sector, including roles at Mattermark, Palantir, Cyveillance, and the Center for Advanced Defense Studies.

Datacast Episode 58: Deep Learning Meets Distributed Systems with Jim Dowling

Jim Dowling is the CEO of Logical Clocks AB, an Associate Professor at KTH Royal Institute of Technology, and a Senior Researcher at SICS RISE in Stockholm. His research concentrates on building systems support for machine learning at scale. He is the lead architect of Hops Hadoop, the world's fastest and most scalable Hadoop distribution and the only Hadoop platform with support for GPUs as a resource. He is also a regular speaker at Big Data and AI industry conferences.

Datacast Episode 52: Graph Databases In Action with Dave Bechberger

Dave Bechberger is known for his expertise in distributed data architecture and as a graph database SME. He takes a pragmatic approach to data architectures and has implemented large-scale distributed data architectures for big data analysis and data science workflows using various SQL and NoSQL data technologies. He is the author of "Graph Databases in Action" from Manning Publications and has spoken both nationally and internationally at conferences on subjects related to distributed data and graph databases.

Dave has spent 20+ years developing, managing, and consulting on software projects and is currently a member of the Amazon Neptune service team. He works with both customers and engineering teams to simplify and speed the adoption of graph technologies.

What I Learned From Attending #SparkAISummit 2020

One of the best virtual conferences that I attended over the summer was Spark + AI Summit 2020, a one-stop shop for developers, data scientists, and tech executives seeking to apply the best data and AI tools to build innovative products. I picked up a ton of practical knowledge: new developments in Apache Spark, Delta Lake, and MLflow; best practices for managing the ML lifecycle; tips for building reliable data pipelines at scale; the latest advancements in popular frameworks; and real-world use cases for AI.