Spark

Fugue - Reducing Spark Developer Friction

This is a guest article written by Han Wang and Kevin Kho, in collaboration with James Le. Han is a Staff Machine Learning Engineer at Lyft, where he serves as a Tech Lead of the ML Platform. He is also the founder of the Fugue Project. Kevin is an Open Source Engineer at Prefect, a workflow orchestration framework, and a contributor to Fugue. Opinions presented are their own and not the views of their employers.

Datacast Episode 55: Making Apache Spark Developer-Friendly and Cost-Effective with Jean-Yves Stephan

Jean-Yves (or "J-Y") Stephan is the CEO & Co-Founder of Data Mechanics, a Y-Combinator-backed startup building a data engineering platform that makes Apache Spark more developer-friendly and more cost-effective. Before Data Mechanics, he was a software engineer at Databricks, the unified analytics platform created by Apache Spark's founders. J-Y did his undergraduate studies in Computer Science & Applied Math at Ecole Polytechnique (Paris, France) before pursuing a Master's at Stanford in Management Science & Engineering.

What I Learned From Attending #SparkAISummit 2020

One of the best virtual conferences I attended over the summer was Spark + AI Summit 2020, a one-stop shop for developers, data scientists, and tech executives seeking to apply the best data and AI tools to build innovative products. I learned a ton of practical knowledge: new developments in Apache Spark, Delta Lake, and MLflow; best practices for managing the ML lifecycle; tips for building reliable data pipelines at scale; the latest advancements in popular frameworks; and real-world use cases for AI.

An Introduction to Big Data: Distributed Data Processing

This semester, I'm taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of the large datasets being generated and used in the modern world. In an effort to open-source this knowledge to the wider data science community, I will recap the material I learn from the class on Medium. Having a solid understanding of the basic concepts, policies, and mechanisms of big data exploration and data mining is crucial if you want to build end-to-end data science projects.