James Le

February 13, 2024

Datacast

Datacast Episode 132: Big Data Engineering, Data Culture from First Principles, and Reimagined Metadata with Suresh Srinivas

Suresh Srinivas was the Chief Architect of Uber’s data platform, responsible for all data initiatives at the company, including the Databook, Data Quality, and Data Lineage initiatives. Suresh was part of the original team that built Hadoop at Yahoo! and co-founded Hortonworks, which developed and supported open-source software to manage big data and associated processing.

He is leading the OpenMetadata Project to build Metadata APIs & specifications and a single place to discover, collaborate, and get your data right.

James Le

August 5, 2021

Computer Science

Fugue - Reducing Spark Developer Friction

This is a guest article written by Han Wang and Kevin Kho, in collaboration with James Le. Han is a Staff Machine Learning Engineer at Lyft, where he serves as a Tech Lead of the ML Platform. He is also the founder of the Fugue Project. Kevin is an Open Source Engineer at Prefect, a workflow orchestration framework, and a contributor to Fugue. Opinions presented are their own and not the views of their employers.

James Le

January 18, 2021

Datacast

Datacast Episode 52: Graph Databases In Action with Dave Bechberger

Dave Bechberger is known for his expertise in distributed data architecture and as a graph database SME. He takes a pragmatic approach to data architectures and has implemented large-scale distributed data architectures for big data analysis and data science workflows using various SQL and NoSQL data technologies. He is the author of "Graph Databases in Action" from Manning Publications and has spoken nationally and internationally at conferences on distributed data and graph databases.

Dave spent 20+ years developing, managing, and consulting on software projects and is currently a member of the Amazon Neptune service team. He works with both customers and engineering teams to simplify and speed the adoption of graph technologies.

James Le

October 29, 2020

Datacast

Datacast Episode 46: From Building Recommendation Systems To Teaching Online Courses with Frank Kane

Frank Kane is the owner of Sundog Education, teaching machine learning and data science online to over 500,000 students worldwide. Before Sundog, Frank spent nine years at Amazon as a senior engineer and senior manager, specializing in recommender systems and running IMDb's engineering department. Frank also worked in the early days of video game development, dating back to the adventure games of Sierra Online in the early '90s, and has developed computer graphics software for flight simulators and military simulators around the world. Today Frank is focused on the world of online education, living in the Orlando, Florida, area with his family.

James Le

October 5, 2020

Conference

What I Learned From Attending #SparkAISummit 2020

One of the best virtual conferences I attended over the summer was Spark + AI Summit 2020, a one-stop shop for developers, data scientists, and tech executives seeking to apply the best data and AI tools to build innovative products. I picked up a lot of practical knowledge: new developments in Apache Spark, Delta Lake, and MLflow; best practices for managing the ML lifecycle; tips for building reliable data pipelines at scale; the latest advancements in popular frameworks; and real-world use cases for AI.

James Le

April 30, 2019

Computer Science

An Introduction to Big Data: Distributed Data Processing

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of large datasets being generated and used in the modern world. To open-source this knowledge to the wider data science community, I will recap on Medium the materials I learn in the class. Having a solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.

James Le

April 11, 2019

Computer Science

An Introduction to Big Data: Clustering

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of large datasets being generated and used in the modern world. To open-source this knowledge to the wider data science community, I will recap on Medium the materials I learn in the class. Having a solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.

James Le

March 18, 2019

Computer Science

An Introduction to Big Data: Data Integration

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of large datasets being generated and used in the modern world. To open-source this knowledge to the wider data science community, I will recap on Medium the materials I learn in the class. Having a solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.

James Le

February 15, 2019

Computer Science

An Introduction to Big Data: Data Querying

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to the exploration and management of large datasets being generated and used in the modern world. To open-source this knowledge to the wider data science community, I will recap on Medium the materials I learn in the class. Having a solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.

James Le

February 9, 2019

Datacast

Datacast Episode 9: Diving into Data Engineering with Mark Sellors

Mark Sellors is the Head of Data Engineering at Mango Solutions, a UK-based data science consultancy. He has more than a decade’s experience working with analytical computing environments, DevOps, and Unix/Linux. He uses that experience to help Mango’s customers transform their analytic capabilities so they can make the most of their data.