Hadoop

Datacast Episode 110: Wisdom in Building Data Infrastructure, Lessons from Open-Source Development, The Missing README, and The Future of Data Engineering with Chris Riccomini

Chris Riccomini is an engineer, author, investor, and advisor. He has worked on infrastructure as an engineer and manager for about 15 years at PayPal, LinkedIn, and WePay. He was involved in open source as the original author of Apache Samza and an early contributor to Apache Airflow. He has also written a book with Dmitriy Ryaboy called The Missing README, a guide for software engineers. Lately, he has been investing in startups in the data space.

An Introduction to Big Data: Distributed Data Processing

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to exploring and managing the large datasets being generated and used in the modern world. In an effort to open-source this knowledge to the wider data science community, I will recap the material I learn in the class on Medium. A solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.