Data Cleaning

Datacast Episode 118: Overcoming Hardships, Confident Learning, Dataset Improvement, and The Ph.D. Rapper with Curtis Northcutt

Curtis Northcutt is an American computer scientist and entrepreneur focusing on AI to empower people. He is the CEO and Co-Founder of Cleanlab, building next-generation data-centric AI and open-source technologies that enable AI to work with real-world, messy data.

He completed his Ph.D. at MIT, where he invented confident learning to automatically find label issues in any dataset. Curtis received the MIT thesis award, NSF Fellowship, and Goldwater Scholarship for his work. Before Cleanlab, he worked on AI research teams at Google, Oculus, Amazon, Facebook, Microsoft, and NASA.

An Introduction to Big Data: Data Cleaning

This semester, I’m taking a graduate course called Introduction to Big Data. It provides a broad introduction to exploring and managing the large datasets generated and used in the modern world. In an effort to open-source this knowledge to the wider data science community, I will recap the material I learn in the class on Medium. A solid understanding of the basic concepts, policies, and mechanisms for big data exploration and data mining is crucial if you want to build end-to-end data science projects.