Datacast Episode 36: Machine Learning Bookcamp with Alexey Grigorev
The 36th episode of Datacast is my conversation with Alexey Grigorev — a lead data scientist at OLX Group. Give it a listen to hear about his early career working as a Java developer, his graduate work in Business Intelligence, his experience participating in Kaggle and academic competitions, his wide-ranging work building large-scale ML systems at OLX, his book “Machine Learning Bookcamp,” data science interviews, the tech scene in Berlin, and much more.
Alexey Grigorev lives in Berlin with his wife and son. He’s a software engineer with a focus on machine learning, currently working at OLX Group as a Lead Data Scientist. Alexey is a Kaggle master, and he wrote a couple of books. One of them is “Mastering Java for Data Science,” and now he’s working on another one — “Machine Learning Bookcamp.”
Show Notes
(2:00) Alexey studied Information Systems and Technologies at a local university in his hometown in eastern Russia.
(4:54) Alexey commented on his experience working as a Java developer in Russia and Poland during his first three years after college, along with his initial exposure to Machine Learning through Coursera.
(7:55) Alexey talked about his decision to pursue the IT4BI Master Program specializing in Large-Scale Business Intelligence in 2013.
(9:42) Alexey discussed his time working as a Research Assistant on Apache Flink at the DIMA Group at TU Berlin.
(12:28) Alexey’s Master’s thesis, Semantification of Identifiers in Mathematics for Better Math Information Retrieval, was later presented at the SIGIR conference on R&D in Information Retrieval in 2016.
(14:35) Alexey discussed his first job as a Data Scientist at Searchmetrics — working on projects to help content marketers improve the SEO ranking for their articles.
(18:54) Alexey’s next role was with the ad-tech company Simplaex. There, he designed, developed, and maintained the ML infrastructure for processing 3+ billion events per day with 100+ million unique daily users — working with tools like Spark for data engineering tasks.
(22:17) Alexey reflected on his journey participating in Kaggle competitions.
(25:35) Alexey also participated in other competitions at academic conferences: winning 2nd place at the Web Search and Data Mining 2017 challenge on Vandalism Detection and winning 1st place at the NIPS 2017 challenge on Ad Placement.
(29:59) Alexey authored his first book called Mastering Java for Data Science, which teaches readers how to create data science applications with Java.
(31:40) Alexey then transitioned to a Data Scientist role at OLX Group, a global marketplace for online classified advertisements.
(33:23) Alexey explained the ML system that detects duplicates of images submitted to the OLX marketplace, which he presented at PyData Berlin 2019. Read his two-part blog series: The first post presents a two-step framework for duplicate detection, and the second post explains how his team served and deployed this framework at scale.
(38:12) Alexey was recently involved in building the infrastructure for serving image models at OLX. Read his two-part blog series on the evolution of image model serving at OLX, including the transition from AWS SageMaker to Kubernetes for model deployment and the use of AWS Athena and MXNet to simplify the design.
(42:39) Alexey is in the process of writing a technical book called Machine Learning Bookcamp — which encourages readers to learn machine learning by doing projects.
(46:17) Alexey discussed common struggles during data science interviews, referring to his talk on Getting a Data Science Job.
(48:32) Alexey has put together a neat GitHub page that includes both theoretical and technical questions for people who are preparing for interviews.
(52:19) Alexey elaborated on the steps needed to become a better data scientist, building on a LinkedIn post he wrote a while back.
(56:40) Alexey gave his advice for software engineers looking to transition into data science.
(58:32) Alexey shared his opinion on the data science community in Berlin.
(01:01:53) Closing segment.
His Contact Info
His Recommended Resources
“Designing Data-Intensive Applications” by Martin Kleppmann
Machine Learning Bookcamp
Permanent 40$ discount code: poddcast19
5 free eBook codes (each good for one sample of the book): mlbdrt-D452, mlbdrt-5922, mlbdrt-2C4D, mlbdrt-3034, mlbdrt-1DD1