Datacast Episode 41: Effective Data Science with Eugene Yan


The 41st episode of Datacast is my conversation with Eugene Yan, an Applied Scientist at Amazon. Give it a listen to hear about his educational background in psychology and organizational behavior, his transition to data science at IBM, his rise through the data science ranks at Lazada Group, his online education via Georgia Tech, his work on machine learning at uCare.ai, his thoughts on agile/scrum development, culture, writing, and much more.

Eugene is a data scientist and writer. He works at the intersection of consumer data & tech to build machine learning systems to help customers and writes about effective data science, learning, and career. He's currently an Applied Scientist at Amazon, helping users read more and get more out of reading.

Listen to the show on: (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) Stitcher, (5) Overcast, and (6) iHeart Radio!

Key Takeaways

Below are highlights from my conversation with Eugene.

On Working at IBM

  • I joined IBM as a Data Analyst and moved across projects. I first built dashboards and insights for IBM’s supply-chain processes. I then worked on social media analytics, figuring out the social voice for a couple of IBM’s clients. I was also involved in anti-money laundering, helping big banks detect money-laundering patterns and develop strategies to combat them.

  • I transitioned to an internal Data Scientist role within IBM’s Workforce Analytics group (thanks to my background in psychology). I forecasted which jobs would be in demand and then used those predictions to build an internal recommendation engine for IBM.

On Working at Lazada Group

  • I presented my Kaggle project on product classification at a meetup in Singapore. Coincidentally, people from Lazada who were working on a similar problem saw my talk. They gave me a chance to join as a data scientist.

  • I can’t tell you how many of my A/B tests lost Lazada money, but my colleagues gave me the opportunity to fix them. After those failed tests, we ended up with a better customer experience and a better product ranking system.

  • As a senior data scientist, I stopped investing so much in the technical details. Instead, I helped scale the team, increase its output, design the right practices, and mentor team members. I had many one-on-one meetings to understand people’s career aspirations and find the right projects to align with their motivations. I took pleasure in tracking team morale, customer satisfaction, and employee retention.

  • As Vice President of Data Science, I was given 9 months to facilitate the migration of Lazada’s tech stack onto Alibaba Group’s platform (following the acquisition). We faced a language barrier, as the Alibaba team mostly conversed in Mandarin and most of my team members didn’t speak it. We worked the 9-9-6 schedule (9 a.m. to 9 p.m., six days a week) and flew to China twice a week to meet the deadline.

  • I interviewed many people on the data science team to understand how to intentionally design a strong culture. I looked to Amazon and Netflix as examples. The five cultural practices I emphasized were ownership, collaboration, communication, research and innovation, and impact.

On Pursuing an MS Degree at Georgia Tech

  • I suffered from chronic imposter syndrome. While I had learned a lot about data science and programming on my own via Coursera, I felt I was lacking the fundamentals and the structure of formal schooling. Georgia Tech’s program is very affordable, the professors are world-class, and the degree can be completed part-time.

  • To study and work at the same time, I allocated 20–40 hours per week (mostly on weekday nights and weekends) to lectures and assignments. The video lectures were on-demand, so I could watch them asynchronously. Class participation happened on Slack and online forums, which I found even better than in-person classrooms.

On Working at uCare.ai

  • In healthcare, if you use data science to recommend a treatment, suggest a diagnosis, or estimate a hospital bill, your customers want to know the reasons behind that output. There is a higher bar for model explainability and for avoiding false positives.

  • The financial incentives for data science in healthcare are not yet mature. It is very difficult to change patient behavior and reduce the financial burden on the healthcare system.

On Working at Amazon

  • In 2019, I wanted to get out of my comfort zone and actively applied for overseas roles. While preparing for interviews, I struggled with the live coding component, so I practiced on LeetCode and HackerRank. Back in Singapore, I had many interviews at odd hours and had to fly overseas for onsite interviews.

  • At Amazon, I work on Kindle, helping people find books more easily via recommendation engines and understanding user intent via natural language processing.

On Data Science + Agile Development

  • It’s important to iterate fast and time-box projects. I’d rather build a product in two months that may or may not work than spend two years on it.

  • It’s important to prioritize the roadmap based on business needs. Don’t build something sexy and hope that customers will like it. Start from the problem, then find the solution.

  • Demos are super fun. People work fast, are eager to share their results, and appreciate feedback on their work.

  • A retrospective covers what went well, what didn’t go well, and what was puzzling.

On Maintaining Models In Production

  • In my experience, it’s not difficult to deploy models to production, thanks to the available tooling. But I don’t see much discussion of how to maintain models after deployment and minimize the operational burden.

  • Check the basic statistics of input and output distributions. Make sure to validate model performance using a test set. Set up a tool with a friendly UI for data scientists to interact with.
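Eugene doesn't name a specific tool for checking input and output distributions. As a minimal sketch, assuming a numeric feature (or model score) held in plain Python lists, the check could compare a recent batch against a reference batch from training time. The function names `summarize` and `check_shift` and the thresholds (a mean shift of more than 3 reference standard deviations, or a rise of more than 5 points in the null rate) are hypothetical choices for illustration, not his method.

```python
import math
from statistics import mean, pstdev


def summarize(values):
    """Basic statistics for one numeric column; None is treated as missing."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_rate": 1 - len(present) / len(values) if values else 0.0,
        "mean": mean(present) if present else math.nan,
        "std": pstdev(present) if present else math.nan,
    }


def check_shift(reference, current, max_z=3.0, max_null_increase=0.05):
    """Compare a current batch to a reference batch (hypothetical thresholds).

    Returns a list of human-readable alerts; an empty list means nothing was flagged.
    """
    ref, cur = summarize(reference), summarize(current)
    alerts = []
    # Flag a mean shift larger than max_z reference standard deviations.
    if ref["std"] > 0 and not math.isnan(cur["mean"]):
        z = abs(cur["mean"] - ref["mean"]) / ref["std"]
        if z > max_z:
            alerts.append(f"mean shifted by {z:.1f} reference std devs")
    # Flag a jump in missing values, a common symptom of upstream breakage.
    if cur["null_rate"] - ref["null_rate"] > max_null_increase:
        alerts.append(
            f"null rate rose from {ref['null_rate']:.2f} to {cur['null_rate']:.2f}"
        )
    return alerts


# Hypothetical example: model scores at training time vs. a recent batch.
reference_scores = [0.1, 0.2, 0.3, 0.2, 0.1, 0.3]
recent_scores = [0.8, 0.9, 0.7, None]
print(check_shift(reference_scores, recent_scores))
```

Here both checks fire: the recent mean sits far outside the reference spread, and a quarter of the recent values are missing. A mean-and-null-rate check is deliberately crude. It misses changes in a distribution's shape, which a histogram-based comparison would catch, but it is cheap enough to run on every batch.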

On Writing

  • From my conversations with other data scientists, communication is the most important attribute of effective data scientists.

  • Writing doesn’t start with writing; it starts while you consume information and read. Writing is about collecting notes and tidying up that information to turn it into a blog post.

  • I use a tool called Roam to take literature notes. Each note includes relevant tags. When I start writing, I collect those notes and use them to craft the piece.

Show Notes

His Contact Info

His Recommended Resources