Datacast Episode 40: Biological Aging, Probabilistic Programming, and Private Machine Learning with Matthew McAteer

The 40th episode of Datacast is my conversation with Matthew McAteer, a Machine Learning Researcher at FOR.ai. Give it a listen to hear about his interest in biological aging, his experience freelancing as a Machine Learning Engineer, his side project on private Machine Learning called De-Moloch, his research at UnifyID and FOR.ai, his popular writings on Machine Learning Interviews and Machine Learning Technical Debt, his collection of under-investigated scientific fields, and much more.

Matthew McAteer is a Machine Learning Researcher at FOR.ai. He started his career in biological aging research before moving on to the mission of figuring out how machine learning could be used on large amounts of noisy biomedical data.

Key Takeaways

Below are highlights from my conversation with Matthew.

On His Interest In Biology

  • For as long as I can remember, I have wanted to be some kind of scientist. In elementary school, I was reading biology textbooks that I had gotten from yard sales and from the library near my house.

  • In middle school, I started helping out at a local wildlife hospital, where I ended up assisting with the actual medical care of endangered animals. I think seeing how the different treatments were prepared, and the biochemical underpinnings of various diseases, got me interested in molecular biology.

On Studying at Brown

  • I had a non-standard Brown experience. I dropped out for about a year and a half, in part due to concerns about how I would be able to afford tuition. During that time, I did some software work at one of the robotics labs at MIT (not to mention testing out a few of my own company ideas). Once I had earned enough, I returned and finished my degree requirements in two and a half years. I was about one semester away from also completing the requirements for a dual BS and Master's, but I didn't have enough Bitcoin to cover the additional degree.

  • I picked Brown because of the sheer number of courses related to aging and regenerative medicine they offered, from stem cell biology to tissue engineering to even actual biology of aging classes. I loved working with fruit flies and finding out how different drugs could extend their healthy lifespans.

  • I also took a bunch of computational biology classes, including classes on what eventually became known as Data Science (though it wasn't called that back then).

On Being a Freelance Machine Learning Engineer

  • It was handy to see so many different applications of machine learning, including privacy and security, music analysis, and secure communications. I think freelancing made me better at Machine Learning than competing on Kaggle or taking MOOCs would have.

  • I also learned to deploy models to the web, mobile, and even Raspberry Pi devices.

On Contributing to OpenMined

  • I was heavily involved in the early days of OpenMined, back when its primary focus was federated machine learning. In federated learning, copies of a model are distributed among multiple nodes in a network; rather than aggregating the data in one place, each copy is trained on-device. The model parameters are then aggregated to create a single, more generalizable model.

  • I mainly focused on tools for providing data to the nodes from sources like social media, as well as making sure the schemas prevented the information from being personally identifiable. This was also when the 2017-era crypto hype was at its peak, so there was even more focus on using a resource-management token to coordinate communication between nodes on a distributed network.
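The federated setup described above can be made concrete with a minimal sketch of federated averaging (FedAvg), one common way to combine on-device updates. This is an illustration under stated assumptions, not OpenMined's implementation: the model is a toy one-feature linear regression trained with plain gradient descent, and the function names (`local_train`, `federated_average`, `federated_round`) and the device datasets are hypothetical.

```python
from typing import List, Tuple

# Toy model: parameters (w, b) for y ≈ w * x + b.
# All names and data in this sketch are hypothetical illustrations.
Model = Tuple[float, float]
Dataset = List[Tuple[float, float]]


def local_train(model: Model, data: Dataset,
                lr: float = 0.01, epochs: int = 50) -> Model:
    """Full-batch gradient descent on one device's private data."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(models: List[Model], sizes: List[int]) -> Model:
    """Average parameters, weighting each device by its example count."""
    total = sum(sizes)
    w = sum(m[0] * s for m, s in zip(models, sizes)) / total
    b = sum(m[1] * s for m, s in zip(models, sizes)) / total
    return w, b


def federated_round(global_model: Model, device_data: List[Dataset]) -> Model:
    # Every device starts from the current global model and trains locally;
    # only the resulting parameters are sent back, never the raw data.
    local_models = [local_train(global_model, data) for data in device_data]
    sizes = [len(data) for data in device_data]
    return federated_average(local_models, sizes)


if __name__ == "__main__":
    # Hypothetical private datasets on three devices, all drawn from y = 3x + 1.
    devices = [
        [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0)],
        [(3.0, 10.0), (4.0, 13.0)],
        [(-1.0, -2.0), (0.5, 2.5), (1.5, 5.5), (2.5, 8.5)],
    ]
    model: Model = (0.0, 0.0)
    for _ in range(20):
        model = federated_round(model, devices)
    print(f"w = {model[0]:.3f}, b = {model[1]:.3f}")  # should approach w = 3, b = 1
```

Weighting each device by its number of examples keeps a device with very little data from pulling the global model as hard as a device with a lot of data.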

On Building De-Moloch

  • The idea behind De-Moloch was to combine a textual and a visual interface that would recommend which privacy-preserving techniques to use. The tagline was “being able to do private machine learning without a Ph.D.”

  • It received a lot of attention from groups like Pioneer and Backend Capital. Getting the prize money from Pioneer and additional AWS credits helped a lot.

On Becoming a Machine Learning Engineer

  • If you are trying to learn a topic like Machine Learning from scratch, make a learning plan and find mentors to help you.

  • When it comes to getting a job as a Machine Learning Engineer, ultimately, it comes down to an impressive portfolio and the ability to identify scenarios where machine learning is applicable (not just a technically exciting project to work on).

  • Good learning and productivity habits apply to mastering any subject.

On Machine Learning Research Interviews

  • Over the years, I have been in a lot of very strange machine learning engineer interviews. General-purpose software engineering interviews have been quasi-standardized to the point of cargo-culting the Google interview process. By contrast, there are very few good examples of how to do machine learning interviews, and a lot of them have surprisingly superficial questions about fundamentals.

  • With that in mind, I decided to put together a guide on the kinds of questions I would ask if I were interviewing someone for a machine learning engineer or even a researcher role. I think the research focus was sorely needed: a lot of machine learning engineers are doing similar work, but researchers and scientists need a sound theoretical grasp of the math they're using.

  • Initially, this guide covered only math concepts in linear algebra and control theory. It was later extended from math to the theoretical foundations of deep learning and non-deep learning (for example, tools like support vector machines or general concepts about performance measurement). Then came the edition that emphasizes system design for machine learning. Again, some interviews focus heavily on LeetCode-style algorithms but not on how to build and deploy a machine learning system. This last edition drew on notes on popular deployment frameworks, as well as drawn-out responses to some of the system design questions I had seen at companies like Google and Facebook.

On Under-Investigated Academic Fields

  • Over the past two years or so, I've been part of a group of friends that talks a lot about the advancement of society and of science. I've also been part of a bunch of groups focused on developing non-standard company ideas. After some time, I began to wonder which fields we should emphasize in research if we want to see more advancement as a society.

  • Some areas, like GANs, are over-investigated. It is easier to build hype around them and get grant support, but they are severely lacking in potential impact.

  • I'm personally biased towards the biology space: cryobiology, immortal model organisms, or biological radiation resistance.

FOR.ai


Show Notes

  • (2:22) Matthew shared how he grew up interested in the field of biology.

  • (5:29) Matthew described his undergraduate experience studying Cellular and Molecular Biology at Brown University. He dropped out for a year and a half to work at MIT and test out a few company ideas in the biotech space.

  • (8:13) Matthew then spent a decent amount of time in biological aging research, working at the Karp Lab at MIT and the Bacskai Lab at Massachusetts General Hospital.

  • (13:28) Matthew recalled how he switched to pursuing a career in Machine Learning.

  • (17:14) Matthew commented on his experience as a freelance Machine Learning Engineer on various projects in privacy and security, music analysis, and secure communications.

  • (20:36) Matthew discussed the opportunity to work with Google as a contract software developer and shared valuable lessons from contributing to the TensorFlow Probability library for probabilistic reasoning and statistical analysis.

  • (23:48) Matthew gave a quick overview of Bayesian Neural Networks (read his blog post for more details).

  • (27:18) Matthew went over his contribution to the open-source community OpenMined, whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies.

  • (32:29) Matthew worked on De-Moloch in late 2018, described to be “software that lets anyone easily run AI algorithms on sensitive data without it being personally identifiable” (read his blog post “Private ML Explained in 5 Levels of Complexity” for a complete description).

  • (36:17) Matthew unpacked his post “Private ML Marketplaces” — which summarizes and discusses various approaches previously proposed in this space such as smart contracts, data encryption/transformation/approximation, and federated learning.

  • (39:45) Matthew shared his experience competing in the Pioneer Tournament.

  • (42:19) Matthew shared brief advice on how to become a Machine Learning Engineer. For the full details, read his mega-post “Lessons from becoming an ML engineer in 12 months, without a CS or Math degree.”

  • (45:16) Matthew described his experience working as a Machine Learning Engineer at UnifyID, a startup that is building a revolutionary identity platform based on implicit passwordless authentication.

  • (47:52) Matthew unpacked his research paper “Model Weight Theft with Just Noise Inputs: The Curious Case of the Petulant Attacker” at UnifyID. The paper explores the scenarios under which an attacker can steal the weights of a convolutional neural network whose architecture is already known.

  • (51:55) Matthew is currently doing research with FOR.ai, a multi-disciplinary team of scientists and engineers who like researching for fun.

  • (54:14) Matthew unpacked his research at FOR.ai, namely “Optimal Brain Damage” and “BitTensor: An Intermodel Intelligence Measure.”

  • (01:00:52) Matthew shared key takeaways from attending academic conferences such as ICML 2019 and NeurIPS 2019.

  • (01:03:45) Matthew unpacked his 4-part series on ML research interviews, which targets aspiring ML engineers, hiring managers/senior ML engineers, and people navigating ML research who don't want to lose sight of first principles.

  • (01:07:09) Matthew unpacked his fantastic post called “Nitpicking ML Technical Debt” that breaks down relevant points of Google’s famous paper on Hidden Technical Debt.

  • (01:10:49) Matthew unpacked his well-researched list that examines the under-investigated fields in 10 academic domains ranging from computer science and biology to economics and philosophy.

  • (01:14:41) Closing segment.

His Contact Info

His Recommended Resources