Datacast Episode 7: Building Open-Source R Packages with Thomas Lin Pedersen

I’m creating a new podcast about Data Science called Datacast! The 7th episode is my conversation with Thomas Lin Pedersen, a software engineer at RStudio. Give it a listen to learn about his previous academic life as a bioinformatician, his prolific track record of authoring open-source R packages, his recommended resources on network analysis, and much more.

Guest Bio

Thomas is a bioinformatician, turned software engineer, who enjoys developing tools for data scientists. His main interests are in the tools that bring the scientist closer to their data, whether it be through intuitive and powerful APIs or through visualization. He describes himself as a creative spirit who enjoys photography as well as generative art and graphic design, and he tends to try and combine this with his interest in programming whenever possible. Thomas lives just north of Copenhagen with his wife and two kids.

Show Notes

  • (2:16) Thomas talked about the study of Food Science and Technology in which he focused on microbiology.

  • (3:15) Thomas stressed the importance of user empathy, something useful he gained from his degree.

  • (4:39) Thomas discussed the reason to pursue a Ph.D. in Bioinformatics at the Technical University of Denmark.

  • (6:10) Thomas talked in-depth about the tools he developed for his Ph.D. thesis, which are able to handle large-scale pangenome analyses using sequential data.

  • (9:11) Thomas talked about using the ggplot2 package for his R package “Find My Friends.”

  • (11:11) Thomas talked about working on the ggforce package, which aims to provide functionality missing from ggplot2, during his internship at RStudio.

  • (13:34) Thomas recalled the best learning he got from his internship with RStudio.

  • (15:08) Thomas gave advice to people who want to contribute to open-source projects.

  • (18:57) Thomas shared the experience working on the package ggraph, also known as the grammar of graphics for relational data.

  • (22:02) Thomas discussed two other packages, tidygraph and particles, that he built to bring graph and network data into the tidyverse, the very popular collection of R packages designed for data science.

  • (25:37) Thomas provided resources for R users who want to learn more about network analysis and network visualization.

  • (27:05) Thomas went over his job as a data scientist at SKAT, where he handled all the advanced analytics at the Danish tax authorities.

  • (32:15) Thomas talked about the intuition behind patchwork, a package that combines multiple ggplots into the same graphic.

  • (35:50) Thomas summarized his most recent projects, gganimate (a package that extends ggplot2 to include the description of animation) and tweenr (a package for interpolating data mainly for animations).

  • (40:47) Thomas discussed his current job as a software engineer at RStudio.

  • (43:04) Thomas gave his two cents on the Python and R comparison.

  • (45:53) Thomas talked about using Twitter to share his work, where he has more than 10,000 followers.

  • (47:07) Thomas went over something he works on during his spare time, generative art visualization (check his Instagram account!).

  • (49:09) Thomas gave some thoughts on the tech community in Copenhagen.

  • (51:09) Closing segments.

His Contact Info

His Recommended Resources