The 71st episode of Datacast is my conversation with Saishruthi Swaminathan — an Advisory Data Scientist at IBM’s AI Strategy and Innovation division. Previously, she was a technical lead and data scientist in the IBM Center for Open-Source Data and AI Technologies team.
Our wide-ranging conversation touches on her childhood and education in India, her transition from electrical engineering to data science, her work at IBM developing and evangelizing open-source software, the current state and the future of responsible AI, public speaking, online teaching, and much more.
Please enjoy my conversation with Saishruthi!
Listen to the show on (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) TuneIn, (5) RadioPublic, and (6) Stitcher.
Key Takeaways
Here are highlights from my conversation with Saishruthi:
On Growing Up in Rural India
I was born in a rural town in the south of India. My dad worked in the Indian postal service, while my mom managed the house and the office. We used to live in a 400-square-foot house. Until high school, I literally had no Internet connection at home. I used to visit the local library to read books and newspapers. I used to pay 30 Indian rupees for 15 minutes of Internet connection, just to get a feel for using the computer.
Growing up with less technology and more nature shaped me into the person I am today. My parents made sure that I only heard things that elevated my thoughts and made me a better person. They gave me freedom while providing a shell around me so that I did not experience too much social pressure. When I moved to the city for university, I could mold myself into a better person.
On Enjoying Programming
After university, I landed a job as a system engineer at Tata Consultancy Services. On my first day at the job, I was given a spec and asked to debug 10,000 lines of production COBOL code. My hands were literally trembling. I didn’t even understand how to get started.
The next day, I was sitting next to my tech lead. He was typing intensely at his keyboard, and I wanted to produce the same sound. I was so addicted to it that I still want to get that sound today. That was the moment that turned my fear into something I enjoyed. I started typing and learning fast, which made me more comfortable with programming. I was at Tata for 2 years. At the end of my tenure there, I was the top programmer of my unit, which handled 10 high-priority codebases with over 15,000 lines of COBOL code.
On Getting Into Data Science
I encountered data science during an internship in Seattle. I loved that I could make data speak to me in a language I understand. It is where my interest in programming and my passion for innovation intersect. From there on, I was fortunate to take courses like data mining, statistical ML, neural networks, and probability under the finest professors at San Jose State. I also participated in various research projects and other small projects, from smart city to material strength prediction. In the 1.5 years of my Master’s, the amount of data science I learned grew exponentially. I used to sleep 2 to 3 hours per day. That’s because I enjoyed learning, so I didn’t feel tired at all.
On Public Speaking
Your language or fluency does not matter. Instead, try to:
Be authentic. For all of my talks, I have never reused old content. I always create a specific talk for a specific audience in a way that they can grasp. Even if only one person attends my talk, I still present it. I started with an audience of as few as 10 people, and recently I delivered an Ethical AI talk to a crowd of 8.5K people.
Know your audience and prepare accordingly. It’s a privilege to stand before them and get their time/attention. It’s my responsibility as a speaker to make good use of their time.
Present in a simple way. I always break down high-level concepts to simpler levels by connecting them to day-to-day examples.
Accept criticism. I’ve been in situations where people stopped me in the middle of my presentations and gave critical opinions. Sometimes it was hard, but I learned to listen to them over time.
On IBM’s CODAIT
My team is called the Center for Open Source Data and AI Technologies. We are a group of 30+ developers and data scientists around the world. The common goal is to democratize AI, making this technology accessible to everyone.
The core AI tech relies heavily on open-source software. Our team improves those frameworks that help develop AI solutions — making the individual components work better as they are integrated into the pipeline.
We have committers, contributors, and maintainers of frameworks such as PyTorch, TensorFlow, and Spark.
We created our own open-source frameworks such as AI Fairness 360 (detecting and mitigating bias), AI Explainability 360 (making models explainable), Adversarial Robustness Toolbox (protecting models against attacks), Model Asset Exchange (making models available as microservices), and Data Asset Exchange (making data from IBM Research available to enterprise users).
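To make "detecting bias" a little more concrete, here is a minimal sketch of one widely used fairness metric, disparate impact: the rate of favorable outcomes for an unprivileged group divided by the rate for a privileged group. This is not code from the conversation and it does not use the AI Fairness 360 API; the function names and the loan-decision data are hypothetical and exist only for illustration.

```python
def favorable_rate(outcomes):
    """Return the fraction of outcomes that are favorable (encoded as 1)."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def disparate_impact(unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged group over privileged group.

    A value near 1.0 means both groups receive favorable outcomes at similar rates.
    """
    privileged_rate = favorable_rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(unprivileged) / privileged_rate


# Hypothetical, made-up loan decisions (1 = approved, 0 = denied).
unprivileged_group = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 approved
privileged_group = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]    # 7 of 10 approved

ratio = disparate_impact(unprivileged_group, privileged_group)
print(f"Disparate impact: {ratio:.2f}")
```

A common rule of thumb treats a ratio below 0.8 as a signal worth investigating. The made-up data above falls well under that threshold, which is the kind of gap a bias-detection toolkit is meant to surface.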
On Responsible AI
This is Saishruthi’s personal view, given in her talk titled “Digital Discrimination: Cognitive Bias in Machine Learning.”
A lot of people, technical or non-technical, are not aware of cognitive biases. I showed examples of the impact of bias in real-world scenarios. More specifically, I showed how people were affected by biased systems and suffered from hidden pains.
I talked about the major cultural changes required within an organization. These changes do not just depend on the data scientists. They must touch every person involved in the project. There is a need for an ethical board in every organization.
I introduced the open-source tools that help tackle bias, explainability, robustness, and so on.
On Trusted AI
Concerns about privacy and responsible AI will be a major topic in the upcoming years, so businesses will be ready to adopt responsible AI practices. As I mentioned before, it’s not just about the tools. It’s also about the cultural change at the organizational level, like having an ethical board. Ethics experts will be involved in data science projects to advise on data privacy governance. If you use PII from certain countries, you need to be aware of their data regulations. Overall, it will be about building, evaluating, and monitoring models so that they are ethical and responsible.
Furthermore, there will be an increasing amount of research on the fairness, robustness, value alignment, transparency, privacy, explainability, and accountability of ML systems.
On Online Teaching
There are a lot of online courses out there. As an instructor, I needed to show something unique. People enrolled in my courses should be able to get hands-on practice.
Designing the course materials took about 2 to 3 weeks. We sat together (as a team of 10) to brainstorm and develop the course syllabus. Next, we created video presentations. These videos then went through review and were finally pushed live.
Regarding the lab aspect of the courses, I wanted them to be fully hands-on. I expected the learners to write code, not simply run the cells in a notebook. I created a story for each lab assignment. These lab assignments also went through a review for approval.
The whole process took months from start to finish.
Additionally, I tend to give too much information, so I need to be mindful of the difficulty level of the course notebooks. Each learner has their own expectations of the course. As a result, I need to manage these expectations appropriately.
Timestamps
(01:59) Saishruthi talked about her upbringing, growing up in a rural town in India with no Internet connection and no computers.
(05:50) Saishruthi discussed her undergraduate studies in Electrical Engineering at Sri Sairam Engineering College in the early 2010s.
(11:56) Saishruthi mentioned the projects and learnings during her two years working at Tata Consultancy Services as an instrumentation engineer.
(15:57) Saishruthi went over her MS degree in Electrical Engineering at San Jose State University and her journey into data science.
(22:20) Saishruthi shared the initial hurdles she faced transitioning back to school and assimilating to the US culture.
(26:10) Saishruthi touched on her work with San Jose City on disaster management.
(28:20) Saishruthi went over her job search process, eventually landing a data science position at IBM.
(32:16) Saishruthi unpacked lessons learned from public speaking.
(35:20) Saishruthi summarized IBM’s data science and machine learning initiatives.
(37:02) Saishruthi brought up various projects happening at IBM’s Center for Open Source Data and AI Technologies, whose mission is to make open-source AI models dramatically easier to create, deploy, and manage in the enterprise.
(39:40) Saishruthi unpacked the qualities needed to contribute to open-source projects and their role in shaping the development of ML technologies.
(44:50) Saishruthi dissected examples of bias in ML, identified solutions to combat unwanted bias, and presented tools for that (as delivered in her talk titled “Digital Discrimination: Cognitive Bias in Machine Learning”).
(49:12) Saishruthi shared her thoughts on the evolution of research and applications within the Trusted AI landscape.
(54:07) Saishruthi discussed the core value propositions of IBM’s Elyra, a set of AI-centric extensions to JupyterLab that aims to help data practitioners deal with the complexities of the model development lifecycle.
(56:11) Saishruthi briefly shared the challenges with developing Coursera courses on data visualization with Python and with R.
(01:00:47) Saishruthi went over her passion for movements such as Women In Tech and Girls Who Code.
(01:03:27) Saishruthi shared details about her initiative to bring education to rural children.
(01:06:36) Closing segment.
Saishruthi’s Contact Info
Mentioned Content
Talks
“Digital Discrimination: Cognitive Bias in Machine Learning” (All Things Open 2020)
Projects
Courses
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.