Datacast Episode 91: Collaborative Data Workspace, The Sharing Gap, and Engineering Management with Caitlin Colgrove
The 91st episode of Datacast is my conversation with Caitlin Colgrove, the Co-Founder and CTO of Hex, a collaborative data workspace for building and sharing data projects using SQL and Python.
Our wide-ranging conversation touches on her Computer Science education at Stanford; her 6-year engineering career at Palantir; her stint as a Data Engineering Manager at Remix; her current journey with Hex building a Data Workspace for teams; lessons learned serving the “analytically technical”, addressing the sharing gap, narrowing down user profiles, hiring for ownership, fundraising from complementary investors; and much more.
Please enjoy my conversation with Caitlin!
Listen to the show on (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) TuneIn, (5) iHeartRadio, and (6) RadioPublic
Key Takeaways
Here are the highlights from my conversation with Caitlin:
On Studying at Stanford
Stanford was great for many reasons, but I think the most valuable thing they do in their CS curriculum is that they give you a lot of hard projects with teams, where you write a lot of code. Getting that experience is honestly the best preparation that you can get for going into the industry later. I also liked the more theoretical classes like algorithms and compilers, which played to my problem-solving side.
For context, I was playing around with Python in 8th grade. My dad is a huge nerd too. For Christmas in 8th grade, he gave me a book on learning Python, which is not a typical 8th-grader’s present. It was actually awesome, and I spent all of Christmas break learning how to build a Neopets game in Python. I did not realize what that would mean as a career until getting through a few Computer Science classes in college. It was not until late freshman year that I decided to commit fully to studying it.
I was the Teaching Assistant for CS 106A (the introductory programming class) and CS 103 (the introductory CS theory class). Sometimes, in tech, you get a bit of elitism around people who can code vs. people who cannot code. One thing I took away from teaching different intro CS classes was that people of all stripes and backgrounds could learn to code if they have the right environment. Stanford has a fantastic introductory CS curriculum, and this is one thing that we are seeing borne out in the data and analytics space today. As analytics become more code-driven, you have this whole big group of people who, 10 years ago, might have been in Excel or point-and-click BI tools and now can write sophisticated SQL and Python code. That observation is a big part of what we wanted to build when we were starting Hex. What we called the “analytically technical” group of people is a core part of our product philosophy.
On Internship Lessons
Having been an engineer and a manager for a while, one thing that I have found about really effective engineers is that only a certain amount of engineering is about your raw technical ability. A lot more of it is about being effective inside teams/organizations and handling large codebases. One of the biggest skills I had to learn early on in an internship was how to get unstuck on things. In college, you end up with a lot of these problems which you have to solve by yourself, so you develop skills around how to do things by yourself. However, that is not true when you get into the industry. In the industry, you have different resources around you, so you want to ask for help on a technical question or collect more information in order to build this feature. Engineers who can do that really well just get a ton more done. I definitely struggled with it early on. Asking for help is very hard, but once I was able to do that, I felt like that really helped me level up as an engineer.
My Facebook internship was the first time I operated under a fair amount of uncertainty. We were trying something fairly experimental, and I got halfway through my internship before realizing that maybe we were building the wrong thing and we needed to do something different. By that time, there was not a ton of time left to build the thing we were supposed to be building. Reflecting on that particular project, I probably could have realized the issue a few weeks earlier and reached out to the designer/product manager more proactively. After the initial scoping, I should have been more willing to adjust the course if things were not working the way I thought they were.
On Her Journey at Palantir
Palantir was a phenomenal place to be early in my career. Besides the software, what attracted me to the company were the people and the mission-oriented problems. They were solving issues in the civic space. I got to work on legal discovery, disaster response, etc. From a product standpoint, it was compelling rather than working on another company serving ads. On the career side, Palantir is basically a bunch of loosely-related individual startups. Coming in early on, I got a lot of responsibility very quickly and the opportunity to learn not just the technical side of things but also product management / little mini go-to-market motions, etc. I did not realize it at the time, but in many ways, I was getting an introduction to what it felt like to start a company, just within the scaffolding of a much larger organization. This approach has been hugely successful for Palantir because it has produced much internal innovation as individual engineers and product managers are empowered to experiment with new initiatives. Many startups are coming out of the Palantir Alumni Network, largely due to this way of operating. Obviously, there were downsides, as Palantir could have used a bit more overall coordination on the engineering team. But from a career perspective, I learned so much in a very short time that I do not think I would have gotten in many places.
If you are familiar with the canon of software development, you might be familiar with something called the second-system effect, which basically says that anytime you try to rewrite a system to make it more modern, everyone tries to cram a fix for everything they hated about the original system into the new version. As a result, the system becomes a big bloated mess. At the stage that I was at Palantir, we were maturing from basically a small, scrappy startup into a much bigger and more mature company. We had to do rewrites like this several times, whether to modernize or to better architect the things that needed to scale at that point. One of the big things that I took away from my time at Palantir is how to accomplish that effectively. When doing this, always make sure to deliver value. You should not build software just because you like it from an engineering perspective. The other thing I learned is to build software incrementally in pieces so that you do not end up with a huge stack of things that you need to coordinate at the end. That coordination can be very hard to do, drag out the process, and potentially prevent anything from ever shipping.
Transitioning to being a tech lead/manager, I learned to let go a little and make space for other people. When you are a good engineer and get promoted to become a leader for people who are more junior than you, your first instinct for many things is: “I could do this so much faster or better.” But you have to resist that instinct because you cannot do everything. That is not scalable and does not help anyone. Instead of doing things yourself, teach folks to do the things you can do. In the short term, it will not be perfect because people are learning. But in the longer term, you are going to have a much stronger team as people will go through professional growth themselves (which should be exactly your goal when you are a manager of people).
On Cultivating Product Intuition
Make sure that people are listening to the end-users and have a deep understanding of the pain points that the end-user has. Your team needs to understand what problems you are solving, what workflows you are addressing, and what the persona of your end users is. The problem I see most often is context getting lost by the time feature requests make it to engineering. They come in as users/customers talk to customer success. Customer success talks to product managers. Product managers talk to engineers. By that point, the feature requests arrive in a vacuum. It is hard to understand them and make good product decisions because you have lost all the context. One thing we try to institute at Hex is ensuring that everyone on the team, including engineers, is getting direct customer contact to hear these issues firsthand.
Make sure that you are measuring whether your product is successful or not. You do not always have all the correct answers, so it is important to realize that is often the case and pay attention to how users use your product to validate your hypothesis. Most people who have worked in a product capacity for a long time in tech have had an experience of being really sure that something was a great idea, then shipping it and finding out that it was not the right thing after all. I find that to be a bit humbling.
On Joining Remix
I was really excited about the opportunity at Remix. After being at Palantir for a while, I wanted to try something at an earlier stage where I could build something from the ground up. At Palantir, everything was built in-house. So it was a great technical opportunity for me to see what a modern data stack looks like outside of Palantir. At the same time, I got to build up the team, the processes, and the actual data pipeline product from scratch. That experience was super valuable in preparing me for striking out on my own.
The technical challenges of transportation data are fascinating and frustrating simultaneously.
There were no standards for dealing with geospatial data. We tried to build products for different cities, and every city has a different type of data in a different format. How to build a product that handles all of these different formats is challenging.
Transportation data has inherent data quality problems. For example, GPS data is notoriously inaccurate. You cannot tell if a bike is riding on the street or on the sidewalk. It just does not have that level of accuracy. GPS data also comes from moving vehicles, so you cannot tell precisely where it comes from. Solving those data quality challenges must come first before doing useful aggregate visualizations.
Palantir’s value to their clients is the knowledge to clean up their data and build maintainable data engineering pipelines. I have always been more on the product analytics side there, so moving to Remix was an excellent opportunity to get deeper on the data engineering side, which is the core part of a complete end-to-end data-driven operation. You cannot do analytics without data engineering.
On Engineering Management
I have been a manager for a bit now. Over that time, I have found a couple of crucial things you can provide uniquely as a manager to people.
One is coaching and professional growth. This one is well-known. You are the person who knows these people the best professionally, and it is your job to help them be the best possible version of themselves in that aspect.
The less obvious one that can make a huge difference in people’s careers is that as a manager, you are not just a coach but also an advocate. You are the person in the organization who is best positioned to set your reports up for success, whether getting them the resources they need or finding them a great next professional opportunity. Managers who do that well have much more successful and stronger teams.
Some of the most important things where managers can uniquely be helpful are on the non-technical sides, such as project management and communication. It can be helpful for junior engineers to get direct pointers on these skill sets. For the more senior engineers, a lot of it falls into listening and trying to understand what they want and enjoy, then helping them think through where they want to take their career forward. Many folks do not necessarily have a strong framework for understanding which career path they want to choose or even how to make that decision.
On Founding Hex
Barry, Glen, and I worked together in various capacities at Palantir and then went our separate ways into the real world. When we came back together a few years later, we realized that Hex is the product we have always wanted to build. We have spent our entire careers building and using analytics tools. It quickly became clear that there was a big hex-shaped gap that we felt we could uniquely fill based on some of our experiences.
Glen is one of the smartest people I have ever met. He is just a phenomenal engineer. I have always appreciated how deeply Barry thinks about building teams and products, two essential qualities in a CEO. I do not think I have ever seen anyone else who has been good at both of those. That was ultimately what made me decide that this was the team to start a company with, and I will never get a better opportunity than that.
We have all started from a fairly similar product vision and experiences, so we are on the same page in many ways and can come to an agreement over time. But sometimes we do not. Sometimes we have different opinions. One thing I really appreciate about this founding team is the ability to disagree and commit: maybe you feel strongly about one thing, and I sort of disagree, but let’s try your thing and measure whether it is successful or not, and we can go from there. That way of measuring and being willing to experiment with things that I do not necessarily consider a priority has been super valuable. It has produced a lot of exciting and successful ideas that were not consensus among the founders and enabled a healthy working relationship between the three of us.
On The “Analytically Technical”
I have built a lot of analytics tools in my career and watched users use them.
One of the most common problems I have seen with low-code and no-code tools is that your most creative and dedicated users will quickly reach the limit of what the tool can do, which is super frustrating. You can lose a lot of these people when they run out of stuff that they are trying to build.
On the flip side, I see many tools that let users write code but force them to think about other activities like packaging Docker images and hosting them on an AWS server. These tools allow good flexibility and power but make users think about tasks that are not core to the job.
At Hex, we try to solve both problems by allowing you to write code with flexibility and not think about deployment. Our product philosophy has been around having a low barrier of entry yet a high ceiling of capability.
On The Sharing Gap
We did a bunch of user research before getting into the development side of Hex. We saw a couple of common modes.
One was literally screenshotting charts and putting them in PowerPoint decks, which is a huge amount of work. You run these analyses, and when the stakeholder asks you to tweak certain parameters, you will have to do the whole process all over again.
The other one is what we call the “Data Rube Goldberg Machine.” We would have a local notebook, do our analysis, push it up to the cloud, and run it on a schedule. Then we would use that scheduled run to dump the data into a table in the data warehouse. Then we would have our local-mode model sitting on top of that. The stakeholder would be looking at it in wonder. This process works as people get a lot more access to the data, but it is really brittle and hard to maintain.
We were trying to address these sharing pain points when we started Hex. Right now, we solve them by taking your analysis and turning it into a narrative, interactive app that can be shared with one click instead of going through screenshotting. Hex maintains the underlying infrastructure, so you do not have to worry about having brittle sequences stapled together in order to get that fully interactive experience on the other end.
On Technical Challenges Building Hex
The first one is security. We allow users to write arbitrary code, which they would not necessarily get in a traditional BI tool. Normally, the goal of building a secure application is to prevent people from running arbitrary code. Obviously, running arbitrary code is a pretty core piece of a code notebook. Being able to do that in a way that is isolated from the rest of the Internet (so that users do not have unauthorized access to parts of our stack while still having all the Python code at their fingertips) has been an interesting engineering and security challenge that we will continue to invest in over time. On that front, we want to encourage people to build more secure applications for themselves and provide tooling that answers questions like how to handle secrets appropriately, how to handle data connections securely, and how to build SQL without allowing yourself to be SQL-injected. That is the core of how we are building some of Hex’s product experience.
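To make the SQL injection point concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It is not Hex's implementation; the table, the data, and the function names (build_demo_db, unsafe_lookup, safe_lookup) are hypothetical and exist only for illustration. It contrasts pasting user input into the SQL text with passing it as a bound parameter.

```python
import sqlite3


def build_demo_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, team TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ana", "data"), ("bo", "eng"), ("cy", "data")],
    )
    return conn


def unsafe_lookup(conn, team):
    # Anti-pattern: the input becomes part of the SQL text itself,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT name FROM users WHERE team = '{team}' ORDER BY name"
    return [row[0] for row in conn.execute(query)]


def safe_lookup(conn, team):
    # Parameterized query: the driver binds the value separately,
    # so it is always treated as data and never as SQL.
    query = "SELECT name FROM users WHERE team = ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (team,))]


conn = build_demo_db()
malicious = "data' OR '1'='1"

leaked = unsafe_lookup(conn, malicious)   # the injected OR clause matches every row
blocked = safe_lookup(conn, malicious)    # no team has this literal name
normal = safe_lookup(conn, "data")        # ordinary input still works
```

With the string-built query, the crafted input returns every user; with the bound parameter, the same input matches nothing because it is compared as a literal team name.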
The second one (that people do not necessarily think about) is that the workflows for doing exploratory, iterative analysis are different at a technical level from the workflows for building a data application in production. We have had a lot of interesting technical stuff around fundamentally executing Python on the backend, taking a notebook built in this exploratory-editor way, and turning it into a stable web app. There are a bunch of things we have done under the hood in the kernel to make that a smoother process. But overall, it is still a big challenge that we are facing on the technical side.
On The Evolution of Notebooks
When I think about notebooks, there are two pieces that are fairly distinct and probably will go in different directions.
One is the notebook UX, where you have small pieces of logic (basically a REPL where you run pieces of your analysis) and build them up over time by re-running them quickly. That is a fantastic UX for a lot of analyst workflows and will not go away anytime soon. Many people love it for a good reason.
When people complain about notebooks, they complain mostly about the underlying implementation of how notebooks run. Maybe notebooks are memory-bound, so users cannot do big data operations like in a cloud environment. Or maybe notebooks have the hidden-state problem where you set a bunch of variables and run things out of order. All of a sudden, your notebook is in a weird state, and you do not know how to get out of it. There will be a lot of innovation here over the upcoming years.
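The hidden-state problem can be reproduced in a few lines. The sketch below is an illustration only, not how Jupyter or Hex executes notebooks: it models cells as code strings run against one shared namespace, and the cell names (load, double, total) are hypothetical.

```python
# Each "cell" is a snippet of code; all cells share one namespace,
# which stands in for a notebook kernel's global state.
cells = {
    "load": "rows = [1, 2, 3]",
    "double": "rows = [r * 2 for r in rows]",
    "total": "total = sum(rows)",
}


def run(order):
    """Execute the named cells in the given order and return `total`."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


# Running top to bottom once gives the intended answer.
top_to_bottom = run(["load", "double", "total"])

# Re-running the "double" cell (easy to do by accident in a notebook)
# silently mutates `rows` again, so the same cells now give a different total.
rerun_double = run(["load", "double", "double", "total"])
```

The notebook on screen looks identical in both cases, but the result depends on the execution history, which is the state a reader cannot see.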
The notebook UI will be around for a while, while the underlying technology will definitely change and improve over time.
On Data Team ROI
Make sure that you understand the problem you are trying to solve and the success criteria because it might not always be exactly the same as the specific thing you have been asked for. This is important in terms of having that ROI in your investment. But I also think that ROI relies a lot on the work product that you produce. If you are building decks, the amount of value you can deliver is limited by the number of hours spent working on it, and there are only so many hours in the day. But if you are building interactive apps, the impact is the combined value of everyone who uses it, which can be 10x or 100x more than what you can do on your own. Thus, the ROI becomes a lot bigger when you are building these applications than when you are building a static work product.
There are many different modes data teams operate in, such as data-as-a-service or data-as-a-function. Looking at the practitioners whom we are building Hex for, we envision Hex becoming integral to the core processes and workflows of the business. As Hex integrates into the operational side, it becomes much easier to justify the value our users provide than it is when they are showcasing a few charts periodically.
On Hiring
We are solving such a common problem that, honestly, the best hiring tactic we have found has been to hire people who would say: “Hex would have made my life so much easier at my last job!” This is great for various reasons. Obviously, people are excited and motivated to be building this product that would have made their lives so much easier. But I think even more importantly, at Hex, we want everyone to have ownership of and impact on the actual product and its vision. Having folks who have experienced that pain specifically in the past has led to a lot of amazing product ideas from all over the company, not just from the product side.
Many factors go into building a healthy and vibrant company culture. At Hex, we want to foster a collaborative experience — not just that the founders dictate what the culture must be, and everybody must subscribe to that. A while back, we did an exercise to define, as a team: What are our emergent values? What are our aspirational values? Where are we today? How does the team work, and where do we want to move that culture? You can see the results of that on our website.
If you are curious, we have our values in Hex’s handbook, which is an evolving document. Since then, we have added a lot of people, and people bring valuable new contributions to the culture. There are a couple of core things that we look for in folks today, but at the same time, culture is a living thing. Hiring people who care about that vibrant company culture is one of the most important things.
On Finding Customers
The biggest challenge for us was identifying the right user profile. We started quite broadly: anyone who uses Python to do any kind of data work, whether data analytics, data engineering, or data science. We discovered that the workflows in that category are very broad. While we identified some pain points, not everybody had those pain points because they were doing different tasks. So we spent some time focusing on: Who is the set of users with this set of problems? What are they trying to do? That is the group we homed in on: analysts who are technical, write code, and are deeply underserved by the tools today. This process allowed us to go from having a broad product that was neat but was not solving a pain point for anyone, to actually having a core set of users who love using Hex.
We were right in our initial thesis about the collaboration and sharing pain points. Then we categorized people in our sales conversations into two groups: those who have that problem and those who do not find our solution very compelling. Starting from the shape of the company that is more likely to have this problem, we then looked at the roles of people in those companies. All of that helped us solidify the final specific user profile.
Our product offering is broad, and people do data analytics in many industries. Early on, we were not quite ready to go up-market (enterprise companies), but we needed companies that were big enough to have the sharing and collaboration problems. So we looked at mid-sized companies that have been investing in their data teams, but whose data teams are still small relative to the organization. They probably did not have big engineering teams to support their data teams, so they could not build things in-house. That is the shape of companies we narrowed in on.
On Fundraising
There are two big things I have found helpful in figuring out how to work with investors:
New companies go through many ups and downs, so you want to ensure that the investor you work with is someone you trust to have your back when the chips are down (because they will be down at some point).
Find people who have a bit of a complementary perspective to you. For example, if you know that you tend to be a bit more risk-averse, find someone who can push you a bit more. If you are product-focused, find someone who balances you out on the go-to-market side. These people are not going to be experts in your business. Only you are the expert on your business. But often, they can provide a different perspective on some of the problems you are facing and insights you might not have thought of before (because they have a different background).
Those are two things we have looked for in our early investors, and I think that has played out well for us.
Show Notes
(01:37) Caitlin went over her college experience studying Computer Science at Stanford University in the early 2010s.
(03:55) Caitlin talked about her teaching experience for CS 106A and CS 103.
(07:09) Caitlin shared valuable lessons from completing software engineering internships at Harvard University, Facebook, and Palantir.
(10:06) Caitlin walked through technical and organizational challenges during her time at Palantir: building products for both government/commercial customers and working with designers/infrastructure engineers to deliver full-stack applications to the field.
(12:01) Caitlin explained why Palantir is composed of “loosely-related individual startups.”
(14:56) Caitlin recalled learning curves during her transition to a tech lead role at Palantir — becoming responsible for the technical architecture and code quality of the product, mentorship and growth of the engineers, and the product direction and prioritization of features.
(18:31) Caitlin discussed her time as a Data Engineering Manager at Remix Technologies — leading the team that builds geospatial data pipelines on top of AWS, Postgres/PostGIS, and Apache Airflow.
(24:45) Caitlin reflected on valuable leadership and people management lessons absorbed during her transition to growing and developing diverse and inclusive engineering teams.
(29:05) Caitlin shared the founding story of Hex, the modern data workspace for teams, alongside her co-founders Barry and Glen.
(32:58) Caitlin talked about Hex’s ideal users (the “analytically technical” who need better tools to access and manage more sophisticated workflows) and introduced Hex’s Logic View.
(35:22) Caitlin examined the collaboration challenges in data teams and revealed Hex’s Library to address some of the shortcomings.
(39:59) Caitlin shared her thoughts on the evolution of data science notebooks.
(42:14) Caitlin unpacked the nuanced problem of justifying data ROI to functional stakeholders and described Hex’s interactive App Builder.
(45:17) Caitlin shared exciting developments on the horizon of Hex’s product roadmap.
(46:37) Caitlin shared valuable hiring lessons to attract the right people who are excited about Hex’s mission.
(52:10) Caitlin shared the hurdles in finding the early design partners and lighthouse customers of Hex.
(56:01) Caitlin shared upcoming go-to-market initiatives that she’s most excited about for Hex.
(58:24) Caitlin shared fundraising advice for founders currently seeking the right investors for their startups.
(01:01:42) Closing segment.
Caitlin’s Contact Info
Hex’s Resources
Customers | Careers | Integrations | Pricing
Mentioned Content
Articles
“Long Live Code” (June 2020)
“Don’t Tell Your Data Team’s ROI Story” (Aug 2020)
“The Sharing Gap” (Oct 2020)
People
Tristan Handy (Founder and CEO of dbt Labs)
Claire Carroll (Product Manager of Hex, previous Community Manager of dbt Labs)
Wes McKinney (Creator of Pandas and Arrow, Co-Founder and CTO of Voltron Data)
DeVaris Brown (Co-Founder and CEO of Meroxa)
Book
“Mindset: The New Psychology of Success” (by Carol Dweck)
Notes
My conversation with Caitlin was recorded back in Fall 2021. Since then, many things have happened at Hex. I’d recommend looking at:
Caitlin’s piece announcing Hex’s SOC 2 Type II report to reflect Hex’s commitment to security
Caitlin’s recent talk at Data Council Austin about implementing reactive notebooks with IPython
The release of Hex Knowledge Library, a new way to publish and discover data work
Hex’s $16M Series A (led by Redpoint Ventures) and $52M Series B (led by a16z along with Snowflake, Databricks, and existing investors)
Hex’s increasing list of customers such as AngelList, Fivetran, Hightouch, Loom, Mixpanel, Notion, Ramp, Replicated, SeatGeek, etc.
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.