The 114th episode of Datacast is my conversation with Carlos Aguilar, the CEO and Founder of Glean (now Hashboard), a data visualization startup based in New York City.
Our wide-ranging conversation touches on his education in Mechanical Engineering at Cornell, his early career working on robotics at Kiva Systems and Amazon, his time building data products to help cancer research at Flatiron Health, his current journey with Hashboard making data exploration and visualization accessible to everyone, lessons learned from hiring/selling/fundraising, and much more.
Please enjoy my conversation with Carlos!
Listen to the show on (1) Spotify, (2) Apple, (3) Google, (4) Stitcher, (5) RadioPublic, and (6) iHeartRadio
Key Takeaways
Here are the highlights from my conversation with Carlos:
On His Upbringing
I grew up in an exciting time as part of the first generation to have access to the Internet. As a child, I enjoyed playing with websites, viewing their source code, and creating my own AOL web pages. I learned how to copy code and later became interested in Flash, where I discovered that I could code and draw things in the same environment, creating little games and websites.
While I was in high school during the early 2000s, the dot-com bubble burst, and I became interested in robots. Entering college, I figured the web was interesting, but I was more drawn to the way it would touch the rest of the world. I wanted to explore how technology and programming could affect the real world.
Although I took a computer science course in high school, I had already been hacking around with ActionScript and Flash, which looks a lot like JavaScript. I also wrote a lot of PHP and was fascinated with creating little webpages, hosting domains, submitting websites, and Flash applications to competitions. While not altogether useful, it was a fun way to explore coding.
On His Academic Experience at Cornell
I chose to attend Cornell because of the access to arts and humanities classes, but it turned out to be an intense engineering program, which is what I wanted. I was looking for technical rigor and definitely found it at Cornell. My favorite classes were the ones where I got to work on computers, like finite element analysis, which involved computational modeling using programs like MATLAB.
My favorite class was feedback control systems, which taught systems thinking and creating computational models for how systems work. I also took some computer science classes, like evolutionary optimization and genetic algorithms, and some robotics classes.
In 2006 or 2007, I took an evolutionary computation class focused on genetic algorithms. Machine learning was far less popular than it is today, and neural nets saw much less use. I took the class with Hod Lipson, who had a research lab called the Creative Machines Lab. The lab explored how machines and technology could be creative, especially in fine art. We devised a system that would take an input image and generate candidate solutions for painting that image. These candidates would compete with each other through a fitness function and an evolutionary process, eventually producing a representation of the image that was good according to some optimization function.
The most exciting results came when we constrained the system to represent the image in a limited number of strokes, which produced creative solutions that even exploited the limitations of the simulated environment. The project was an early exploration of how machines could be creative, and it was great to see it continue even after I left. Hod kept working on it for another five or ten years and created some really awesome things.
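To make the mechanics concrete, here is a minimal Python sketch of the kind of evolutionary loop Carlos describes. It is illustrative only, not the lab's actual system: the stroke representation, the disc renderer, and the mean-squared-error fitness function are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, N_STROKES, POP, GENERATIONS = 32, 32, 12, 40, 200

# Target image to approximate (random here; in practice, a loaded grayscale photo).
target = rng.random((H, W))

def render(strokes):
    """Paint each stroke as a filled grayscale disc on a blank canvas."""
    canvas = np.zeros((H, W))
    yy, xx = np.mgrid[0:H, 0:W]
    for x, y, r, shade in strokes:
        canvas[(xx - x) ** 2 + (yy - y) ** 2 <= r ** 2] = shade
    return canvas

def fitness(strokes):
    """Higher is better: negative mean squared error against the target image."""
    return -np.mean((render(strokes) - target) ** 2)

def random_individual():
    # Each stroke: x position, y position, radius, gray shade.
    return [(rng.uniform(0, W), rng.uniform(0, H), rng.uniform(1, 6), rng.random())
            for _ in range(N_STROKES)]

def mutate(strokes):
    """Perturb one randomly chosen stroke to produce an offspring."""
    child = list(strokes)
    i = rng.integers(len(child))
    x, y, r, shade = child[i]
    child[i] = (x + rng.normal(0, 2), y + rng.normal(0, 2),
                max(1.0, r + rng.normal(0, 1)),
                float(np.clip(shade + rng.normal(0, 0.1), 0, 1)))
    return child

population = [random_individual() for _ in range(POP)]
for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)               # candidates compete via the fitness function
    survivors = population[: POP // 2]                        # selection
    population = survivors + [mutate(s) for s in survivors]   # offspring with small mutations

print("best fitness:", fitness(population[0]))
```

The stroke budget (N_STROKES) plays the role of the constraint Carlos mentions: with only a handful of strokes available, the surviving candidates are pushed toward more creative representations of the target image.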
Although I haven't kept up closely with machine learning applications in art, I have seen recent advances like DALL-E, which creates images from text descriptions. I've also seen a lot of digital art and generative art that combines human input with algorithms or machine learning. It's been really cool to see these things pop up. Seeing how far we've come in just 15 years is incredible.
On Working in Robotics at Kiva Systems
I can walk you through how I landed my job at Kiva after graduating from Cornell. Coming out of grad school, I wanted to work in tech, but as a mechanical engineer, the path to tech wasn't clear.
I considered companies like Google, but at that point, they were all pretty big, and I didn't feel like I'd be able to do the creative technical work I wanted. I was considering staying on as a Ph.D. with Hod in his lab.
It was really by chance that I found Kiva Systems, a company founded by a group of robotics folks from Cornell, including co-founder Raffaello D'Andrea. They called me out of the blue, did a phone interview, and asked if I wanted to become a systems analyst at the company.
After visiting for an in-person interview, I was sold. They had actual robots working in a warehouse, hundreds just roaming around. The problem they were working on was robotic warehouse automation, a collaboration between humans and robots doing different parts of the task. It was fascinating.
My first role was as a systems analyst, trying to understand how this complex system worked together. The role of a data scientist didn't really exist back then, but that's essentially what I was doing - analyzing the system and figuring out how all the pieces fit together.
The system was installed in a warehouse in central Pennsylvania, and my job was to figure out why the system was performing strangely or not as expected. I talked to all the humans interacting with the robots to uncover the reasons behind performance issues, measure the system's performance, and create tools to help manage it effectively.
It was one of the most interesting data sets I have ever worked on because of all the constraints and moving parts. There were low-level control systems doing things like path planning and resource allocation, and then there was the most complex piece: human behavior. Understanding how humans interacted with the robots and building optimizations around that was the most interesting part of the problem.
At Kiva, we realized that data made products better. It's unclear whether Kiva would have been as successful if we hadn't been able to instrument that complex machine and create tools that explained its complex dynamics to the warehouse operators.
Our customers had to figure out how to get tens of thousands of orders out over the next few days, and they had to optimize their inventory and various other aspects of their warehouse to make it happen.
On Building His First Data Product at Kiva
My former boss actually highlighted this when I published my intro blog post for Glean later on: I had been doing the same exact thing my entire career. When I joined Kiva, my first project was analyzing all the system configurations for a complex system. There were literally hundreds and hundreds of dials that tuned the various algorithms managing the entire system.
My job was to export all of that configuration data, review it, and analyze where things were out of line and how that was affecting performance across our handful of customers. As soon as I started doing it, I realized someone would want to do it again; this was not going to be the last time it was useful.
So, probably the second time I had to do this analysis, I decided to automate it and build a web app that everybody could log into to see an audit of all the system configurations at any given point. This was useful when we had our few initial customers, but it became even more useful when we had 20, 30, or 40 customers and were launching the 41st site. It allowed us to quickly look up configurations, like resource allocation settings per station or the drive unit configurations for the robots.
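As a rough illustration of what that kind of audit can look like, here is a small Python sketch that diffs each site's exported configuration against a baseline and surfaces the outliers. It is not Kiva's actual tool; the parameter names, default values, and site identifiers are all hypothetical.

```python
from typing import Any

# Hypothetical baseline values for a handful of tuning parameters.
DEFAULTS = {
    "resource_allocation_seconds_per_station": 30,
    "drive_unit_max_speed_mps": 1.3,
    "replenishment_threshold_pct": 20,
}

# Hypothetical per-site exports; in practice, these would be pulled from each installation.
SITE_CONFIGS = {
    "site_pa_01": {"resource_allocation_seconds_per_station": 30,
                   "drive_unit_max_speed_mps": 1.0,
                   "replenishment_threshold_pct": 20},
    "site_nv_02": {"resource_allocation_seconds_per_station": 45,
                   "drive_unit_max_speed_mps": 1.3,
                   "replenishment_threshold_pct": 35},
}

def audit(site_configs: dict[str, dict[str, Any]], defaults: dict[str, Any]):
    """Return every (site, parameter) pair whose value deviates from the baseline."""
    deviations = []
    for site, config in site_configs.items():
        for key, default in defaults.items():
            value = config.get(key, default)
            if value != default:
                deviations.append((site, key, default, value))
    return deviations

for site, key, default, value in audit(SITE_CONFIGS, DEFAULTS):
    print(f"{site}: {key} = {value} (default {default})")
```

Wrapping a report like this in a simple web app is what turned a one-off analysis into something the whole team could check on their own.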
Instead of doing the same thing repeatedly, I learned to look for useful products hiding inside the organization: an analysis has way more utility if I can automate it and build it into a little product that people can really dig into on their own. This early lesson taught me that treating data as a product assumes self-service and trust in the people around you.
Ad hoc requests are what make the work fun. They provide an opportunity to learn something new, talk to people, and discover requirements. However, many of those ad hoc analyses should only be done once and thrown away when they're finished. Use each one as an opportunity to build actual software. Always keep an eye out for decisions that could be optimized or automated and tools that could be built.
On His Brief Stint at Amazon
The Amazon acquisition was a lot of fun, as Kiva and Amazon had a great cultural match. Both companies had very driven cultures. I remember the Amazon team descending on Boston and coming to the warehouse to figure out plans for integration. There were early whispers that they would just take the robots and write their own software, but it turned out that we had already solved the problem quite well at Kiva.
The first integrations into Amazon's fulfillment network were almost like customer engagements. We installed a system and integrated it with their warehouse management systems, which were unique to Amazon. They had written all their own warehouse software, and we integrated it with their technology stack. Those first integrations were intense, and we flew to Seattle almost weekly to set up these systems. The pace was breakneck, and the demands were high.
We had made about 5,000 or 6,000 robots when we got acquired. Over the next year and a half, the mandate was to build many times that number. To this day, I don't have any particular insight into the exact number of robots produced, but there are hundreds of thousands of them now.
I was involved in those first couple of projects right after the acquisition. Luckily, we didn't have to change the technology that much because we had done an effective job of solving it at other customer sites.
I don't know much about M&A, but this acquisition was incredibly successful. The same technology is now automating all of Amazon's warehouses. It's interesting to see the difference between the Kiva acquisition by Amazon, where it was really a drop-in-place technology that unlocked a ton of value, and the Roche acquisition of Flatiron later in my career. The Roche acquisition was much more hands-off. Roche is a conglomerate that owns many other pharmaceutical companies, so it is largely run through subsidiaries.
Amazon, on the other hand, kept our culture totally intact for the first month or two. They collaborated with us and eventually absorbed our culture. They knew it was a strong cultural match going into the acquisition: our values, like taking ownership, aligned with their leadership principles from the outset. So they could acquire the whole company and fold it into their culture. A few years later, we were Amazon Robotics and no longer Kiva.
On Joining Flatiron Health as The First Data Hire
I was at a similar stage in my career and mindset to when I was finishing grad school: excited about my work and thinking about what was next, but with no idea what that would be. I considered starting a company and talked to many startups through connections to venture capitalists, including the Google Ventures team, which was making many early-stage investments around 2012-2013.
At that time, there were many mobile apps and check-in apps, which I found uninspiring. I even booked a one-way ticket to Bangkok, thinking I would take a break for many months. But then I met Nat and Zack, the founders of Flatiron, and their pitch for building an ecosystem around cancer data was incredibly motivating.
They wanted to partner with cancer centers to gather oncology data, which is incredibly valuable for understanding the different disease states in cancer and the drugs people are receiving. Clinical trials are limited by patient populations, so they wanted to use real-world data to advance cancer care.
When I joined, we had one cancer center partner, and my role as Integration Manager was to organize and integrate data from these centers to build data products. While the pitch was broader than that, we had immediate problems around data integration, ingestion, and organization.
On Building the Data Insights Team From Scratch
When I first joined Flatiron, I was an integration manager tasked with moving data around and figuring out how to get it from one place to another. However, the real challenge was figuring out how to ingest and integrate the data into our applications. This was in 2013, so we couldn't use AWS because it wasn't HIPAA-compliant. Instead, we had to copy databases using Azure, a complex and manual process. Since cancer centers weren't yet using our technology, we didn't have a clear vision of what data to grab or what endpoints we should focus on.
To address this, I got involved in the product development process, working with customers to determine their requirements and then going back to the source systems to see if it was possible to get the necessary data. This approach allowed us to establish a customer-centric data team that could close the gap between requirements and implementation. As a result, we were involved in launching almost all of the data products that Flatiron released.
On Building Data Products at Flatiron
In the early days of getting cancer centers excited about our services, we offered basic business intelligence tools, like population health discovery tools. These tools helped us focus on revenue cycle management, which matters to every cancer center, since they are also businesses. Cancer drugs are costly, so auditing how these drugs were billed was a powerful tool for these small, scrappy businesses. They had to be careful: if they billed a drug that costs $10,000, $15,000, or $20,000 incorrectly, or failed to bill it correctly to insurance companies, they could go out of business.
Later on, we focused on clinical trials and our internal data products. We realized that we had to build data assets for every data product we created, which required returning to the source systems. We found that having an intermediate representation, like a data warehouse, made the process more efficient. Therefore, we spent time working on our central data warehouse and stopped working on user-facing products for a year to focus on the warehouse. This improved the quality of our internal data products, which unlocked future product development.
Perhaps this is why I left healthcare, but I think the biggest problems are really about the alignment of incentives. This gets a little philosophical, but ultimately we treat medicine as a purely capitalist system. However, when your life is on the line, it's not like choosing between a cheeseburger at Wendy's or McDonald's; free markets don't rule. You're willing to spend any amount of money, and the decision maker is often not the person thinking about paying the bill.

There are very misaligned incentives, so even when there are things that should be done for patients, like better care navigation, there isn't a model that can pay for them. Insurance companies may want to pay, but they need to see that it will have some effect or make the cost of care more efficient. It's incredibly frustrating to see things you think should be done for patients but aren't because of misaligned incentives. Patients should have access to drugs and other treatments, but properly aligning incentives is probably the most challenging part of healthcare.
At that point, we didn't have Redshift, so the first version of the data warehouse was actually in SQL Server because most of our source data was in SQL Server. We built and hacked together a lot of our own tools. We used a bit of Tableau for visualization. Later, we used Caravel, the open-source project that eventually became Superset, and after I left, the team was using Looker. We used almost all the visualization tools available, and on the storage layer, we transitioned from SQL Server to Redshift and more Postgres-oriented solutions.

One of the big innovations we built was an ETL tool called Blocks, which was very ergonomic for SQL-oriented folks. It was accessible to people who only knew SQL, so even biostatisticians could use it to a certain degree. We built incredibly complex DAGs of data pipelines on top of our internal data warehouse, which was a big unlock. If it were today, we would probably use something like dbt, but the tool we invented was a significant improvement at the time. We did a lot of building ourselves because many of the tools on the market were fairly immature.
On Hiring Data Talent
Hiring was challenging for us at Flatiron because we had a systems-oriented approach to data. We prioritized hiring for product skills and built a product case study into our interview process: we presented candidates with a data set and a customer and asked them to show how they would serve that customer. This approach gave us insight into how candidates would build with data at a small organization, where prioritizing what to build can be difficult. Unlike software development, data teams are often left to figure out how to prioritize things and discover use cases on their own.
We had a minimum bar for every hire, and each candidate needed to demonstrate use-case thinking, product orientation, and customer empathy. We also looked for a spike in technical areas such as statistics, coding, or machine learning.
Flatiron was a matrixed organization with different product initiatives that worked cross-functionally. Each initiative had a cross-functional team, including software engineers, biostatisticians, data insights engineers, product managers, and designers. The team with the most data insights folks was probably the central data warehouse team, which dealt with a lot of data organization.
Each product line could account for the ROI of its data folks, which made it easy to justify our headcount. Data insights people helped move the product forward, so teams were always asking for them. The challenge was to develop norms and still feel like a team. We had a weekly meeting for cross-functional learning, which helped us see trends across the entire organization. This became another superpower of the data insights team: we could connect the dots and find areas of collaboration and learning across the whole organization.
On Founding Glean (Now Hashboard)
I've always had an idea in the back of my head about data visualization and reporting tools. While many visualization tools are available, I've never found a go-to product that felt like the last word in the category. Something seemed to be missing from my tool set, and I had run into various obstacles while trying to empower people with data at Flatiron.
I remember specific instances where I gave data sets to operations personnel, only to have them create three swanky charts or three pie charts without sharing them with others. At that point, I realized the importance of coaching people on how to make data visualization accessible to others. Building a data visualization is like building a data product that needs to be consumed by someone else, so it's crucial to make it understandable to everyone.
Although there are common patterns for approaching data visualization, the user interfaces of these visualization tools are always the same and not very ergonomic. After leaving Flatiron and doing consulting, I began prototyping my own data visualization tool, Glean. I created a prototype, got excited about it, and showed it to a few people, which motivated me to start the company.
Glean is an ambitious product in a crowded and competitive space, so I knew it would take time to develop. I started on my own, writing React and JavaScript and thinking through the core concepts that needed to be included. After slowly hiring a team and getting some customers, we spent over a year building and testing the product with actual customers before making a broader announcement.
"glean" means picking up the morsels after harvesting a crop. In modern terms, it means to discover something from the data. That's how I see data visualization: starting with a heap of data, organizing it, and building on layers of meaning until people can dig through it and find their own insights.
On Data Visualization and Exploration
During my time at Flatiron, I coached people on how to analyze and visualize data and how to approach new datasets. Every tool accessible to non-data team members has a similar user experience: you start by selecting a dataset and dragging columns around while experimenting with different visualization types. It's a combinatorial problem: with 20 or 30 columns and a few different chart parameters, you could conceive of hundreds of millions or billions of possible charts in the first five minutes, yet probably only 0.001% of those charts are worth starting with. Every time I coached someone on visualization and analytics, I started with time-series profiling as a way into the data.
Data tools are designed around a user experience, and Tableau, the most accessible tool, offers amazing visualization capabilities but is still hard to get started with. So, what Glean offers is automatic visualization and profiling at the outset. This isn't some esoteric approach; it's just a guided workflow that starts with exploratory data analysis, looking at trends over time and showing you a ton of those views right out of the box. You define some metrics declaratively, and the workflow puts you in a very visual, very interactive explorer, allowing people who are somewhat familiar with data but not visualization experts to start clicking into the data.
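As a generic illustration of what "profiling trends over time" can mean in practice, here is a short Python sketch using pandas and matplotlib. It is not Glean's implementation, and the file and column names are made up; it simply shows the kind of first-pass, time-series-oriented view that a guided workflow can generate automatically.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: one row per order, with a timestamp and a few numeric columns.
df = pd.read_csv("orders.csv", parse_dates=["created_at"])

numeric_cols = df.select_dtypes("number").columns

# Resample every numeric column to weekly aggregates: a first-pass profile of the data.
weekly = df.set_index("created_at")[numeric_cols].resample("W").agg(["count", "mean"])

# One small time-series panel per column, so trends and gaps are visible at a glance.
fig, axes = plt.subplots(len(numeric_cols), 1,
                         figsize=(8, 2.5 * len(numeric_cols)),
                         sharex=True, squeeze=False)
for ax, col in zip(axes[:, 0], numeric_cols):
    weekly[(col, "mean")].plot(ax=ax, title=f"weekly mean of {col}")

plt.tight_layout()
plt.show()
```

The point is not the specific charts but the default starting place: instead of staring at a blank canvas and millions of possible column combinations, you begin with a handful of sensible time-series views and drill in from there.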
Analytics is a skill in and of itself, separate from coding and technical skills, so creating a guided workflow that teaches people good visualization is essential. Glean walks you through the process, whether or not you have coding skills or expertise in data visualization. The core value proposition of Glean is its strong defaults and automatic visualization, which make it easier to share insights with your team.
It's hard to explain what Glean's automatic visualization looks like and how intuitive it is, so watching the demo video will help you understand it better.
On DataOps For BI
So far, a lot of what I've talked about is clicking around and finding insights, which is the fun part. I think Glean really makes it enjoyable. However, the challenge with dealing with a scaled organization is that sometimes those one-off little dashboards and analytics that you thought were temporary become incredibly important and mission-critical. Suddenly, everyone cares about this one dashboard, and a few things happen as a result.
Firstly, as your organization grows, the upstream dependencies from this dashboard are likely growing as well. Some data pipelines need to be managed before the dashboard actually gets materialized and before the chief revenue officer sees it. So, the quality of the data matters and can cause a lot of complexity upstream.
Secondly, you may want to iterate faster on this dashboard now that you have, say, a new revenue line to incorporate. Additional development requirements for these dashboards keep emerging. Unfortunately, change management in modern data tools is terrible.
At Flatiron, we managed this by trying to sync and coordinate changes to pipelines and downstream dashboards simultaneously. We tried to develop staging environments, but there weren't really good workflows for them. Sometimes we would just change things and wait for downstream things to break, and then we would fix them.
The idea behind DataOps inside of Glean is to preserve that freewheeling experimentation where everybody clicks around in the data. We have early-stage customers that just use Glean in that mode and don't need to worry about DataOps at all.
However, sometimes dashboards get shared with customers and become production products that need to be checked in. Glean DataOps has a few different components. Every resource in Glean can be exported as configuration files and managed as code. We also have a CLI for creating these resources and a build tool that lets you build Glean resources in continuous integration.
The most important feature is one we call previews, which lets you see an alternate view of your entire analytics stack with a proposed set of changes applied. This accelerates teams: you can propose changes in a pull request, see the entire environment in this duplicated state, and show it to your chief revenue officer. They can play around with it for a week, and then you merge it and deploy it into production. This is one of the more sophisticated workflows for analytics development, for when those dashboards become production products and you want to maintain their quality.
On Product Vision
Products are ultimately about people, and it is important to understand the different personas that interact with them. However, the challenge with data products, such as reporting and business intelligence, is that you have diverse stakeholders with different needs. These include executives who just want to check numbers, analytical individuals who want to dig deeper into the data, and platform engineers who maintain the systems.
Our approach to treating the platform as a product and data as a product within a single organization is focused on creating incredible tools for each of these personas. To achieve this, we have initiatives coming up to help different personas collaborate and have an amazing experience.
We launched a workbench that looks more like a SQL IDE, which is great for technical analytics engineers and other technical staff. We are now focused on improving our charting and visualization library, which intentionally only has a handful of chart types that are highly configurable and rearrangeable. This is to make it easier for users who are just trying to see some data instead of overwhelming them with 50 or 60 chart types. We are trying to make a core set of chart types, such as Cartesian charts and pivot tables, incredibly useful for organizations.
Our visualization library is a big area of focus for us over the next few months. We aim to make it more configurable by adding more complex tables, calculations, and trellising. This way, we can teach people how to do more complex visualizations in a safe way.
In addition, we are working on better collaboration within Glean, particularly for larger teams using the product. This means better inline documentation, commenting, and other features to enhance collaboration.
On Hiring
Hiring is all about culture, which is the most powerful lever you have. Finding the right people who appreciate your culture and values is crucial, so interweaving your values and culture into your hiring process is incredibly important.
When I think about the culture to build at Glean, the data insights team at Flatiron provides a great model. They had a high energy level, were super innovative, and were willing to share ideas and disagree with each other in open collaboration. Feeling safe enough to throw out ideas was essential to their success.
At Glean, we want to build a similar culture. We need to find people who are excited about taking ownership, being in the driver's seat, and having a collaborative spirit. We want a diverse and eclectic team that is focused on technical excellence, innovation, collaboration, and ownership. Candidates see when you take these things seriously and when you have an organized set of requirements.
Hiring is a two-way street. We need to make a value proposition for candidates, showing them how their careers can unfold inside our organization. We need to explain our product and values and ensure candidates are excited to take the next step with Glean.
At Flatiron, we worked with a recruiter to source diverse candidates. We also hosted and attended events to find candidates. We focused on reaching out to diverse folks at the top of our hiring funnel and making sure there was good representation throughout our entire hiring process.
On Finding Design Partners
Finding design partners has been particularly difficult in the business intelligence sector, where many stable options are already available. For me, networking and pitching ideas to people were key; that allowed me to identify pain points and find receptive pockets of the market. Sales in a competitive market can be challenging, and it takes persistence, conviction, and repetition to find early believers and adopters.
I am naturally stubborn, which helped me develop an immense amount of conviction around this idea over 10 years. However, that is not exactly repeatable advice. What matters is having conviction in your idea backed by market evidence; tenacity can come from various sources, such as upbringing or personal qualities.
On Fundraising
It's probably a challenging fundraising environment right now.
When it comes to investors, I think about them much like employees, even though they won't have as much impact as employees do. Money is obviously valuable, but finding investors is about finding people who want to join your journey. It's not that different from finding and recruiting your first customers or employees: you have to carve out a path for them too. You're helping your investors accomplish something as well; they have their own motivations. So, there isn't a silver bullet for finding investment.
Most of it is just preparation. Don't spend all your time looking for investment. Instead, spend your time preparing, figuring out the market, having an amazing story, building proof points of that story, and having that in your pocket when talking to investors.
Show Notes
(02:06) Carlos shared formative experiences of his upbringing tinkering with robots and websites.
(04:03) Carlos reflected on his education, studying Mechanical and Aerospace Engineering at Cornell University.
(05:34) Carlos discussed the technical details of his research on machine learning applications in robotics and art.
(10:11) Carlos explained his work as a robotic system analyst at Kiva Systems.
(15:41) Carlos discussed building his first data product at Kiva.
(20:24) Carlos recalled his stint working on warehouse-automating distributed robots at Amazon Robotics (after the Kiva acquisition).
(24:31) Carlos revealed his decision in 2013 to join an early-stage healthcare startup called Flatiron Health as the first data hire.
(28:43) Carlos shared his experience building Flatiron's Data Insights team from scratch.
(31:51) Carlos reviewed different data products built and deployed at Flatiron Health.
(38:41) Carlos shared the key learnings from hiring for his data team at Flatiron.
(44:08) Carlos shared the founding story of Glean (now Hashboard), which is building a new way to make data exploration and visualization accessible to everyone.
(50:52) Carlos explained the pain points in data visualization/exploration and the product features of Glean that address them.
(55:03) Carlos dissected Glean DataOps, which brings modern developer workflow to the business intelligence layer and prevents broken dashboards.
(59:28) Carlos outlined the long-term product vision for Glean.
(01:03:11) Carlos shared valuable hiring lessons to attract the right people who are excited about Glean's mission.
(01:07:15) Carlos discussed his team's challenges in finding the early design partners.
(01:10:13) Carlos shared fundraising advice to founders who are seeking the right investors for their startups.
(01:11:57) Closing segment.
Carlos' Contact Info
Hashboard's Resources
Mentioned Content
Blog Posts
How the Data Insights team helps Flatiron build useful data products (May 2018)
The biggest mistake making your first data hire: not interviewing for product (July 2020)
How to interview your first data hire (Aug 2020)
My hack for getting started with data as a product (May 2021)
Introducing Glean (March 2022)
Your dashboard is probably broken (April 2022)
People
Book
Notes
My conversation with Carlos was recorded back in June 2022. The Hashboard team has had some announcements in 2023 that I recommend looking at:
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.