Datacast Episode 95: Open-Source DataOps, Building In Public, and Remote Work Culture with Douwe Maan
The 95th episode of Datacast is my conversation with Douwe Maan — the founder and CEO of Meltano, an open-source DataOps platform.
Our wide-ranging conversation touches on his early interest in programming; his engineering career at GitLab; his current journey with Meltano, building open-source DataOps platform infrastructure for the Modern Data Stack; his thought leadership on building a company in public, pivoting an open-source project, and nurturing a remote work culture; and much more.
Please enjoy my conversation with Douwe!
Listen to the show on (1) Spotify, (2) Google Podcasts, (3) Apple Podcasts, (4) iHeartRadio, (5) RadioPublic, and (6) TuneIn
Key Takeaways
Here are the highlights from my conversation with Douwe:
On Getting Into Programming
When I was growing up, my father was one of the first people in the neighborhood, and in his family, to have a computer. So from a very young age, I thought of computers as not just something to use but something to do new things with. In primary school, a friend and I realized that it did not take much to start building websites. All the materials were available online for free, including the PHP programming language, which made it uniquely accessible to start channeling my creativity in this way.
By the time I studied Computer Science at Utrecht University, I had already become familiar with basic programming and databases. My favorite class was on cryptography because it went into the math and theory of computer science, which I had not been exposed to as much as the practical side of things up to that point. I was able to put my software into academic terms like Big-O notation and reason about its efficiency and security. That said, of the skills I used daily as a software engineer, I definitely learned more over 10+ years of doing it myself with the Internet at my fingertips than in the classroom.
On Joining GitLab
In high school, I switched from doing PHP web development to building iOS and Mac apps. After doing that for a few years, I co-founded a new company that built software for bed and breakfasts. Around the same time, I went to conferences around Europe and the Netherlands that had to do with Ruby on Rails and web development. At one of these, I ran into a couple of people working at what was then a tiny Dutch company called GitLab, built around an open-source project started in Ukraine a few years prior. The founder of GitLab is Sid Sijbrandij. His parents owned a bed and breakfast in the north of the Netherlands and had been using my product to run it.
In retrospect, that must have essentially been my resume. Sid asked me to join GitLab as employee number 10, just as GitLab was going through the Y Combinator accelerator program and starting to raise its first funding. I got to see that entire journey from a front-row seat from the very early days.
On Being An Engineering Manager
GitLab’s product is a platform for software developers and software development teams to help them be more effective at their jobs. Working on software development tooling as a software developer is uniquely attractive because it is a matter of scratching your own itch.
One thing we appreciate in exceptional engineers in our teams is their ability to take a relatively rough problem description, coming from either another programmer in the open-source community or a customer, and work with that user on the issue tracker.
Since this was all open-source software, our users had the same access to the discussion around how we would solve certain problems that our team had. These engineers could put on the product manager hat in order to build a great solution.
The unique thing about GitLab and also today with Meltano is that engineers know well if they are solving a problem or not because they themselves are often the people experiencing the same problem. So they will know if they would use a certain solution or not.
Being able to go from a vague problem description to a solid solution, not just by blindly listening to a product manager but by putting your own experience into the situation, always leads to great results.
I joined GitLab as employee number 10, originally as a software engineer, but I was quickly tasked with hiring engineers. That came from the fact that GitLab at the time had only 10 people working on the product, while the open-source community of contributors counted in the hundreds. We had a steady flow of engineers with whom we already had working experience and who really cared about our mission. Part of the exercise I took on as a hiring manager was to identify those people in the community who would turn out to be great teammates. It is unique to have a chance to work with your future employees on real problems with real stakes in a way that you cannot if you only hire based on people's resumes or how they perform in an interview.
On The Evolution of His Career at GitLab
I started working at GitLab part-time and then relatively quickly became a hiring manager for engineers — hiring the first 14 engineers. At the time, my role had not changed much since I was still doing hands-on programming most days. But then, soon, GitLab found itself in a place where we needed to start having some kind of management hierarchy and responsibility around the PeopleOps experience. About a year into my time at GitLab, I officially became the development lead. I was still in charge of hiring and managing people who had already been hired, but a lot of my day was still very hands-on — coding myself and doing code reviews for the work of my team and open-source contributors.
As the company grew, this development team was split up into backend and frontend, leaving me being the backend lead and working closely with the frontend team. Over time, we at GitLab realized that we were building not just a version control solution and an issue tracker but a platform for the entire DevOps lifecycle. We ended up with different teams for each step in the lifecycle: Creating (the actual code), Planning (issue tracking), and Verification & Release (continuous integration and deployment).
Over this journey, I had a choice to pick where I wanted to stay as my team and other teams were splitting up. I always felt most passionate about code review and the process of helping engineers iteratively improve the things they have built through feedback for myself and my peers. As a result, the functionality I wanted to be involved with building had to do with code review and version control in GitLab. At the end of my stint in engineering management at GitLab, I was responsible for the backend side of Create and the source code area specifically. If you have ever viewed a repository, read commits, or reviewed merge requests, you would have looked at code that my team wrote and that I reviewed.
On The Origin of Meltano
Meltano was started inside GitLab in 2018. At the time, I was not yet involved with the project. As GitLab's data team was growing and figuring out how to process all of the data from various places to get insights, they realized that the tooling they had been using did not match the expectations GitLab had around tools being open-source, extensible, and flexible. We wanted tools that allowed the data team to (1) use version control, so that you can figure out after the fact why a change was made, and (2) do code review, so that you can give your input on changes before they go live and potentially affect a production dashboard.
In general, the tools we were using for data seemed to lag behind what we had gotten used to on the software development side. GitLab, being an open-source developer tooling company, realized there was an opportunity to start building open-source data tooling that borrowed some of the best practices of DevOps, such as version control and code review to enable more effective collaboration, as well as end-to-end testing to diagnose changes before anything gets pushed into production.
Inside GitLab, we started to build an end-to-end platform for the data lifecycle. It would use as many existing open-source technologies as possible and bring them together in one platform, so that an entire organization could have a single product covering everything from data integration to data visualization. For instance, you could quickly detect if a change to the data integration configuration would negatively affect a dashboard further down the lifecycle. The project started with GitLab's own data team as the primary user and customer. This continued for a while, until the needs of GitLab's data team started growing faster than the Meltano team could keep up with.
Even though Meltano started as an internal project, from day one it was built in the open with the broader open-source community. We realized that we needed feedback from data teams everywhere to build something that would help people all over. There was a big opportunity to build a new product, by data people for data people, that would turn out to be the most useful tool in their arsenal (just like GitLab had become for many software developers).
On Building Meltano Full-Time
In 2019, Meltano had been around for a bit over a year inside GitLab. Everyone knew about it, but only a team of five was working on it (the other 1,300 people or so were working on their own things). I was on GitLab's Source Code backend engineering team at the time, but my enthusiasm was starting to wane. After 4+ years at GitLab, I had seen its transformation from a tiny 10-person startup into a 1,400-person organization with hundreds of millions in ARR. Many of the things put in place to run the business responsibly no longer resembled what had attracted me to the early startup that GitLab had been.
I started to think about finding a new opportunity, but there were many things about GitLab that I was sad to leave or that I was not sure I would find in quite the same way elsewhere: part of it being open-source and developer-centric, but also the way GitLab did remote work and took transparency as a core value.
GitLab's transparency goes well beyond its issue trackers and open-source products. Every aspect of how GitLab is run as a business is publicly available on the company's website in its handbook. Anyone in the world can go there and learn how GitLab works. This document is updated daily and reflects well how things actually get done at GitLab. The transparency was appreciated by everyone at the company because, at any point, you could find out how things were going in other departments or how the company was doing at a high level.
There was also the remote work aspect, which GitLab had been doing at scale since 2015, partly because of its open-source roots: there were already hundreds of engineers from countries all over the world collaborating on the product. From day one, GitLab never had an office. By the time I officially left in early 2021, there were 1,400 employees in 68 different countries. That freedom to travel and live in different parts of the world while staying connected to a global community was special for me.
Just when I thought I might need to look outside of GitLab, there was this great opportunity to join the five-person Meltano team, which was looking for an engineering leader. At the time, Danielle Morrill was the general manager. They were looking for someone to bring intimate knowledge of how GitLab builds open-source software, and developer tools in general, to this new project in the data space. For me, the opportunity was compelling: building open-source tools again, but now for a slightly different audience of data people rather than the software developers I had worked with previously.
Furthermore, Meltano's mission from day one had been to bring software development best practices to an industry that would benefit from them just as much but was a couple of years behind, with the DataOps wave only now picking up principles like version control and CI/CD. That, combined with joining a culture that came out of GitLab, was really appealing.
Now, as Meltano has spun out of GitLab, we have the opportunity to shape our own culture, learning from GitLab in many ways but trying to make some of our own mistakes instead of the same ones. The goal is to build a second company that has a similar impact on data tooling, just as GitLab had on developer tooling.
On Meltano’s Pivot
From day one, Meltano was built around the conviction that we had:
Bringing software development best practices to data tooling would be valuable.
There was much value in an end-to-end platform that encompasses the entire data lifecycle from data to dashboard — all the way from getting data out of the source systems to the visualization your analysts use to produce insights.
We had been working on this for a while, using open-source technologies as much as possible, but we also realized there were parts of this end-to-end platform that we had to build ourselves. We built data integration functionality on top of the existing Singer standard. For transformation, we used dbt. Further down, orchestration was handled by Airflow. For visualization and modeling, we built our own technology, since we did not find any existing open-source technology that met our needs.
We were building this platform because we were convinced that if users could use a single tool to go from data to dashboard quickly, it would be immensely valuable to data teams large and small. But we found that, because this story hinged on adopting all of Meltano in one go, a team would need to replace whatever it already had.
It was interesting for people who were still setting up their data stack for the first time (i.e., people starting a new startup or companies with tiny data teams that were looking for a tool to help them run quickly). We were starting to get some users who were startup founders, including less technical people who might not be confident setting up a whole data stack from scratch, and they were giving us useful feedback.
But about six months into my time on the Meltano team, between September 2019 and March 2020, we reached the conclusion that we were building something of value, but it was not attracting the types of users we needed to keep building and growing the product. It was less interesting for existing data teams because they would need to rip out everything they already had and replace it with Meltano, which was a really hard sell when the value was not immediately clear. Furthermore, because Meltano had been developed by just a team of four engineers and a couple of open-source contributors, it was still relatively immature and not ready to take over a company's entire data needs.
Part of the conclusion we came to is that it is far easier to convince people to contribute and provide feedback to a more narrow product they can adopt in their current data stack (rather than something that requires them to replace the entire thing).
At the same time, on the GitLab side, there were six full-time people on the team, who had been working for a bit more than a year and a half. But the actual usage and contribution numbers were not growing at a pace that warranted that level of investment from GitLab. The decision was made in early 2020 to give Meltano the best chance at finding its place over time by making the most of the existing allocated budget, which meant reducing the headcount on the team from six to one and extending the runway from one month to six months. The decision was made for me to be on the product by myself, with the challenge of figuring out how to take what we had built so far and turn it around to get the usage and contribution numbers up.
I started talking to Meltano users in our Slack community and learning from them what had originally attracted them to Meltano. I came across the Singer community, formed around the Singer technology: an open-source standard for data connectors that helps you do data extraction and loading, getting data out of various systems and loading it into a data warehouse for further processing and analytics. Singer, as a standard, had been around for a couple of years and showed promise. There was a library of connectors that ran into the couple of hundreds.
From talking to them, I realized there was an opportunity to build great tools around Singer for running these data integration pipelines: tools that manage configuration and incremental replication state and make connectors more accessible than they would be standalone. Based on that feedback, I realized the Singer community was underserved in terms of the tooling being built around it. Meltano had come across that opportunity but had not yet capitalized on it.
After those conversations, I changed the positioning on the website (not even the product itself) to focus on that Singer integration: Singer users can use Meltano to run open-source pipelines on their own infrastructure with the advantages of software development best practices, treating pipelines as code and allowing those pipelines to be tested automatically, bringing these advantages to the data integration game.
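To make the Singer standard concrete: a connector ("tap") is just a program that writes JSON messages (SCHEMA, RECORD, STATE) to stdout, and a "target" reads them from stdin; a runner like Meltano pipes the two together and persists the state between runs. Here is a minimal sketch of a tap; the "users" stream and its fields are hypothetical, not taken from any real connector:

```python
import json
import sys


def emit(message: dict) -> None:
    # Singer components communicate as JSON messages, one per line, over stdout/stdin.
    sys.stdout.write(json.dumps(message) + "\n")


def run_tap(rows):
    # 1. Describe the shape of the stream before sending any records.
    emit({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
        },
        "key_properties": ["id"],
    })
    # 2. Emit one RECORD message per row extracted from the source system.
    for row in rows:
        emit({"type": "RECORD", "stream": "users", "record": row})
    # 3. A STATE message lets the next run resume incrementally
    #    (the "last_id" bookmark key is a made-up example).
    emit({"type": "STATE", "value": {"bookmarks": {"users": {"last_id": rows[-1]["id"]}}}})


if __name__ == "__main__":
    run_tap([{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}])
```

Because everything is line-delimited JSON over standard streams, any two conforming programs can be combined, which is what makes a shared connector library possible.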
On Running Meltano By Himself
When I was told that the Meltano team as it existed could not live on and I was given the opportunity to take it over, it sounded a bit scary. I was also sad to see the team's time at Meltano come to an end, since it meant I had to figure things out myself instead of having people to talk to every day to validate my thoughts and provide a useful counterbalance and critical questions. But there was never a part of me that considered doing anything else and not accepting the opportunity, because I thought there was still a lot of value in Meltano that I did not want to see go to waste. I realized that, combining my engineering and leadership experience with my ability to get people enthusiastic about things, it made the most sense for me to try.
In some sense, because the product had been around for so long and we had not been able to find significant success, the bar I needed to reach also felt relatively low. Because we had been plateauing for a while, any measure of growth would be better than nothing. It was just up to me to figure out the little things to focus on week-by-week, month-by-month, and the dials to turn to slowly get the traction to keep rising and the contributions to come in. I decided to take it one step at a time and try to identify at any moment what is the highest priority thing I can do.
In some sense, it is easier to do that when you are a single person who has the full context of everything going on with the project — all the way from the questions being asked in Slack to the discussions taking place on Hacker News or Stack Overflow to knowing the actual code base and understanding what is possible. That led me to a place where I could have the complete picture and decide on a day-to-day basis what needed to happen.
On Open-Source ELT Pipelines
Even though great open-source tools exist for various stages of the data lifecycle (Airflow, Dagster, and Prefect for workflow orchestration; dbt for transformation; Great Expectations for data testing and validation; Superset and Lightdash for visualization), the situation on the data integration side was not as bright. The big data integration solutions people use today (Fivetran, Stitch, Matillion) all have a number of issues that have left people reaching for open-source alternatives and not finding them.
Firstly, if you go with one of the SaaS data integration platforms, you are limited to the data sources and destinations they support and have decided are worth maintaining in-house.
This means you will probably find 150+ sources that you can pull data from, and most of them will be good. But if you need something beyond that, you are left to fend for yourself. In many cases, you simply never pull data from a portion of the tools you are using because they are not supported by the tool you chose for data integration.
Open-source solves this problem because every connector can be an individual project that those who need it can work on together. Often, what happens is that one person or one team builds it and open-sources it. From that point on, the maintenance burden is shared. This means the long list of data sources that can be supported is not limited by what one company can afford to maintain in-house.
Singer already has 300+ connectors today, which is about double what you will find at a provider like Fivetran. Meltano is especially useful right now for companies in niche industries (where the big vendor solutions might not support their tools) or in regions outside the US (where local tech providers are not supported by these US-centric tools either). Connectors for those tools exist, but the only way to tap into them right now is via Singer.
If you are using one of these SaaS data integration tools and have a problem, your only option is to reach out to their support and hope that they get back in time to prioritize your request. Similarly, if you want the connector to do a bit more than it does today (if there is a new data entity you would like to import or if there is a behavior you would like to tweak), your only option is to ask for support again. Instead of having full control over your data stack and pipelines, you have outsourced a really significant part.
Secondly, Meltano is built from day one around principles where pipelines are code that you can check into a code repository, and your team can collaborate with functionality like version control, code review, and CI/CD. If you are using a web interface, it does not fit this world of thinking about building data pipelines as a type of programming.
Thirdly, open-source software can be self-managed. You can host it on your own infrastructure, whether the actual machines are running in a data center or at a cloud vendor. If you are dealing with security, compliance, or privacy restrictions and have to abide by regulations like HIPAA, using a proprietary platform can be impossible or extremely expensive.
We strongly believe the future of data integration will be built around open-source software for the reasons mentioned above. Any data tool that will still be relevant ten years from now will have to find a way to work well with these DevOps and DataOps best practices. Meltano makes that possible today by providing a great integration around Singer, dbt, and Airflow and making them come together as your private ELT platform, which allows you to easily build new connectors, improve existing connectors, and deploy your pipelines wherever you see fit.
On Meltano’s ICPs
A big difference between signing up with Fivetran or Matillion to set up your pipeline and the Meltano approach is that you are expected to be a bit of a developer, at least to the extent that you are comfortable with a CLI, making changes in your code editor, and figuring out yourself how to get this running on a machine somewhere.
If you are a data team that already has some software engineering backgrounds and has been disappointed by the state of DataOps tools, Meltano will feel like home. It will be easy to set it up, run it on your local machine, bring this into a Git repo to share with your team, and deploy it wherever you like.
If you are a developer, this will feel easier than going into a web interface and clicking buttons.
On the one hand, we are finding a lot of data teams that are already mature enough to have the software engineering expertise to pick their own tools, contribute to them, and deploy them themselves. On the other hand, we are seeing usage from software development teams, individual engineers, and tiny startups that have been tasked with data work, even if they have never been data engineers before.
We plan to start a hosted version of Meltano, our own SaaS, so people can use our interface to configure things. Or else, they can configure things locally, then push them up to our service to manage the actual hosting in production.
On Data Operating System
Meltano, at its root, is a command-line interface you install (via pip, since it is a Python package). It works in the context of a directory on your file system containing the Meltano YAML project file, similar to a lot of web development and other software development products. This Meltano project directory defines all the different components of your data stack: how they are configured, how they are tied together, and what kind of pipelines you want to run based on the components you have brought in.
At a very low level, Meltano is what we call the Data Operating System (or the package manager for data tools) that allows you to pick these components and bring them into one place. Meltano is then responsible for installing and automatically configuring components like Singer for extraction and loading, dbt for transformation, and Airflow for orchestration. Meltano is an ELT product by virtue of bringing specific components into the Meltano data stack that unlock data integration functionality.
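As an illustration of what that project file can look like, here is a hypothetical `meltano.yml` sketch. The plugin names follow common Singer naming conventions, but the exact keys and values shown are illustrative rather than copied from Meltano's documentation:

```yaml
# meltano.yml (illustrative sketch): the whole data stack declared as code
version: 1
project_id: my-data-project
plugins:
  extractors:
    - name: tap-github        # Singer tap: pulls data out of GitHub
  loaders:
    - name: target-postgres   # Singer target: loads it into Postgres
  transformers:
    - name: dbt               # dbt models run after loading
  orchestrators:
    - name: airflow           # Airflow schedules the pipelines below
schedules:
  - name: github-to-postgres
    extractor: tap-github
    loader: target-postgres
    transform: run
    interval: "@daily"
```

Because the file lives in the project directory, it can be checked into Git, reviewed in merge requests, and tested in CI like any other code.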
We are working on adding support for more plug-in types so you can bring every component of your entire data stack into Meltano and make it a stable foundation of your data stack.
On Meltano Architecture
The first thing we realized is that there needed to be better tools to run, configure, and deploy Singer connectors together in a pipeline. That is what Meltano itself became, and we focused a lot on making it incredibly easy to bring in different Singer connectors and run those.
The second thing we realized was holding back the Singer ecosystem, and data integration in general, is the difficulty of creating new connectors. Therefore, we built the Meltano SDK, which speaks the Singer protocol and provides an improved framework on top of which you can build new Python packages, making it incredibly easy to build connectors for new data sources and destinations.
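The division of labor such an SDK aims for can be sketched in plain Python. This is a schematic, not the real singer-sdk API (whose class and method names differ): the framework owns the Singer protocol plumbing, so a connector author only describes the stream and how to fetch its records.

```python
import json
import sys
from typing import Iterable


class StreamBase:
    """Schematic of the plumbing an SDK provides: schema emission and message formatting."""

    name = "stream"
    schema = {"type": "object", "properties": {}}

    def get_records(self) -> Iterable[dict]:
        # The only method a connector author needs to write.
        raise NotImplementedError

    def sync(self):
        # Framework code: wrap the author's records in Singer-style messages.
        messages = [{"type": "SCHEMA", "stream": self.name, "schema": self.schema}]
        for record in self.get_records():
            messages.append({"type": "RECORD", "stream": self.name, "record": record})
        for message in messages:
            sys.stdout.write(json.dumps(message) + "\n")
        return messages


# A connector author only describes the source: a name, a schema, and how to fetch rows.
# The "invoices" stream here is a made-up example.
class InvoicesStream(StreamBase):
    name = "invoices"
    schema = {"type": "object", "properties": {"id": {"type": "integer"}}}

    def get_records(self):
        yield {"id": 1}
        yield {"id": 2}


if __name__ == "__main__":
    InvoicesStream().sync()
```

Pushing the protocol details into a shared framework is what lowers the bar for new connectors: the author's code shrinks to the parts that are actually specific to the data source.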
The third thing we realized is that it was challenging to discover all of the different Singer connectors that had been built over the last couple of years, because they were spread out across different GitHub namespaces, PyPI packages, and communities. A single authoritative list of every Singer-compatible (and therefore Meltano-compatible) connector did not exist. MeltanoHub is our attempt to build out that list.
Fourthly, Meltano Labs is an initiative we set up to answer the question: If you create a connector for a data source today, do you have to maintain it forever? How do we, as a community, make sure that all of the open-source connectors stay maintained? Getting contributions from users who want to improve them is not difficult. The more difficult thing is having people review those contributions and put in the time to ensure they meet the quality bar to be included in the official connector. Meltano Labs provides the space for community members to bring in their connectors and crowdsource maintenance by tapping into the entire Meltano and Singer communities.
Through the combination of Meltano itself, the SDK, the Hub, and the Labs initiative, we have been addressing all the top issues holding Singer back. We have now reached a place where more connectors are being made than ever, and existing connectors are receiving many more updates than before.
All of these tie into Meltano’s product strategy. Our longer-term goal is to bring DataOps to the entire data lifecycle. This starts with bringing it to data integration and showing people the advantages of DataOps this way. Leveling up the state of open-source data integration is key to our strategy of leveling up the entirety of the open-source data tooling space.
On Meltano’s Product Roadmap
The Singer ecosystem has 300+ individual projects that form the connectors for various sources and destinations. Because there are so many users and maintainers, you must have a system to manage it. At Meltano, we now have a team in place that reviews those contributions. This ties into the product roadmap because, from our own perspective, we have to have some idea of what we are working towards. Our vision for Meltano is to be the data operating system that becomes the foundation of every team's ideal data stack by bringing together different open-source components for different stages of the lifecycle. Meltano makes them better than the sum of their parts.
When it comes to our own product roadmap, we have our own ideas of specific functionality we would like to build at certain times, but of course, our open-source contributors can choose their own priorities of things they want to improve in Meltano to make it a slightly better tool for themselves. So it is a matter of being clear about where we are going because it describes not just the narrow roadmap we will build ourselves, but also the broader roadmap of what is potentially in scope and where we are going.
When contributors contribute new functionality, it usually starts with an issue on the issue tracker: either one we created ourselves at some point, which is a strong indication that we actually want that functionality, or one the contributor files themselves. Then it is a matter of discussion between the head of product on our side and the contributor to figure out whether this is something that makes sense in the product.
Of course, we love it when people spend time on things we think would be valuable that we cannot afford to make a high priority right now because there are other things we need to build. That is exactly why a tool built in collaboration with its users will end up being better than a tool built in isolation by a specific product team. You can tap into both the perspectives and opinions and the actual programming time and resources of all your users who are similarly motivated to build a great tool.
Internally, we put most of our development time into the features that we know will incrementally bring DataOps to ELT. Step by step, we will bring more stages of the lifecycle into that. As for what the community can contribute, we are open to everything they can convince us has a place in the data operating system. We are always willing to have those conversations.
On Spinning Meltano out of GitLab
In 2020, Meltano’s usage, contributions, and general traction started increasing quickly. Just a few weeks after I had changed the positioning to focus on data integration, I started getting the first emails from VC firms who were willing to start talking with us and figure out whether one day Meltano might become an interesting investment opportunity for them. By the end of 2020, I had a number of these emails in my inbox.
Meltano's usage numbers finally got to a place where GitLab started seeing that it was worth investing more to increase the pace of growth and allow Meltano to reach its full potential. We started by hiring two people onto the team:
The first one came from GitLab’s data team and had been involved with Meltano pretty much from the start as an internal user and advocate. His name is Taylor Murphy, and he is our head of product and data today.
The second person is AJ Steers, one of Meltano’s early contributors who worked at Slalom Consulting. He is now our head of engineering.
Through hiring them and looking for other roles, we realized that GitLab, as an organization, was very much optimized to build a mature product. All 1,400 people working there are aligned around this one specific product. That did not fit Meltano anymore.
The other part is that the needs of a 1,400-person company and a tiny 3-person startup are different. The expectations both from the companies and the employees are different too.
If you join an early-stage startup, your compensation picture looks extremely different from when you join a public company (as GitLab is now post-IPO).
A public company’s expectations of its employees (to the extent that they stay within a narrow job description) differ from our needs. Hiring GitLab people did not bring us the kind of folks we needed as a super early-stage startup trying to save the world.
We realized that by disconnecting Meltano from the massive GitLab machine, we would come to a place where we could act more appropriately for the stage we were at and optimize for that rapid growth and impact we wanted to have.
On Engaging Open-Source Contributors
One thing that comes from GitLab is the realization that building great open-source software does not end at opening up your codebase and accepting contributions. If you want people to be involved in the process of building the product, it is critical that you open up as much of the company (the internal thinking, the product roadmap, and every feature discussion) as you can to give people the opportunity to weigh in on existing conversations. This contrasts with external contributions being seen as something additional and separate.
Meltano’s entire issue tracker for the product is open, as are most of the trackers around how we run the company. The same goes for our Slack, where the vast majority of channels are public rather than private to the company. We also hold office hours where our engineering team discusses the top-priority features they plan to build over the coming weeks. The entire community can watch and participate in these conversations and weigh in with their perspectives. People in our community feel like they are part of the team to a much greater extent than they would if we were more closed off.
For us, open source is not about getting a lot of free users and then converting some of them to paid users, even though, of course, that is something we will need to do at some point as we commercialize. Open-source software is built on the strong belief that the best tools are built in collaboration with their users.
On Hiring
To attract people who are aligned with our company values, one useful thing is an extension of what I described earlier: involving your community users in every aspect of the company, instead of just through narrow feedback channels, makes a big difference. People can self-select into companies they want to work at by seeing how the team interacts with each other, with users from the community, with other vendors, and with anyone else on the issue tracker, in Slack, or on streamed calls. People know what they are getting into, and we can test for that through the hiring process to ensure they actively care about all these things (instead of blindly applying to a company and figuring out afterward whether it is a culture fit or not).
When possible, we prefer to hire people who have already shown that they are actively interested in what we are doing and care enough to get involved. You can get a good feel for each other in terms of how you work and what you care about by spending weeks and months talking a few times a week with these members. Having been able to work with a number of these people before they joined Meltano (because they were community members and contributors) has made a big difference.
We value work-life balance a lot in order to build a sustainable business that will be around for decades to come. We are building this framework from the ground up so we can get there without making shortsighted decisions, whether in how we treat our employees, the community, or the industry.
On Remote Work
If you want to do remote work well, you need to be intentional about it: you cannot just assume that the processes, practices, and habits that work in an office will transfer directly to remote work. Everyone needs to be aware that new problems will come up and will need to be addressed.
You do not have to figure out remote work by yourself: a great thing about GitLab is that its handbook for how the company runs has been public and readable by the whole world from day one. There are hundreds, if not thousands, of pages on remote work alone, covering the things GitLab has learned via trial and error to get the best out of a remote team. A lot of Meltano’s processes are either taken directly from or inspired by GitLab’s approach to remote work. Transparency and communication are deep in my DNA.
Show Notes
(01:46) Douwe went over formative experiences: catching the programming virus at the age of 9, combining high school with freelance web development, and studying Computer Science at Utrecht University.
(03:55) Douwe shared the story behind founding a startup called Stinngo, which led him to join GitLab in 2015 as employee number 10.
(05:29) Douwe provided insights on attributes of exceptional engineering talent, given his time hiring developers and eventually becoming GitLab’s first Development Lead.
(08:28) Douwe unpacked the evolution of his engineering career at GitLab.
(11:11) Douwe discussed the motivation behind creating the Meltano project in August 2018 to help GitLab’s internal data team address the gaps that prevented it from understanding the effectiveness of business operations.
(14:38) Douwe reflected on his decision in 2019 to leave GitLab’s engineering organization and join the then five-person Meltano team full-time.
(20:24) Douwe shared the details about Meltano’s product development journey from its Version 1 to its pivot.
(26:18) Douwe reflected on the mental toll of being the sole person Meltano depended on for a while.
(29:20) Douwe explained the positioning of Meltano as an open-source self-hosted platform for running data integration and transformation pipelines.
(34:54) Douwe shared details of Meltano’s ideal customer profiles.
(37:45) Douwe provided a quick tour of the Meltano project, which represents the single source of truth regarding one’s ELT pipelines: how data should be integrated and transformed, how the pipelines should be orchestrated, and how the various plugins that make up the pipelines should be configured.
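To make the idea of the Meltano project as a single source of truth more concrete, here is a rough, hypothetical sketch of what a minimal `meltano.yml` project file might look like; the plugin names and settings below are illustrative assumptions, not details taken from the conversation:

```yaml
# meltano.yml — hypothetical sketch of a Meltano project file,
# the single source of truth for a project's ELT pipelines.
# Plugin names and settings are illustrative examples only.
version: 1
project_id: example-meltano-project

plugins:
  extractors:
    - name: tap-gitlab          # example extractor: pulls data from the GitLab API
      config:
        projects: meltano/meltano
  loaders:
    - name: target-postgres     # example loader: writes extracted data to Postgres
      config:
        host: localhost
        dbname: warehouse

schedules:
  - name: gitlab-to-postgres    # example orchestration entry for the pipeline
    extractor: tap-gitlab
    loader: target-postgres
    transform: skip
    interval: "@daily"
```

Because everything (plugin choices, their configuration, and the schedule) lives in this one version-controlled file, the pipeline definition can be reviewed, diffed, and reproduced like any other code, which is the DataOps workflow Douwe describes.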
(40:39) Douwe unpacked different components of Meltano’s product strategy, including Meltano SDK, Meltano Hub, and Meltano Labs.
(45:05) Douwe discussed prioritizing Meltano’s product roadmap in order to bring DataOps functionality to every step of the entire data lifecycle.
(48:53) Douwe shared the story behind spinning Meltano out of GitLab in June 2021 and raising a $4.2M Seed funding round led by GV to bring the benefits of open source data integration and DataOps to a wider audience.
(52:19) Douwe shared his thoughts on engaging open-source contributors in a way that generates valuable product feedback for Meltano.
(55:43) Douwe shared valuable hiring lessons to attract the right people who align with Meltano’s values.
(59:04) Douwe shared advice for startup CEOs experimenting with remote work culture in our “new-normal” virtual working environments.
(01:04:10) Douwe unpacked Meltano’s mission and vision as outlined in this blog post.
(01:06:40) Closing segment.
Douwe’s Contact Info
Meltano’s Resources
Mentioned Content
Articles
Hey, data teams — We’re working on a tool just for you (Aug 2018)
To-do zero, inbox zero, calendar zero: I think that means I’m done (Sep 2019)
Meltano graduates to Version 1.0 (Oct 2019)
Revisiting the Meltano strategy: a return to our roots (May 2020)
Why we are building an open-source platform for ELT pipelines (May 2020)
Meltano spins out of GitLab, raises seed funding to bring data integration into the DataOps era (June 2021)
Meltano: The strategic foundation of the ideal data stack (Oct 2021)
Introducing your DataOps platform infrastructure: Our strategy for the future of data (Nov 2021)
Our next step for building the infrastructure for your Modern Data Stack (Dec 2021)
People
Maxime Beauchemin (Founder and CEO of Preset, Creator of Apache Airflow and Apache Superset, Angel Investor in Meltano)
Benn Stancil (Chief Analytics Officer at Mode Analytics, Well-Known Substack Writer)
The entire team at dbt Labs
Notes
My conversation with Douwe was recorded back in November 2021. Since then, many things have happened at Meltano. I’d recommend:
Checking out their updated company values
Reading Douwe’s article about the DataOps Operating System on The New Stack
Examining Douwe’s blog post about moving Meltano to GitHub
Looking over the announcement of Meltano 2.0 and the additional seed funding
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.