James Le


Datacast Episode 86: Risk Management, Open-Source Governance, and Negative Engineering with Jeremiah Lowin

The 86th episode of Datacast is my conversation with Jeremiah Lowin, the Founder & CEO of Prefect, a dataflow automation company.

Our wide-ranging conversation touches on his Economics and Statistics education at Harvard; his training in risk management at a big hedge fund; his foray into machine learning and building frameworks; his contribution to Apache Airflow and insights about open-source governance; his current journey with Prefect eliminating the negative engineering problem; lessons learned from engaging an open-source community, innovating a novel business model, finding early enterprise customers, building a high-performance team, and choosing the right investors; and much more.

Please enjoy my conversation with Jeremiah!


Listen to the show on (1) Spotify, (2) Apple Podcasts, (3) Google Podcasts, (4) TuneIn, (5) RadioPublic, and (6) Stitcher.

Key Takeaways

Here are the highlights from my conversation with Jeremiah:

On Studying at Harvard

At Harvard, I studied Economics because, frankly, I didn’t know what to study, and all my roommates seemed to study Economics. Then I discovered Econometrics and became fascinated by this tool that could take a bunch of numbers and extract insights and tell stories. I pursued Statistics for the same reason, as I was fascinated by the ability to discover meaning in noise.

Most of my academic career was characterized by an obsession with how two statistical distributions interact. In my case, it was financial assets in the stock market. In most of my undergraduate classes, the stock market was held up as an excellent example of a normal distribution, and I was fascinated by that declaration. But then I read a book called “The Misbehavior of Markets” by Mandelbrot, which declared the opposite: that stock markets are absolutely not normally distributed. How could these two statements be reconciled? How could I parse this out? That launched my academic work and a fascination that has stayed with me for many years now.

Source: https://www.investopedia.com/terms/c/copula.asp

My thesis focused on a class of models called copula models, which you can think of as glorified correlation functions. When data scientists talk about correlation, a correlation of zero means two variables are independent, and a correlation of one means they are perfectly dependent, in which one variable characterizes the other. That’s a simple description of the way two things can interact. For example, in the context of the stock market, two stocks might not be particularly correlated. But as you may have heard, when stocks go up, they take the escalator; when stocks go down, they take the elevator. Correlations tend towards one when markets are crashing, which describes behavior that’s not compatible with a simple linear correlation: you can’t have two things that are mostly uncorrelated yet extremely correlated when they move sharply downward together. Copula functions are a way of characterizing and describing that dependence structure.

My thesis was on building empirical copulas from data when you didn’t necessarily know in advance what model you wanted to fit. This turned out to be very relevant in my career when I joined a large hedge fund called King Street Capital, which had an extensive derivatives portfolio. After the financial crisis, the pricing of these credit derivatives was primarily based on copula modeling. The reason is that you have these instruments that are very sensitive to the default of the underlying credits within them. Much of the time, when one company is going bankrupt, other companies are much more likely to go bankrupt because it implies something about the world. But when companies are not going bankrupt, they tend to be independent. In order to price these structures, this knowledge became very helpful and was a funny application of something that, until then, had been very much theoretical for me.
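The “escalator up, elevator down” behavior described above is exactly what a single linear correlation number misses, and it is easy to see in simulation. Below is a small sketch of my own (not from the episode, and not an actual copula fit) that mixes a calm regime with a shared-crash regime: the overall correlation looks moderate, while the correlation measured only on joint worst days is far higher, which is the kind of tail dependence copula models are built to describe.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Calm regime (~95% of days): small, independent moves for two assets.
calm = rng.normal(0.0, 0.01, size=(n, 2))

# Stress regime (~5% of days): a large shared shock dominates both assets.
stressed = rng.random(n) < 0.05
common = rng.normal(0.0, 0.03, size=n)
idiosyncratic = rng.normal(0.0, 0.005, size=(n, 2))
returns = np.where(stressed[:, None], common[:, None] + idiosyncratic, calm)

# The ordinary (unconditional) linear correlation looks moderate...
print("overall corr:", round(np.corrcoef(returns.T)[0, 1], 2))

# ...but conditional on both assets having one of their worst days,
# the measured correlation is much higher: the "elevator down" effect.
q = np.quantile(returns, 0.05, axis=0)
joint_crash = (returns[:, 0] < q[0]) & (returns[:, 1] < q[1])
print("corr on joint worst days:", round(np.corrcoef(returns[joint_crash].T)[0, 1], 2))
```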

On Risk Management

Going into risk was one of the best things that ever happened to me. It wasn’t actually a decision that I made myself. I had a background in risk management. A lot of my statistical focus was on risk management. I have done internships in risk management. But I was actually not hired by King Street to go into risk management. I was hired as an analyst. However, the firm decided before I joined in the summer of 2007 to begin building out a robust risk practice. We know today that was a remarkably prescient decision because summer 2007 was the market top. The firm was not reveling in the fact that markets only go up; they were very focused on building a formal risk infrastructure. Because of my background and experience, despite being very new at the firm, alongside my boss, who’s a wonderful mentor, I was tasked with building out this brand new formalized program. It was one of the first times in my life I got paid to learn something, which was just phenomenal.

As a risk manager, my job was to understand how things behave. A lot of people think risk management is about preventing bad outcomes. In fact, risk managers are most effective when things are good, because that is when they can understand the behavior of all the assets and metrics in question. Then, when unexpected things happen, they can speak with an expertise and confidence that would otherwise be impossible to gain in the moment. That translated nicely into the startup context. Every day at a startup, you need to make decisions about things you have very incomplete information about. And you need to make them with high confidence, even though they can be extremely permanent. Should we build this product? Should we use this pricing structure? Should we work with this customer? Should we sign this contract? Should we use this software? These are all decisions that startups have to make at the point in their life when they have the least information about the world.

If you think about it, it’s crazy, right? You’re asking these companies to make permanent decisions when they are least equipped to do so. One of the best things any startup can do is to create a decision-making process that maximizes the probability of making good decisions with bad information. The risk management mindset served me well personally when I ultimately found myself here with Prefect: building a startup and making those types of decisions under uncertainty comes naturally to me. I’m trained to do that, and I’m glad I’ve been able to teach some of that to my team while learning from their expertise as we make decisions together.

On Getting Into Machine Learning

Source: https://wiki.pathmind.com/restricted-boltzmann-machine

In 2010, I became fascinated with what were then very early machine learning models. This was before neural nets were a well-known concept and before deep learning was a term. I saw a video of Geoff Hinton giving a lecture about Restricted Boltzmann Machines, in particular one “dreaming” about the MNIST dataset. At the time, this was the most cutting-edge, mind-blowing thing in the world. It took hours and hours to train. Today, this is like the Hello World of modern ML frameworks: you call a function that produces a representation for you to prove that it works. Back then, I couldn’t believe it. I watched this video over and over. I remember calling my brother and saying: “Hey, I just saw a machine that can dream. I have to go build that now.”
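As a rough illustration of how commoditized this has become (my own sketch, not anything discussed in the episode), scikit-learn ships a BernoulliRBM that learns a latent representation of small digit images in seconds, versus the hours Hinton’s original MNIST demo took:

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import minmax_scale

# 8x8 digit images, scaled to [0, 1] as the Bernoulli RBM expects.
X, _ = load_digits(return_X_y=True)
X = minmax_scale(X)

# A single-layer RBM: the kind of model that once took hours to train on MNIST.
rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
hidden = rbm.fit_transform(X)

print(hidden.shape)           # (1797, 64): a learned representation per image
print(rbm.components_.shape)  # (64, 64): one learned feature detector per hidden unit
```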

I became obsessed with it and ultimately left my job to create a successful small business called Lowin Data Company. I also became involved with a piece of software called Theano, THE ML framework at the time. That was one of the first times I began building frameworks. As I worked with many clients, I needed to deploy models for them quickly. This was before any of the tools we take for granted today as data scientists existed. So I needed a way to quickly translate clients’ business objectives into the low-level code my tooling allowed me to put together. I had a product called Scarecrow, a workflow framework for building ML models. My focus was on time series, since that has always been my statistical expertise. Even today, time series is less well understood and less well practiced than other, static ML techniques. At the time, it was the complete Wild West, and it was so much fun to explore these different methods and deliver solutions for people.

On Probabilistic Thinking

One of my favorite things about being a statistician is that it taught me how to think about the output of a model. As a statistician, I care far less about the fact that I have an estimate of a parameter and far more about the error around it. Statisticians live and die by p-values (or whatever metric they choose to measure confidence). That is a skill that transfers all over the place. When people say we should do this or do that, some uncertainty or question is necessarily attached to that statement. All this statistical practice helped me be very open to the idea that just because someone says something with great confidence does not mean it is probabilistically likely. Being comfortable living in this ambiguous, probabilistic world is something my statistical training prepared me for, at least to some degree.

On Open-Source Development and Governance with Apache Airflow

Source: https://github.com/apache/airflow/

My problem was that I was running out of hours in the day and spending a lot of time on repetitive activities that had a lot to do with code. The same day I reached a breaking point, Airflow was open-sourced. I was excited because here was a project that promised to automate tasks I didn’t want to spend my time on. However, Airflow was a Python 2 project. In 2011, I had gone off to do the Lowin Data Company and switched cold turkey to Python 3, which caused me some problems. As my team will tell you, I am nothing if not someone who loves using shiny new things. Thus, my initial involvement with Airflow as a developer was to make the codebase dual Python 2/3 compatible from top to bottom. Consequently, I gained some familiarity with the product because there wasn’t a file I didn’t touch. As I became a user of Airflow, it was only natural given that familiarity to continue in a developer capacity. Eventually, I joined the initial developer team, later became a Project Management Committee (PMC) member, and stayed there until the end of 2020.

I learned a lot of lessons not just about open-source development but also about open-source governance and governance institutions. Nothing interesting comes out of a committee. Many amazing open-source projects have strong governance structures largely in place to preserve the functionality that has already been achieved in the product (to block the product from dramatically or rapidly changing). On Airflow, that became a little stifling for me as my needs as a data scientist became more paramount. I came from the ML world, where we were starting to use huge parameter sets and scale up to thousands of tasks with millisecond latency. Airflow came from a starting point that was about hourly and daily batch jobs kicked off in some third-party system. It was important to me to find a tool that gave me the same utility as Airflow but in the context of my data science work. Those were changes I tried to motivate within Airflow itself, but I couldn’t convince the committee that the future of data would be analytical. All the things we take for granted today couldn’t get sufficient buy-in five years ago. For me, it wasn’t a question of whether I could do this or not; this was a problem I felt intimately.

So I began building tools for myself, taking some of the concepts I loved from the automation framework I had been lucky to participate in building (Airflow) and marrying them to the fast-paced, cutting-edge modern data science platforms I was also privileged to be building as a consequence of my job. The result would eventually become Prefect.

Committees have a way of just killing themselves. It’s imperative when building a product to have a very clear vision of that product. Any product manager listening to us will nod at this: often, expressing a product vision means saying no to things, not saying yes to things. It’s easy to come up with features or ideas that nominally add value, because doing something is arguably better than not doing something. But whether that value accrues to the product vision or solves real problems is actually a separate question. Committees are bad at rejecting small ideas (even when they don’t align with the product vision) and bad at accepting large ideas (because a large idea is too big to get past the committee). You end up in a place where innovation is difficult. You grow out, not up, when you have committee-driven products. This is most visible in open source because the governance mechanism in open source is frequently committee-driven. It can be avoided if you are a benevolent dictator with a clear vision or problem statement for what you’re trying to achieve, one that any feature can be measured against. Great products are almost never designed by a committee. Great products are designed against a vision, with some editorial license to keep making progress.

On Negative Engineering

Negative engineering is very familiar to many people, but they may not have given it a name because it arises between things. We call it negative engineering because it lives in the negative space. It’s between the data engineering team and the data science team. It’s between a database and a data warehouse. It’s between one tool and another tool. It’s the idea that there’s a gap of responsibility when we’re writing code or putting analytics into the world to achieve some business purpose. We have to do an enormous amount of work to ensure that things run as expected and to defend them against failure, whether it’s a bug in the code, the Internet going out, or the server crashing. Those are all things that could happen, and we need to defend our code against them. So we split the world in two. Positive engineering means I need to deliver this model, this analytic, or this system. Negative engineering means I’ve got to make sure this thing refreshes every day at 9 AM and doesn’t crash if malformed data comes in. I’ve got to make sure I can get the logs from the job that ran on the machine with the GPU. It’s an infinite checklist, and we all decide to some degree how much of it to embrace.
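To make the split concrete, here is a hypothetical sketch of my own (not Prefect code): the positive-engineering part is a single line of business logic, and everything wrapped around it (validation, retries, backoff, logging, alerting) is the negative-engineering checklist the quote describes.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("nightly_refresh")


def refresh_report(rows):
    """Positive engineering: the thing we actually set out to build."""
    return sum(row["value"] for row in rows)


def run_with_scaffolding(rows, max_retries=3):
    """Negative engineering: validation, retries, backoff, logging, alerting."""
    for attempt in range(1, max_retries + 1):
        try:
            if not all("value" in row for row in rows):
                raise ValueError("malformed payload: missing 'value' field")
            total = refresh_report(rows)
            logger.info("refresh succeeded on attempt %d (total=%s)", attempt, total)
            return total
        except Exception:
            logger.exception("attempt %d failed", attempt)
            time.sleep(2 ** attempt)  # crude exponential backoff before retrying
    logger.error("giving up after %d attempts; someone should be paged", max_retries)
    return None


if __name__ == "__main__":
    run_with_scaffolding([{"value": 1}, {"value": 2}])
```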

Negative engineering is the mission of Prefect. It’s easily the most resonant idea we’ve ever come up with. Yet 4 or 5 years ago, when I was talking about this, I didn’t know what the problem was called. I was also confused about why nobody else seemed to feel it. Eventually, I realized that one of the reasons I was experiencing it is that, in many roles, I was both the data scientist and the data engineer. I was fulfilling roles that today are traditionally split between the two.

Broadly speaking, data science tooling is extraordinary. On the other hand, data engineers are a little less happy than they used to be. Only the people who bridged the gap between the two viscerally agreed with me that something was wrong, and I spent a lot of time thinking about why that was. That is where the idea of negative engineering came from for us: the realization that it lives between the groups. It is the handoff of responsibility that characterizes the negative engineering problem. Once we had a name for this, things started to take off. I would walk into a room and tell somebody about workflow management or dataflow, and maybe they recognized the business problem there, maybe they didn’t. But when I started talking about negative engineering and its hidden costs (when you have a production incident and don’t know where it happened), all of a sudden people lit up and were ready for it.

I realized that, in some way, we were talking about a risk management problem. Today, we view Prefect as an insurance product first. When using Prefect, you do the minimal amount of work possible to instrument your code with a set of instructions. Prefect’s job is to respect your instructions, mainly in the event that something does not go as you expected. If code always ran the way people expected it to run, you would not need a workflow system; you could use cron to kick off your workflow, which is just a bunch of functions you call in order. On Prefect’s UI, you can schedule jobs, see run history, and surface artifacts and parameters. We need to be as invisible as possible when things go right and as helpful as possible when things go wrong. That is our product vision: eliminating negative engineering and being the insurance partner for our users.

On Prefect Core

Source: https://www.prefect.io/core/

I remember, early in the life of Prefect, describing what we did to a very well-known VC. He said to me that this all sounded kind of trivial. I sort of had to laugh: “Yeah, in a way it is. Yet companies spend thousands and millions of dollars a year solving this problem.” There is something deceptive about its triviality. We think of what Prefect offers today as Lego bricks. Each one is innocuous, but you can snap them together to build amazing data applications. Our job is to provide the bricks and guarantee that all the bricks fit together. Our users’ job is to follow the instructions to build whatever amazing thing they want.

Our core features include retries, logging, parametrization, history tracking, scheduling, and so on. These things do not sound mind-shattering because they fall into the negative engineering spectrum. But as anyone can attest: if something is supposed to run at 9 AM and it does not respect a daylight saving boundary, or does not run at all because your scheduler dies, you have a problem. If a node goes down and you cannot see its logs because the logs are stored on the node, you have a problem. If your code does not run because a malformed payload was received from some API and you do not have a way to retry it, you have a problem. With the insurance mindset, our features are not necessarily geared around delivering new functionality. Our users are highly experienced developers who can achieve their goals, and they have great tools to do that. Our job is to give them robust, well-tested building blocks for putting a defensive scaffold around that.
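For flavor, here is a minimal sketch of what those building blocks look like in code, assuming the Prefect 1.x “Core” API that was current around the time of this conversation (the flow name, tasks, and schedule are my own illustration; newer Prefect releases use a different API):

```python
from datetime import timedelta

from prefect import Flow, task
from prefect.schedules import IntervalSchedule


@task(max_retries=3, retry_delay=timedelta(minutes=1))
def extract():
    # Stand-in for a call to a flaky third-party API.
    return [1, 2, 3]


@task
def load(rows):
    print(f"loaded {len(rows)} rows")


# Retries, scheduling, logging, and state tracking come from the framework
# instead of hand-rolled scaffolding.
schedule = IntervalSchedule(interval=timedelta(days=1))

with Flow("nightly-refresh", schedule=schedule) as flow:
    load(extract())

if __name__ == "__main__":
    flow.run()  # with a schedule attached, this keeps running on the interval
```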

To re-emphasize the triviality of this, the number one competitor of Prefect is homegrown solutions. There is a good reason for that. Every single engineer or data scientist on the planet who has encountered one of these problems does not look for a third-party tool; they build it themselves because it feels easy. These homegrown workflow systems start with an engineer encountering a problem and deciding to solve it themselves. Then they solve more problems, and pretty soon these little trivial components become a complex system sprawling across multiple environments, permissions, deployments, models, requirements, configurations, and so on. It spirals out of control really fast, all at the low-level infrastructure layer, and once you start down that path, it is really hard to change. More than half of Prefect users come from these homegrown systems. We keep hearing from companies that end up replacing homegrown systems entirely with Prefect that they adopted Prefect incrementally: starting with the scheduler, adding retry handling, and then more. Because we focus on this incremental adoption and ease of use, it becomes easy for companies to do what would otherwise be impossible: rip out a workflow orchestrator and drop in another one.

On Prefect Cloud

Source: https://www.prefect.io/cloud/

Prefect Cloud is our commercial platform. It has a free tier, and a light version of it is also available in open-source form. It provides a backend to the Prefect Core workflow engine. We have set all these instructions in Python code about how we want our code to be treated and what insurance paradigms should be applied to it. Then we need a place to make sure those instructions are respected in a robust way. If the same computer running my code is also the computer running the workflow management system, then the same event that takes down my code has a good chance of taking down the workflow manager, and I get no insurance benefit.

One of the key reasons the Cloud exists is to provide a highly available and robust place for enforcing the insurance rules that users express in their code. Once we have that platform, we can do all kinds of interesting things there. We can do user permissions, authorization, and custom access, and help businesses create the appropriate access levels they require across sensitive workflows. We can do secret management. We recently released a managed key-value store for storing small pieces of state. Let’s say you have an ETL process that runs on a schedule. Whenever it runs, it gets the most recently added data from some database and archives it somewhere. You have this question of how you know what the most recently available data was. The naive solution is to run this every 10 minutes and get the last 10 minutes of data. But remember, we live in a world where things sometimes go wrong. What if, for some reason, you skip a run? Then you need to get 20 minutes of data the next time. So there is a need to store this bit of ancillary state.

In the early days of Prefect, we did not have that in place. We used third-party services to store this information and recover it. This became such a popular way of working with delta transformations of data that we began introducing facilities for people to store and access state. Facilities like that make Prefect Cloud a platform with total oversight of all workflows. Our global features exist in the platform, while our local execution features exist in the open-source engine (where they can be applied wherever the engine is deployed).
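The pattern being described looks roughly like the sketch below, assuming the Prefect 1.x KV Store helpers (set_key_value / get_key_value in prefect.backend, which require a Prefect Cloud backend); the key name, fallback window, and extraction stub are my own hypothetical illustration:

```python
from datetime import datetime, timedelta, timezone

from prefect import task
from prefect.backend import get_key_value, set_key_value

WATERMARK_KEY = "etl-last-loaded-at"  # hypothetical key name


def query_source(since, until):
    """Stand-in for the real extraction query against the source database."""
    return [{"window_start": since.isoformat(), "window_end": until.isoformat()}]


@task
def fetch_new_rows():
    now = datetime.now(timezone.utc)
    try:
        # High-water mark recorded by the last successful run.
        since = datetime.fromisoformat(get_key_value(WATERMARK_KEY))
    except Exception:
        # First run (or missing key): fall back to a 10-minute window.
        since = now - timedelta(minutes=10)

    rows = query_source(since, now)

    # Advance the watermark only after the window is handled, so a skipped or
    # failed run simply widens the next window (20 minutes instead of 10).
    set_key_value(WATERMARK_KEY, now.isoformat())
    return rows
```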

On Prefect’s Product Strategy

Great companies make things and sell things, and the more difference there is between what they make and what they sell, the better the company. This is not something you can see on an income statement; it is much more of an abstraction. For example, Chevy makes the Silverado but sells the Corvette. There is one place where they pour the majority of their time, energy, and resources into actual production, and another place where they deliver something that actually accrues clear value to the company. Google makes a search engine and sells attention. Apple makes computers and sells a lifestyle. Great companies are not known for the thing they literally, physically deliver. It is a conscious part of product strategy: aligning the delivery of value to a user with the delivery of the commodity effort (the thing you actually produce for the user).

The majority of open-source companies seem to miss this. They seem to believe that the same lines of code they are writing represent the value delivery to the user. That has never been true of software. Just because the user can see the lines of code does not make it true. If those lines of code are in a different order, in a different place, with one typo, all the value the software delivers can be eliminated. In open source, this even cheapens the idea that software is a holistic, beautifully architected combination of more than just lines of code. We have to think through this: What do we make? What do we spend our time doing? How do we deliver value that’s commercially interesting?

For an open-source company, the starkness of this is that if you deliver it in an open-source context, it is very hard to get somebody to pay for it. As a consequence of this, what you will see in the open-source world is a series of antagonistic business models:

  • “I produce an open-source product, but you have to pay me to run it.” If the value add your startup is claiming to offer is that you put a piece of open-source software on a CPU, I think that is crazy because your competitors are public clouds.

  • Another antagonistic business model is open-source software with enterprise-gated features: it is the same exact product, but because of a licensing clause, you cannot use some of the features unless you pay money.

Source: https://www.prefect.io/why-prefect/hybrid-model/

The business model we prefer for open source is: we make one open-source product and sell a completely different product that happens to be powered by the open-source product. This allows a user to fully use the open-source product or elect to gain additional functionality and value in a different product that happens to rely on it. For us, the metaphor is an engine in a car. Some people want to buy an engine and use it for their own purpose (maybe they are building a car, or they are a car manufacturer themselves). We have an open-source engine that you can just take. But other people want the service, the seats, or the red paint; the car is actually the thing of interest, and the car happens to be powered by the engine. We make an engine, and we sell a car. This becomes an aligned business model for Prefect.

On The Hybrid Execution Model

Source: https://docs.prefect.io/orchestration/

In this hybrid model, code and data remain on-prem with our open-source software, while all the “hard work” of orchestration stays on our infrastructure. The key to the whole model is a metadata exchange between the two: we do not accept code or data across that boundary. The upshot is that the customer keeps all execution of code and data private on their infrastructure (just as they would on-premise), yet they have to run no additional infrastructure beyond what they would already need to run their code. We keep all of that on our side as a SaaS product that exchanges only anonymized metadata. We were surprised when our first enterprise customer told us that this met all of their security requirements. On that day, we threw away all of our plans to build a managed product and an on-prem product and went all-in on this hybrid delivery model.
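To make “metadata exchange” concrete, here is a deliberately hypothetical sketch (this is not Prefect’s actual wire protocol or API): the process running on the customer’s infrastructure executes the user’s code locally and reports only opaque identifiers, state names, and timestamps to an orchestration endpoint.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical orchestration endpoint; in the hybrid model only metadata
# (IDs, states, timestamps) ever crosses this boundary.
ORCHESTRATOR_URL = "https://orchestrator.example.com/api/state"


def report_state(task_run_id, state):
    """Send metadata only: no code, inputs, or outputs leave the customer's side."""
    payload = {
        "task_run_id": task_run_id,
        "state": state,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    request = urllib.request.Request(
        ORCHESTRATOR_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)


def run_task(task_run_id, fn, *args):
    """Execute user code on-prem; the raw data in *args never leaves this process."""
    report_state(task_run_id, "Running")
    try:
        result = fn(*args)
        report_state(task_run_id, "Success")
        return result
    except Exception:
        report_state(task_run_id, "Failed")
        raise
```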

As a consequence, we had some very early traction with companies in financial services and healthcare, where privacy and regulated data are paramount. We were able to offer a differentiated product: essentially the cost of a SaaS product with the privacy benefits of an on-prem product. We built up strong early relationships and partnerships, which led us to build the product out in a meaningful way with these wonderful customers. Then we expanded into other industries and got the technology out there more broadly. We benefited from that fundamental understanding. Today, we have yet to have a single company tell us that they cannot use our software on security grounds, which speaks to this model’s versatility. We have a lot of intellectual property behind it at this point, and we have some new extensions in the works that will make it dramatically more powerful, because once you lean into this way of working, you can do amazing things. It is a great story about being open to an idea, exploring it, discovering its value, and focusing on delivering that value as much as we can.

There are aspects of it that seem easy but are deceptively complicated, and aspects that seem very complicated but are actually relatively easy. One of the more challenging things for us was helping our users understand the model so that they could properly take advantage of it. We needed to educate people and design the APIs properly. For example, some companies will not send their logs to the public cloud because logs are part of their private data and they do not want them to leave their server boundary. But there are other companies for whom code and data must stay private while shipping logs is extremely useful, so they need to be able to assess that and take steps to guard privacy. Finding the right way to put tools into our customers’ hands to decide how to take advantage of this model was one of our most challenging product efforts. The technical side is not that crazy, to be honest with you; the enabling process is where all the complexity and all the IP are. As a matter of fact, this had never been done in our space, which is just mind-blowing. So we developed a lot of interesting and innovative ways of creating this communication pattern and made it as easy as possible for our users to deploy.

On Gradually Open-Sourcing the Prefect Platform

We did this as part of the “Project Earth” effort, a pandemic response in mid-March 2020. Our commercial platform, Prefect Cloud, was just a couple of weeks into its wide release. As if the pandemic was not worrisome enough, having a very young commercial product out in the world was frightening. We believed strongly that we had built a great platform, and we had repeatedly seen from our early users that it solved their real problems. There was no way we were going to let COVID stand between us and delivering that value to our users. So we made a decision contrary to much business strategy: we open-sourced the core of Prefect Cloud as Prefect Server and Prefect UI. As you might expect from a product that was basically built in a week, it was a little rough around the edges. But the response was extraordinary.

Source: https://medium.com/the-prefect-blog/open-sourcing-the-prefect-platform-d19a6d6f6dad

We learned that we should not have kept the UI proprietary at all. The UI turned out to be one of the primary ways that people communicate the value of our engine to their colleagues. We made this decision essentially believing that the commercial landscape was, at the least, damaged, and that we should instead participate in the open-source market as much as we could. What ended up happening is that we participated in a commercial rebound very quickly, because all of these companies tried our product, used the UI to communicate its value, and started calling us up for commercial contracts and support. In a very interesting way, this step of trying to deliver our product to as many people as possible (believing we needed to do that in the absence of commercial interest) triggered a growth cycle for us. As a consequence, the company just took off.

We were initially very timid about open-sourcing so much of our stack as a young company. That is a one-way-door decision: what if we get it wrong? What if we open-source the one thing people are willing to pay for? All these questions went through my head, and in the end they were so silly. It does not matter. We put value in people’s hands, and they discovered there was value. We offer our other product, with other value, for a certain subset of customers, and they showed up to buy it. They contributed to our mission because they helped more people eliminate negative engineering, which was additive to our story as a company, which in turn led to commercial growth on the business side.

On “Success-Based Pricing” Model

Source: https://www.prefect.io/pricing

Our objective is not to have an antagonistic business model but to create a model aligned with the user’s interest. Prefect is an insurance product: we are most useful when things go wrong. But charging people when things go wrong is terrible; it is also not how insurance is priced and sold. Thus, when things go wrong, it is on us. It is a way to create alignment in the business model in a simple way that makes a ton of sense to people. When we talk to people about pricing, they smile and understand that we are doing something nice here. Those little things create good relationships. These small gestures and demonstrations make a difference.

Empirically, does pricing on successful tasks versus all tasks dramatically affect someone’s bill? It depends on how many things fail, and ironically, you cannot control that. But at the end of the day, we are talking about a 5–10% differential in cost: not a meaningful number, but enough to show that this pricing model is not predatory.

On Community Engagement

I think many companies are familiar with the cliche “Do Things That Don’t Scale” as it relates to building the company and building the product. I think a lot of people do not do this when it comes to engaging their community. There is a whole variety of reasons why that might be true, but we did a few things to maximize the success of this community. In the early days when it was 5, 10, 20 people, whatever it was, Chris (our CTO) and I used to travel around the country — meeting people, giving them demos, and helping them out. We created a very aggressive response time in Slack: 10-to-15 minutes to answer any question. These are all things that, as soon as you get past 25 people asking questions, will not work. But we have done our best to maintain, if not that same 15-minute time, that same feeling of responsiveness and making sure that people feel heard.

Our belief is that if someone comes into our community, they probably have tried our software. More often than not, they run into some problems. They have overcome the hurdle of actually finding the community, joining the community, and doing something that many people find difficult, such as asking a public question about a problem. Lots of people do not want to do that. As they have come through that entire journey, the least we can do is to respond to them. Chris has brought this great structure to our community and its code of conduct, which is that we always do our best to assume positive intent on people’s behalf. And we ask that they do the same for us. Between all that, we create this very positive environment where people can feel supported and secure, even if they are asking a question that relates to frustration that they have had that we may have introduced into their life. If that is the case, we want to eliminate it as fast as possible.

As I said before, these relationships matter. There is no magic. I wish I could tell you to do these three things and have an amazing community. But it is hard work — responsiveness, attending to people’s needs, and listening.

On Building A High-Performance Team

Source: https://www.prefect.io/blog/dont-panic-the-prefect-guide-to-building-a-high-performance-team/

Our COO, Sarah Moses, put together a fantastic blog post, and I hope it gives some flavor to how we think internally and, in particular, the values that we aspire to and the standards we hold ourselves to. A big part of this is about clarity of expectation for ourselves, for our prospective hires, and for our interactions with the world. We want to be clear about what we expect, the standards we uphold, and how we conduct ourselves.

I think many times when you see companies failing in this regard, it is because they have avoided doing the hard part, which is writing down what they expect, what performance looks like, what success looks like, and how they project that into the world. The ultimate version of this is where companies only hire from their network of known people, meaning that they have a concentrated geographic and demographic group of people coming together to build something. Can this be troublesome? In the context of the framework we put up, it is troublesome because it means they have skipped laying out the expectations. For example, at Prefect, we will not hire someone without a job description written and posted. If someone has shown up and told us they can do something, and we believe they can, and all their references check out, that is not enough. We need to put job descriptions into the world for our own benefit, for planning purposes, and for the world to see that Prefect is interested in hiring someone to satisfy the following objectives. It is critical.

The important note here is that we do our best to create an inclusive workforce by being objective about all of this. We put this type of standard in before we hired our 10th employee. Based on advice we got from our advisory board, we were trying to put standards and ways of working in place that could scale to 100 employees as early as possible. There is a great mix of advice out there. Many people will say: “Do not worry about this until you are big enough that it matters.” Then others will say: “If you do not do this early, you can never change it culturally.” So we made the decision to do it early. It was a hell of a project for a young company to undertake, but it paid off in spades. Now we have a great written-down guide to what it genuinely means to build a high-performance team.

I get disappointed when companies publish their handbooks and, as I read them, it is just a bunch of platitudes: our virtues are honesty and hard work, etc. Of course they are! What kind of company would you be if they were not? Show me what your standards are. Show me how people are incentivized. Show me how this system actually works. So we have put a lot of thought into making those standards and expectations clear, exciting, and interesting. When someone joins our company, I always ask in every interview: tell me your personal objectives. Tell me your professional objectives. What do you want to achieve for yourself? If those align with the stated and written objectives of Prefect, that could be something remarkable. Whereas if someone’s personal or professional objectives are incompatible with what the company is seeking, we know we are setting ourselves up for a tough situation. This blog post is part of an effort to put into the world what we are seeking, so that we attract as many aligned people as possible.

On Fundraising

One of our earliest investors said, when we were forming the company: “Never take money from someone without knowing how they expect to get it back.” That is a good lesson to keep in mind. When someone puts money into a business, they will not just walk away and hang out. They are not doing it out of the goodness of their heart; they have an objective, and they want that objective to be met. When you take money from someone, you have a responsibility to fulfill that objective. Companies have obligations to their investors, their employees, and their communities. So when you choose an investor, if you are fortunate enough to do so, you want to make sure you understand why they are giving you money and how they expect to receive it back. Otherwise, you may be trading a short-term gain (cash) for a long-term problem (how you are going to give that cash back). Alignment there is critical.

I have been fortunate enough (especially coming from the finance world) to maintain very strong relationships with many VCs. Therefore, it may be surprising that Prefect avoided institutional funding for our seed round and A round. I saw no advantage to bringing in institutions. We talked earlier about how companies have to make decisions when they have the least information. If the key piece of advice I got was “never take money from someone without knowing how you will give it back,” it seemed insane to commit to a structured institutional relationship as early as possible, when the company was still figuring out what it was supposed to do. It restricts decisions rather than expanding them. As Prefect has grown and matured and our trajectory has become clearer given the product adoption, it has made more sense to align ourselves with institutions that have much more specific outcomes in mind for our company. For our earlier investors, we focused much more on strategic individuals who were less concerned with the mechanics of how they would get their money back and more concerned with whether they could help us and make an impact on our company in order to drive some larger outcome.

Choose your investors carefully. Just about everyone is nice when you meet them, especially when there is a one-sided or mutual interest in doing business together. When things go poorly, that is when this matters. Talking from my risk management perspective, you can have the nicest person in the world on your cap table, and when you run into trouble, will they be there for you? All companies who have gone through the pandemic know the answer to that question. They know if their investors were there when they needed them. I can say that my investors were there beyond any real expectation I had for what they would do. In particular, Patrick O’Shaughnessy is one of my closest friends and our lead investor, also sitting on Prefect’s board with me. He was there to support us in a way that many people would not have been. Our company will not forget that because that is when these relationships matter most. So know your investors and how they are going to be there when you need them.

Show Notes

  • (01:29) Jeremiah reflected on his academic interest in studying Statistics and Economics at Harvard.

  • (05:33) Jeremiah recalled his four years as a market risk manager at King Street Capital Management.

  • (07:18) Jeremiah explained how the training in risk management has made a huge impact on his career as a startup founder.

  • (09:48) Jeremiah then founded his consultancy Lowin Data Company that designed and built ML systems for time series data.

  • (12:38) Jeremiah mentioned his fascination with the rapid growth of machine learning in the past decade.

  • (15:54) Jeremiah talked about his contribution to the Apache Airflow project and lessons learned about open-source development/governance.

  • (21:48) Jeremiah unpacked the notion of negative engineering and shared the story behind the inception of Prefect.

  • (27:24) Jeremiah dissected Prefect Core, the open-source framework that is stocked with all the necessary components for designing, building, testing, and running powerful data applications.

  • (32:45) Jeremiah went over the advanced enterprise features of Prefect Cloud that complement users of Prefect Core.

  • (36:04) Jeremiah discussed Prefect’s product strategy (read his blog post “Toward Dataflow Automation,” which distinguishes between what a company makes and what a company sells).

  • (40:44) Jeremiah explained how Prefect users could take advantage of the hybrid execution model.

  • (47:08) Jeremiah walked through Prefect Server and Prefect UI that enable users to run parts of Prefect Cloud locally.

  • (50:27) Jeremiah talked about how his team has gradually open-sourced the Prefect platform.

  • (51:38) Jeremiah explained how Prefect settles into a “success-based pricing” model, where the cost is based entirely on the number of tasks users run successfully each month.

  • (54:15) Jeremiah shared how to nurture a highly active community of open-source contributors to Prefect Core.

  • (58:23) Jeremiah unpacked Prefect’s hiring strategy, which emphasizes the importance of hiring a team diverse in thoughts, backgrounds, makeups, and experiences (read this fantastic guide to building a high-performance team on Prefect’s website).

  • (01:07:02) Jeremiah shared fundraising advice for founders currently seeking the right investors for their startups.

  • (01:11:53) Jeremiah unpacked the two key pillars central to Prefect’s hyper-adoption within the data world: expansion and product.

  • (01:14:09) Closing segment.


Notes

My conversation with Jeremiah was recorded back in July 2021. Since then, many things have happened at Prefect.


About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
