Datacast Episode 124: The Open-Source Cloud Playbook, The Modular Future of Data and AI Infrastructure, and Meta-Learning as a VC with Casber Wang

The 124th episode of Datacast is my conversation with Casber Wang, a Partner at Sapphire Ventures. He focuses primarily on security, enterprise infrastructure, and data analytics.

Our wide-ranging conversation touches on his undergraduate experience at UC Berkeley; his transition from investment banking to venture capital; his current journey at Sapphire Ventures investing in security, enterprise infrastructure, and data analytics; the open-source cloud playbook; trends in the open data ecosystem; the modular future of AI infrastructure; the evolution of the software development lifecycle; his learning process when encountering a new industry; and much more.

Please enjoy my conversation with Casber!

Listen to the show on (1) Spotify, (2) Google, (3) Deezer, (4) RadioPublic, and (5) iHeartRadio

Key Takeaways

Here are the highlights from my conversation with Casber:

On His Upbringing

I grew up in Wuhan, a city nobody knew about before 2020. Now, it has become famous/infamous. Regarding my formative experience, I cannot think of a single strong experience. However, deciding to come to the States for my undergraduate study was a key point in my life. In high school, I decided to pursue a different route and did not realize the lifelong implications that choice would have.

That decision was a leap of faith, but it allowed me to pursue my undergraduate studies at UC Berkeley. Before I came to Berkeley, technology was not my passion. I wanted to study history, as I was a big history buff. However, my time at Berkeley coincided with the rise of consumer internet companies like DoorDash, Uber, and Lyft. They came out of Y Combinator or other startup programs. That was a fascinating time because it was the first time people had access to a pretty consumerized modern cloud-native experience on their browsers or mobile phones.

I was more interested in the infrastructure side of things, such as how things worked generally on the backend versus the consumer experience side. While I experienced good consumer experiences, I could not find patterns behind them. But what's really interesting for me is seeing how things work on the backend. From an academic experience perspective, there are not many things that jump out right away. I would say that working on my startup, internships, and other extracurricular activities probably left a bigger mark on my undergraduate experience.

On His Time at UC Berkeley

Firstly, the Etch.ai side was a great experience I wouldn't have had anywhere else had I not put in so much effort to do it right. I was probably missing classes and doing many things outside of the classroom to focus on the startup. The idea was a personal CRM where, for example, if I came here to do a podcast, I could immediately get the three or four suggested topics they wanted to talk about. It's almost like a marriage between Superhuman and the Superhuman co-founders' prior startup, Rapportive, which is a Gmail add-on that pulls LinkedIn or website information about a particular person before you chat with them.

My takeaway from that experience was that building a startup is really hard. You could get funding and all those awards, but at the end of the day, if you build a product that's put in an app store and you can't really achieve escape velocity, it doesn't really work, right? It's extremely challenging, and my co-founders actually ended up dropping out of Berkeley. I continued to work on this for another year before letting it go. But I think my biggest takeaway is that founders, in general, are working on something incredibly difficult, and naturally, the odds are stacked against them. So, understanding how hard it is helps me today as a VC to empathize more with the founder. That's probably one takeaway.

Secondly, I didn't realize it back then, but this was a B2C story, whereas later in my career I have spent more time on the B2B side of things (versus direct-to-consumer). Those are my two biggest takeaways from that experience, beyond the fact that I got to know many people in the Valley.

People complain about all the problems in San Francisco and whatnot. Still, one of the biggest things about Silicon Valley that I learned during this experience was that you can always go and ask somebody for something, and for no particular reason, that person will give you at least five to ten minutes of their time. Again, I'm not saying that's a silver bullet you can always count on, but in general, that's what is really beautiful about the Valley: the whole pay-it-forward culture of being willing to spend time and effort on somebody who had no prior relationship with you. I met some really great people throughout the process: investors, entrepreneurs, and whatnot. That's another major point I got from that experience.

Moving on to the Wish internship side of the story, being part of a very explosive startup was quite interesting. For folks who aren't familiar with the Wish story, it is basically a mobile e-commerce platform that imports goods mainly from East Asia (China, Korea, and Japan) and sells them to consumers in the US. Because certain categories of goods that consumers want are very hard to come by on Amazon, you can only get them from a platform like Wish, which is really mobile e-commerce-oriented.

We probably grew from around a 50-person startup to almost 300-350 during my time there. Seeing the whole change of a rocket ship, being part of it, witnessing that whole growth, and seeing the impact, strain, opportunities, and challenges that growth creates for the entire organization left a big mark on me. But also, it's just understanding the exponential nature of technology. I remember being assigned different tasks, one day on this floor and the other day on a different floor when we moved office, just in the span of three months. I think those really left a pretty big mark on my career, my thinking, and my sort of ambitions around technology.

On B2B and B2C Investing

Broadly speaking, there are no good or bad, big or small approaches in business. However, there are general patterns to consider. For example, you typically won't see a B2B company grow from zero to 10 million users overnight, whereas with B2C, it can happen. Over time, the market has become more saturated, so there are more control points to consider from a business model perspective.

There are many patterns to consider; some great people have written books on this. Some excellent VCs in the Valley have done this, such as Bill Gurley, who has his own theory around marketplaces. However, finding solutions to problems in the B2C space can generally be a little harder. It's more elusive, and the discovery process can be time-consuming, much like the artist's creative process.

In contrast, B2B has a more structured feel akin to cooking chicken. There's a playbook to follow, and once you cut it open, it comes out the same, regardless of how it's cooked. This approach fits my personality a little bit more.

On Transitioning From Investment Banking to Venture Capital

My banking experience was more accidental. I probably wouldn't have pursued that path if the startup had worked out nicely. Like many recent college graduates, I had no idea what I wanted to do. Banking seemed like a cool job from the outside, so I picked it.

From a learning perspective, I definitely learned a lot. I learned skills such as how to read financial statements and gained airtime with management. On the investment banking side, you work with more mature companies, such as those ready to go public or going through an M&A process. This exposes you to the cream of the crop among all startups who can cross the finish line.

Seeing patterns at a high level, including market trends, products, and the finance side of the equation, was my biggest learning from that experience.

Regarding my transition from investment banking to venture, I was lucky enough to be part of the IPO of MuleSoft, which turned out to be a big position for Sapphire Ventures. That's how I got to know the Sapphire folks and one thing led to another.

From a motivation perspective, there are two reasons why I wanted to be in venture. Firstly, I have always liked the technology side of things more than the finance side, although both fascinate me. Secondly, I learned that I have an intuition for writing. I wanted to be in a job where I'm more accountable for the output or outcome of something versus the input.

When you're a junior in investment banking, your job is to assemble certain memos, research, or pitch books, attend drafting sessions, write S-1s, and whatnot. You don't really have a hand in how these projects will turn out, and you're judged by the quality of your work, which can be subjective. In venture, you are more accountable for the output because, over time, you will be in a position where you advocate for your own deals, and the successes and failures of these deals are your scorecard.

Looking back, I think this prompted me to make a career change.

On Getting Into Tech and Venture

In the tech industry, it seems everyone is doing something related to tech, especially in places like San Francisco. There are many avenues to explore, but finding the right environment from a risk-reward and cultural standpoint is important. The golden rule is to ensure you like the people you work with. Quality people are key, in my opinion. When it comes to entering the tech industry, many companies and resources are available to help.

On the venture side, it can be a more obscure industry to get into, but having hard entry-level skills in finance or technology, intellectual curiosity, and a willingness to hustle can help you get your foot in the door. As a junior VC, you must be proactive and hunt for deals. There's no book on how to get into venture capital, so it's a process of self-discovery and learning on the job. The recruiting process is an audition for the real job, which involves a lot of hard work and effort to find the right opportunities.

On Sapphire Ventures

From a high-level perspective, Sapphire Ventures manages about $10 billion today and is investing out of its latest $2 billion fund. We are generally considered a growth-stage VC, meaning that the companies we invest in have some product-market fit and a small sales team in place. It's not just the founders making calls to their friends about buying certain software; there's a repeatable sales playbook. That's where we come in and add more fuel to the fire.

Generally, we invest in companies from Series B all the way up to pre-IPO. We take a concentrated approach, with around 82 or 83 active portfolio companies at any given time, against about $10 billion under management. Instead of just spraying and praying, we run a focused picking approach.

Regarding investment focus, the group is B2B-focused. The three main priorities are B2B infrastructure (where I generally spend all my time), B2B applications, and a little fintech, healthcare IT, and other areas. But we generally try to focus more on the B2B side of things than the B2C side. Companies like Square and LinkedIn are overall good examples.

On Proving Value As A New Investor

There are a couple of things to consider. Firstly, being able to do the job and the work assigned to you is important. This is still an apprentice job, so you won't know and understand everything on your first day. Someone has to guide you in the process, but you need to do the work assigned to you, such as creating a memo, conducting due diligence, finding out information about a company, etc. The first step is earning your stripes by creating original work and doing great work.

Once you have a good sense of what the job entails, you can break it down into four steps. The first step is sourcing, followed by due diligence to determine if the investment is worth the time and if there is a real market. The third step is winning the deal, which involves convincing the company to choose your VC firm. Finally, portfolio support requires board-level work, such as recruiting executives for portfolio companies.

Early on, you will spend more time on the diligence and sourcing side, and as you build your portfolio and profile, you will become the person to win deals and handle the more complex issues. As an investor, you will need to be a thought partner to more senior investors, and later on, you can show value in multiple ways based on the four stages mentioned.

On Security Investments

The two companies mentioned, JumpCloud and Uptycs, sit at different ends of the barbell. JumpCloud is a more mature company in the true growth stage, probably a few years away from an IPO. On the other hand, Uptycs is a smaller, earlier growth company with good product-market fit, but it is still figuring out a lot of things.

Generally, the stage where we got involved with Uptycs is where we have the second or third touchpoint. We probably met them at Series A, and that's where we generally make an early bet. On the other hand, with JumpCloud, I had already known the CEO for three and a half years before the investment, and for various reasons, we didn't get involved early on. When they hit certain milestones, we internally decided to get involved.

From an investment perspective, JumpCloud is a classic product-led growth (PLG) company. It’s a directory plus identity solution for the lower end of the market, such as SMBs or mid-market cloud-native companies that require an identity solution like Okta, device management like Jamf, and a directory service like Microsoft Active Directory. It doesn’t make sense for SMBs or mid-market companies to go to different vendors to procure different needs. Why not just go to JumpCloud and buy the entire solution? This is generally a bundling of tooling on the SMB and mid-market sides.

In the case of Uptycs, it is based on an open-source security solution that came out of Facebook called osquery. Security is an area where people don’t rip anything out because they’re always thinking about buying more solutions to cover more things. This leads to a huge fragmentation in the security industry, with a thousand vendors at different trade shows going after the same problem. Thus, we try to find platform solutions where people can standardize their stack on at least one part of their operations.

I got involved with Uptycs because I saw great early momentum from a logo standpoint, with lots of great cloud-native logos adopting the solution and investing a lot of resources in it. This is a good sign that people are betting on the platform potential of the company.

Security is probably the most resilient pocket, and we will likely see growth in broader security spend. However, we know the spend is not going to grow equally across categories. The biggest underlying trend behind it is the rise of the cloud, and we're so early in this cloud security journey. We have yet to figure out how to protect critical data and applications built on the cloud. The biggest problem solved today in the market is visibility, i.e., where your assets are and where your data is stored.

On Enterprise Infrastructure Investments

I would say that both companies are on the earlier end of things.

On the Tetrate side, it’s really about the open-source playbook. Founders JJ and Varun have a lot of experience creating and maintaining the most popular service mesh on the market, called Istio, which came out of Google. They can take this low-level project and evangelize it to more companies.

But they also start building applications on a foundational low-level technology. For example, various security and network telemetry monitoring applications are on top of it. So that’s a big bet there.

On the Zesty side, it's more about cloud infrastructure management. Some people outside call it a cloud cost-saving company. For us, we think of it as an infrastructure management company with cost savings as a killer use case. I believe the company will keep pushing a cost savings message, but underneath, how they save you costs is a pretty complex infrastructure management process.

It doesn't hurt that we were a big investor in CloudHealth, the first-generation cloud cost management tool, which was founded by an executive from VMware. That's more of a dashboarding tool for the CFO, whereas Zesty is more embedded in the engineering process. There are pros and cons to both approaches. But in my opinion, the more embedded you can be in the whole software development life cycle, the more sticky you are as a solution in general. That's what I really like about them.

They've got pretty good commercial value to the economic buyer, too. CFOs have been signing on the dotted line, and once they see the cost savings that come out of it, it makes for a natural and easy buy for the economic buyer.

On Data Analytics Investments

Dremio is a cloud data lake that allows you to point to your online data sources, run queries on them, and procure data. This is similar to what Databricks calls a "lakehouse" company.

By combining Dremio with the underlying data source, you can effectively turn it into a data warehouse experience and put BI tools on top of it. You can run queries between two, three, or four different sources. While Snowflake is a great company with a vision of having one data warehouse to serve all purposes, from an enterprise perspective, there won't be one repository for all your data.

There are many disparate data sources for good reasons. You may have highly valuable and regulated data, which you'll store on an on-premises air-gapped server. The same data manifests itself differently in the cloud. Data like sales and marketing data, along with some financial data that needs high performance, might go into a cloud data warehouse. The vast majority of data, like clicks from a website, customer events, or streaming data, would be best suited for a cloud data lake like S3 or any object storage.

Therefore, enterprises will need tools to access those cloud data sources and build applications on top of them. Additionally, sometimes you need to run federated use cases. What if you need data from S3 and RDS, or if you need to join two data streams to create some data applications? These are the use cases that Dremio powers.

Overall, it is a bet that disparate data sources will always exist in the enterprise, and for good reasons.
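To make the federated use case concrete, here is a minimal sketch of joining data that lives in two separate stores with a single query. It does not use Dremio or its API. As an assumption for illustration, two in-memory SQLite databases stand in for the sources: `lake` plays the role of event data in object storage such as S3, and `ops` plays the role of a relational database such as RDS. The table names, column names, and sample rows are all hypothetical.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    """Return one connection with two separate in-memory databases attached.

    'lake' stands in for click events in object storage (e.g. S3) and
    'ops' stands in for a relational database (e.g. RDS). Both names and
    all rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates its own independent database.
    conn.execute("ATTACH DATABASE ':memory:' AS lake")
    conn.execute("ATTACH DATABASE ':memory:' AS ops")

    conn.execute("CREATE TABLE lake.click_events (customer_id INTEGER, page TEXT)")
    conn.execute(
        "CREATE TABLE ops.customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO lake.click_events VALUES (?, ?)",
        [(1, "/pricing"), (1, "/docs"), (2, "/pricing")],
    )
    conn.executemany(
        "INSERT INTO ops.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.commit()
    return conn


def clicks_per_customer(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Join the two sources in one query without copying either into the other."""
    query = """
        SELECT c.name, COUNT(*) AS clicks
        FROM lake.click_events AS e
        JOIN ops.customers AS c ON c.customer_id = e.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(clicks_per_customer(build_sources()))
```

The point of the sketch is only the shape of the query: the join spans two stores, and neither dataset has to be moved into a single repository first.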

On Giving Advice to Portfolio Companies

I'll approach this from my own perspective. We can talk about CEOs, especially in a growth stage, as the chief recruiting officer. This is because their goal is to attract and provide leverage to the best and brightest talent they can hire.

So, how do CEOs maintain the company culture? How do they get the best leaders in place? At least 50% of the job of being a CEO or even a founder is accomplishing this. But sometimes, it's difficult to accept this concept if you're a technical founder or a founder from a go-to-market background.

One day, you could be really focused on a project, but now you've hit the growth stage, and you have to spend 50% of your time interviewing people and courting independent board directors to get the right talent. It's not as easy a transition as people think. That's probably the biggest point.

Secondly, once you have those people in place, how do you balance being demanding and supportive of the team you built?

I mean that you want to hold them to a high standard, but you also want to be supportive and give them enough room to run. Some of the best people you can hire are not just order takers. They won't just go and fulfill your vision exactly because they have their own blueprint in mind, and they know what's best. Being both a demanding and a supportive leader is the hardest balance to strike.

Lastly, as we see today, founders are doing incredibly hard things, and empathy is hard to come by. Early on in my career, sometimes I saw a number that didn't show up how I wanted it to and had an immediate knee-jerk reaction to ask the CEO what happened. But the important thing is that sometimes investors and we ourselves care about things that have nothing to do with what the founders care about. How they run a business and look at metrics could be very different from how investors do. Therefore, I remind myself daily how to be supportive and perform my fiduciary duty as a good board member without overstepping and lacking empathy.

These are the three main points I would mention.

On The Open-Source Cloud Playbook

The open-source landscape has evolved, as the blog post predicted. Many new open-source companies aim to go straight to the SaaS or cloud stage to solve customer problems and offer open-source cloud products. This contrasts with the previous generation of open-source companies that used an open-core model, where they helped customers maintain their self-hosted solutions. The trend is for more open-source companies to go directly to the cloud and SaaS on day one, as seen in public companies like MongoDB, which almost exclusively promotes Atlas, its hosted cloud product.

Source: https://sapphireventures.com/blog/3-strategies-software-companies-can-borrow-from-the-open-source-cloud-playbook/

Another key point from the article is how to make the end buyer and user aware of the product. For instance, if you’re selling a database or a security solution, the end user may not use it the same way they would use a collaboration tool like Calendly. Therefore, it’s crucial to make the end user product-aware. This way, the buying process can be smoother, and the economic buyer can see that the developers are already aware of the solution and want to ask about it.

Both open-source and non-open-source companies can adopt this approach by changing their marketing, sales, go-to-market, or product strategies. However, executing this is not as simple as creating a website and offering a free trial. The devil is in the details, such as handling objections, which clients to take on, and which ones to say no to. The key takeaway is that becoming product-led is essential for companies to succeed.

On Trends In The Open Data Ecosystem

I believe your podcast is named “Datacast,” right? It’s a great name because the modern data stack has so many exciting developments.

A few developments that come to mind are:

  1. The rise of DataOps as the middle layer of the data stack. This is because the bottom and top layers, such as data warehouses and BI tools like Looker, Tableau, and ThoughtSpot, are already established in a typical Silicon Valley company. Once the infrastructure and visualization layers are in place, the question becomes how to ensure the right data makes it to the end user. This is where DataOps comes in with data management, data governance, and pipelining tools like Matillion or Fivetran. The rise of DataOps and the opportunities that come with it make a modern data stack more efficient and very interesting from a broader trend perspective.

  2. The unbundling of BI, or business intelligence, is another exciting development. In the past, the problem with BI was that people couldn’t get their hands on data in time. Now, we have self-serve data, but the issue is that sometimes, it’s a complex process, and people don’t know what questions to ask. The unbundling of BI is about serving data to the right end-user system. For example, salespeople want to see the right kind of data in Salesforce, not a BI PDF that Tableau produced. Marketing people want the latest and greatest of their A/B testing results within their work system of choice. This requires tools to activate and operationalize certain data sitting inside a warehouse and pump it into the system of choice, not necessarily a BI tool.

  3. Diversification of form factors from a database standpoint is also interesting. Beyond the technical scaling issues, the biggest thing is the cost-benefit from a query standpoint. Why do you need all your data in a very expensive, highly active warehouse? You don’t need it there all the time. That’s where solutions like Dremio come in, which allow you to go to the data lake. However, you’ll still have some data you want to have in real-time, and solutions like Apache Pinot and StarTree help you achieve a different point on a cost-performance curve.

These different data infrastructures help you achieve different points on a cost-performance curve, and they are emerging rapidly.
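The second trend above, activating warehouse data by pushing it into the system each team already works in, can be sketched roughly as follows. This is an illustration under stated assumptions, not any vendor's implementation: an in-memory SQLite table stands in for the warehouse, plain Python lists stand in for the destination systems, and the routing table `DESTINATION_BY_TEAM`, the table layout, and the sample rows are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical routing table: which downstream system each team works in.
DESTINATION_BY_TEAM = {"sales": "crm", "marketing": "ab_testing_tool"}


def load_warehouse() -> sqlite3.Connection:
    """Create an in-memory table that stands in for a cloud data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics (team TEXT, entity TEXT, metric TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?, ?)",
        [
            ("sales", "Acme", "open_pipeline_usd", 120000.0),
            ("marketing", "homepage_test_b", "conversion_rate", 0.042),
            ("sales", "Globex", "open_pipeline_usd", 45000.0),
        ],
    )
    conn.commit()
    return conn


def activate(conn: sqlite3.Connection) -> dict[str, list[dict]]:
    """Read warehouse rows and group them by the system they should land in.

    A real pipeline would call each destination's API here; this sketch
    only collects the payloads so the routing step is visible.
    """
    outbox: dict[str, list[dict]] = defaultdict(list)
    rows = conn.execute(
        "SELECT team, entity, metric, value FROM metrics ORDER BY entity"
    )
    for team, entity, metric, value in rows:
        destination = DESTINATION_BY_TEAM.get(team)
        if destination is None:
            continue  # No known destination for this team; skip the row.
        outbox[destination].append(
            {"entity": entity, "metric": metric, "value": value}
        )
    return dict(outbox)


if __name__ == "__main__":
    for system, payloads in activate(load_warehouse()).items():
        print(system, payloads)
```

The design choice worth noting is that the warehouse stays the single source of the numbers, while the delivery target varies per audience instead of everything funneling through one BI dashboard.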

On Interoperability

The key to interoperability is ensuring that your tool can work seamlessly with all other tools you have and that it is future-proof. One of the biggest advantages of open source is that even if a commercial vendor goes away, you still have a valuable technology that can integrate with other technologies.

This is often the result of having a more open system, as people are more likely to adopt it and put it on their resumes. As a result, other companies start building their tools with your tool in mind, making you more future-proof over time.

While a vendor doesn't necessarily need to be open source, the principle of being open is important for building a strong ecosystem where data is not controlled by a single vendor. In the past, BI tools used to come with a data warehouse and visualization layers, but now we are seeing a modularization of tools that specialize in specific areas. As data requirements become more complex, there is a need for more specialized tools that can address these needs beyond just surface-level requirements.

On The Modular Future of AI Infrastructure

It’s hard to determine which categories are overhyped and which ones are not. When it comes to AI and ML, however, it’s undeniable that they’re here to stay and will only become more prevalent over time. That being said, a company still needs to make money, and there has to be actual demand for their product.

While many categories are real, consolidation is bound to happen. Having seven or eight vendors going after the same problem doesn’t make sense. People tend to overestimate the near term and underestimate the long term, but it’s interesting to note that the AI and ML space will likely play out much faster than people realize once the hype cycle comes to an end.

From an investment standpoint, it's essential to focus on practitioners, such as data scientists, engineers, and analysts specializing in ML. These individuals will dictate what the AI and ML toolchain will look like in the future. The tools people want and need will be here, while those that aren't in demand won't.

Certain vendors will likely take leadership in the ecosystem and become thought leaders in the field. End users will ultimately choose which tools they want to use, which will determine what is needed and what isn't. From a venture perspective, observing what AI and ML practitioners are doing is important.

On The Dynamic Evolution of the Software Development Lifecycle

There are several interesting trends in the venture space. One is the optimization of the software development lifecycle (SDLC), which has become quite mature over the last 10 years with well-established Git, CI, and CD systems. The challenge is to infuse modern machine learning (ML) techniques to further streamline the SDLC and make developers’ lives more efficient.

Supply chain security is another important trend. Finding a balance between security and efficiency is crucial when deploying weekly or daily. Organizations need tools to help them achieve their goals without sacrificing security or efficiency.

Lastly, there is a shift towards cloud-based development environments offering a more standardized and efficient working method. This gradual but inevitable shift will become more prevalent over time.

On Learning A New Industry

Learning is a process that occurs over time and through various experiences. As someone lucky enough to have a job that involves talking to people, I have had the opportunity to engage with individuals who are often more knowledgeable than I am about specific topics.

For example, I may speak with an expert who has spent 20 years working in security or a luminary in the DevOps field. These conversations allow me to build upon the mental models I have developed through my research and reading. However, it is really the combination of hard work and ongoing learning that has helped me to grow and develop.

At certain points, I may speak with the right person or encounter a new company that provides me with new insights and helps me connect the dots. It's like learning to ride a bike or swim - sometimes, it just clicks, and everything makes sense.

To achieve this level of understanding, it’s important to continuously do your homework through various forms of reading and meta-learning. This may involve perusing sites like Hacker News or Twitter to gain pockets of knowledge that may seem unrelated or irrelevant at first. However, at some point, it’s crucial to reach out to others and engage in the process of educating yourself.

As you accumulate more knowledge and points of reference, you begin to build a critical mass that enables everything to click into place. And once it does, you’ll likely want to continue learning and exploring even further.

In short, learning is a lifelong pursuit that requires hard work and an ongoing commitment to expanding your knowledge base.

Show Notes

Mentioned Content

Books

  1. "The Power Law" (by Sebastian Mallaby)

  2. "Engines That Move Markets" (by Alasdair Nairn)

Notes

My conversation with Casber was recorded back in late 2022. Since then, I recommend checking out these resources:

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts.

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.