The 121st episode of Datacast is my conversation with Alexander Gallego, the founder and CEO of Redpanda Data, a high-performance, Apache Kafka-compatible data streaming platform for mission-critical workloads. He has spent his career immersed in deeply technical environments and is passionate about finding and building solutions to the challenges of modern data streaming.

Our wide-ranging conversation touches on his immigrant upbringing, his entrance into Computer Science at NYU, his engineering career at FactSet and YieldMo, his first startup building the stream-processing engine Concord, his learnings about large-scale automation at Akamai, his current journey with Redpanda building the modern data streaming platform, lessons learned from designing an Intelligent Data API and open-sourcing Redpanda, thoughts on hiring and picking customers, his desire to give back to minorities in tech, and much more.

Please enjoy my conversation with Alex!

Listen to the show on (1) Spotify, (2) Google, (3) Stitcher, (4) RadioPublic, (5) iHeartRadio, and (6) TuneIn

~ New Podcast ~ #Datacast E121 features @emaxerrno. We discussed:
- Immigrating to the US
- Building High-Performance Engine
- Streaming Architecture
- Intelligent Data API

Alex is building a data streaming platform for mission-critical workloads with @redpandadata. Enjoy! pic.twitter.com/HjteFn1NOt
— James Le (@le_james94) July 14, 2023

Key Takeaways

Here are the highlights from my conversation with Alex:

On His Upbringing

I grew up in a small city in Colombia called Manizales. Although technically in the northwest, it's a tiny city. When I came to the US, one of my first classes was in high school, and they didn't have an ESL program. Now, I think things are better for migrant students.

But when I landed, it was an old school, so all the classes were in English, even though there was a large Spanish population. The high school I went to didn't really have any handholding. So the first thing you learn when entering a country is to smile and nod.

And that's basically how you can get by for your first year. I remember the first class; I asked the person in front of me (you could imagine a teenager, I was 14 and a half, 15), "Gino, can you tell me what's going on with this history class?"

He would turn back and translate, and that went back and forth. Meanwhile, the instructor is saying, "Stop talking." Of course, I didn't know, so I would ask the person, "What is the guy saying?" And the person was like, "I don't want to talk to you." I was like, "Why not? You're translating." And he was really nice. I was just trying to understand. But of course, chatting was against what the instructor was saying, so I remember the first four hours of American High School, I was sitting in detention, and I was like, "Wow, I guess this is just how American High Schools work."

That was my introduction to American High School. It was a fun learning experience. It's very formative, I think, to have to figure out how to say, "Can I use the restroom" or "How do you order a piece of bread" and "What's my name" and that sort of thing.

Learning a new language and staying in the US has been a ton of fun. I'm a big fan. The US has been built by immigrants. It's unfortunate to see an anti-immigration mentality, but the country is built by immigrants. If you look at the people that have made an impact in the world, in the US, it's immigrants.

It was a shock because I came from a really poor background, and my mom was even poorer than I was. So when I came to the US, my parents played a trick on me and said, "Hey, if you come to the US, I'll buy you a BMX bike, an aluminum frame BMX bike." In my head, of course, I was a child, and so I had no idea how the world worked. In my head, I was like, "It's the same living condition. I'll just have a better BMX bike." My dad did end up buying me the BMX bike, but it was in Stamford, Connecticut, and there was no BMX track available. So I was like, "It's useless." That was the biggest shock.

It's just such a culture shock for so many people. It's not just the language; it's learning the culture. Learning the culture was probably the largest gap. It's like you join many cultures. I joined the culture of my dad, who I largely didn't grow up with. I joined the culture of the US, which has totally different values from how you're brought up and what society values. To me, learning the language was easier than learning the culture. Even now, I'm not very good at going to trivia nights at bars because I don't know the answers. I have a son now who's 11 months old, and I have zero clue about all the songs to sing to babies. When I sing to him, I just sing in Spanish, and that's just how it is. The gap was definitely huge in more dimensions than we'll ever have time to chat about in a single hour of a podcast. But it was definitely a big shock for me.

On Getting Into Computer Science

People often underestimate how much of a shift it is for the first person to do something. There isn't necessarily a hero story involved, but rather a long journey of discovering nuances and figuring out how to do it.

In high school, I Googled how to apply for college because my entire family was in Colombia, and I was living by myself in a new country. I was academically advanced and had skipped several grades, but filling out forms about what I wanted to do was overwhelming. I remembered being good at taking things apart, as I had grown up decomposing and rebuilding motorbikes, specifically KTMs and Kawasakis. I was also good at playing video games. So, I Googled "how to hack" and found a cybersecurity program at NYU. I applied and was shocked when I got in. I was even placed out of my English classes, which was mind-blowing to me as an immigrant.

I ended up majoring in computer science at NYU because my advisor, who was in charge of the computer science curriculum, recommended it. It was a good fit, and here I am today.

On Doing Cryptography Research at NYU

I'll give you a little bit more background. When I went to college, it was the first time I was academically challenged. High school was very easy, so I skipped many years in middle school and early on.

At college, I went to a well-known university and was amazed by how smart everyone was. I did poorly on the initial set of exams, so I knew I had to get my life together. It was the first time I did well academically. As part of the Wednesday club, a hacking club, I joined a research project with Nitesh Saxena, a cryptography researcher. I've always been interested in applied systems. I left academia to go into engineering because I think I can add more value to the world by building real systems that people can use to improve their lives. For example, Redpanda is a system that's now being deployed in outer space, payment systems, oil and gas pipelines, and intelligent beds to monitor heart meters between moms and babies. That's the kind of system I wanted to build.

I was part of a research project where I had the idea of using games to motivate people. Saxena had started this area of research, and I wrote the first game to build entropy. Then, I created a new game idea that got us a bunch of papers published in prestigious journals like ACM. The games were created around cryptography issues, like how to transmit true randomness or transfer keys. We invented games around that, and then we invented a bunch of games around maze puzzle solving for building layered security on Android. Google gave us a bunch of money for some of my ideas, and then the NSF gave us money. Ultimately, I left cryptography and academia because I wanted to build applied systems that made the world a better place today, not in 10 years.

Academic rigor is really useful, and the process of feedback in academia is brutal. I cried the first time we submitted a paper to a journal because the reviewer critiqued part of the paper. It was the first time I was stepping into a big boy podium. The two good things that I learned were: one, to not bs myself into thinking that I'm better than I am, and two, to divorce my value of self-worth from the value of my work. It's OK to write some code and realize it sucks. You need iteration and feedback to write good code.

I still carry that mentality to this day. I want experts to tell me what I'm doing wrong, not to satisfy them, but because I want to learn and do better next time, even from a technical perspective.

On His First Job at FactSet

Surprisingly, the world works. My job was to process the S&P 500 and render it into Blackberry and Nokia platforms using a fork of SQLite. However, Blackberry was the most challenging platform to work with due to its complicated API.

Despite the challenges, I realized that the world needed more builders and that anyone could improve existing processes and systems with enough time and effort. At FactSet, I learned a lot about pull requests, writing C++ code for OpenVMS, and working with a different threading model and memory semantics.

I also learned that the space for improvement is vast and that meeting people who have changed the world through their work is inspiring. FactSet is one such company that has made a significant impact, and I believe anyone can be a part of changing the system and building impressive systems if they are willing to learn and put in the effort.

On Being The First Engineer at YieldMo

This story begins with my friend Joe, my intern mentor at FactSet. Joe went on to sell a company to Coinbase recently. After leaving FactSet, he worked for Mike Yavonditte, the CEO of YieldMo, a Google Ventures-backed company.

Mike offered me a job with double my salary and a team to work on various things. I accepted, even though I had a vacation planned. I was young and had college loans to pay off at the time, so the salary increase was very appealing. I ended up being the first engineer at YieldMo and wrote the first lines of code for the advertising SDK and a lot of the ad server code. I also hired the team.

Working at YieldMo taught me many lessons, including how to build a team, work quickly, and prioritize tasks that would move the business forward. I had a lot of freedom and ownership over the company's technical direction. However, with great freedom came great responsibility. I owned my mistakes and had to work late nights to think about scalability. YieldMo was doubling in traffic every month, and onboarding big publishers caused a 100x increase in load volume on the ad servers. As a young engineer, it was difficult to anticipate and prepare for such growth.

YieldMo was formative in teaching me how to think about large-scale systems that required low latency response. It inspired many of the ideas I later implemented in Redpanda. Working at YieldMo was a great learning experience, and the company was successful. However, I left to start Concord because of management decisions and the belief that YieldMo could have grown even larger.

On Hiring Exceptional Engineers

If I were to reduce my hiring criteria to one thing, it would be finding people who care deeply about their work from a technical perspective. They are passionate about the craftsmanship of their code, its quality, and its efficiency. They go beyond just formatting code and care about algorithmic complexity, runtime, carbon footprint, and scalability.

These individuals have a strong desire to learn and will do whatever it takes to improve their skills. Mentoring engineers who are passionate about their work has been one of the most gratifying things for me. It is crucial to give them some room to make mistakes and align them with the right problem domain.

Over time, my hiring criteria have changed, and I now over-index on the human side of the equation. I want to hire fantastic people who are excellent at a technical level and care about the problems they are working on. I design an interview process that extracts the signals I care about.

As I mature throughout my career, I expect to work with people who care deeply about their work, and I over-index on the human side. I won't hire people who are toxic team members, even if they have exceptional technical skills or are world-famous.

At Redpanda, we have amazing retention because we work with good people and are generous with equity. In summary, the lesson I learned is to work with people who care deeply about their work.

On Building Concord

Source: https://www.datanami.com/2016/07/14/concord-claims-10x-performance-edge-spark-streaming/

The backstory is that Shinji is a badass. She's a friend and a fan who just started her new company called Select Star. Rob is also an old-school buddy of mine who recently joined my team. It's really great to have such talented people on board at Concord.

When I was at YieldMo, I was using a project called Apache Storm. Apache Storm was a Clojure-core system with a Java backend and Scala APIs. Debugging stack traces was practically impossible because of the compatibility issues between Scala, Java, and Clojure. It was a mess and very difficult to work with.

From a technical perspective, Storm added a lot of value to the world by providing a blueprint for real-time processing. LinkedIn started using it with Kafka, and Storm and Kafka went hand-in-hand in building real-time pipelines. If you wanted to do real-time machine learning in ad tech, there were three good technologies: Kafka, Storm, and Cassandra. However, Storm had no predictability, was impossible to debug, and had terrible messages-per-second performance. It didn't leverage an Oracle, which was a big drawback.

So I built Concord on my own using C++ to create a system that was easy to understand and offered performance and predictability. The project was successful, but the world wasn't ready to adopt it in 2014-2015. To this day, Concord remains the only framework that offers programmatic reactivity to failure and allows you to reason about scalability as an independent vector from your topology layout.

We sold Concord to Akamai, which allowed me to think about the next-generation problems I wanted to solve. That's when Redpanda was born.

On Lessons Learned As A Second-Time Founder

You could have great technology and fantastic ideas, but if the market isn't ready, there's nothing you can do. Timing is crucial and cannot be fixed, and you can only guess when the world will be ready.

For me, Concord was a personal failure, despite its success in some dimensions. We all made money, and I was able to pay off my student loans, but I could see that the project had the potential to be something more. I wrote the code, and I knew that the world needed it, but it just wasn't ready. I learned that there was very little I could do to teach the market a new way of thinking.

Building a successful company requires a lot of capital, traction, and advocacy work. It's not just about building a product and waiting for people to come. It's about creating a product people love and building it for a large market. For Concord, the market wasn't big enough, but for Redpanda, the majority of adoption came from developers trying the product on their laptops. I became obsessed with the developer experience and ensuring the product was easy to use and impressive within the first 60 seconds.

Ultimately, it's the developer who votes on adopting a technology, not just solving a technical problem. Many lessons were learned, but for me, the key was building a product developers loved.

On Learnings at Akamai

Akamai opened my eyes to the level of automation needed to scale a system, not just as a function of human operators running the code but as a function of the systems built to scale. This subtle distinction means that four engineers can run thousands of computers with the majority of edge cases automated, including provisioning, tuning, debugging, and issue resolution.

I was part of many super high-growth companies, and it felt like we were drinking from a firehose, similar to Redpanda. Akamai figured out over the last 20 years what it means to run a NOC (network operations center), a 24/7 mission-critical system on which the world depends to render content. Seeing it firsthand and the glue code made me realize we could do better.

People tend to mystify the process, but this system is running on Perl scripts. Amazon and Apple still bootstrap with Perl, and it's mind-blowing that things work. I am confident I could do better than this system, at least from a developer experience perspective.

Source: https://docs.redpanda.com/docs/reference/rpk/

Learning how Akamai optimizes its infrastructure and the automation they use for day-two optimizations was valuable. Scalability with very few people is what Akamai has nailed. It's mind-blowing how few people run the largest internet companies. This became an integral part of Redpanda, with an auxiliary tool called rpk, an embedded database of day-two operations. It's all about the experience of 60 seconds to WOW, with one command for tuning the Linux kernel, running production, setting up security, adding a user, and more.

Obsessing over automation and developer experience is necessary for running at a large scale. Very few companies in the world have experienced this kind of scale, and Akamai was a big part of my learning how to run at these scales.

On Introducing SMF

Source: https://www.scylladb.com/2017/11/15/smf-fastest-rpc/

Much of my research focused on understanding the gap between state-of-the-art hardware and software. Engineers tend to operate at a high level of abstraction, divorced from how the physical world actually works. While it's fine to use VM run times as long as you know how to use them, I became obsessed with the performance gap people leave on the table by using these higher-level run times.

To investigate this, I started looking into DPDK and extended the FlatBuffers compiler, adding string type optimizations and code generators. I wrote a C++ code generator that consumed the FlatBuffers IDL, which is a little-endian encoding with some pointer table jumping. This compiler generated a bunch of stubs, and the code it generated was Seastar-backed with a DPDK-enabled driver.
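
To make "a little-endian encoding with some pointer table jumping" concrete, here is a minimal Python sketch. It is not the real FlatBuffers wire format and not SMF code; the layout and the names encode_record and read_field are hypothetical, chosen only for illustration. The idea it shows is that a table of little-endian offsets at the front of the buffer lets a reader jump straight to one field without parsing the others.

```python
import struct


def encode_record(fields: list[bytes]) -> bytes:
    """Pack fields as: [u32 count][u32 offset per field][u32 length + payload per field].

    All integers are little-endian. Each offset is the absolute byte
    position of that field's length prefix inside the buffer.
    (Hypothetical layout for illustration, not the FlatBuffers format.)
    """
    header_size = 4 + 4 * len(fields)
    offsets = []
    body = bytearray()
    for payload in fields:
        offsets.append(header_size + len(body))
        body += struct.pack("<I", len(payload)) + payload
    header = struct.pack("<I", len(fields))
    header += b"".join(struct.pack("<I", offset) for offset in offsets)
    return header + bytes(body)


def read_field(buf: bytes, index: int) -> bytes:
    """Read one field by jumping through the offset table, touching no other field."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (offset,) = struct.unpack_from("<I", buf, 4 + 4 * index)
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    return buf[start:start + length]
```

A reader that only needs the third field does two fixed-size lookups and one slice, regardless of how large the earlier fields are. That property is what makes this style of encoding attractive for low-latency RPC.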

Through this exploration, I discovered a 34x gap in performance between state-of-the-art stream storage (back then, Kafka) and what hard work could give you. This motivated me to create the SMF storage engine, which later became the foundation of Redpanda. SMF was adopted by banks, cryptocurrency exchanges, and Italian database companies, among others.

While SMF was successful and even embedded in Apache, it remained a small project that people tended to fork. Nonetheless, I was glad that it was helpful to so many people and that I could experiment with and better understand the platform. If you're interested in low-level details, you can check out my talk about this on YouTube.

On Founding Redpanda Data

Source: https://redpanda.com/blog/redpanda-raison-detre

I started the company with a premise that, honestly, was based on my expertise. If you look at my background, I've been doing this for so long that I couldn't find a storage engine that could keep up with the volumes of data I was trying to push. So, I built my own. That's how it all started.

I remember meeting a couple of VCs who thought I was crazy to compete with the existing competition. I told them it was fine if they didn't trust me or approve of my idea. The product works because I built it. I didn't need them to tell me it was impossible because I already made it possible.

Many VCs are business-oriented and have little understanding of the technology behind the product. Even those with a technical background often don't fully comprehend the technology. I understand that they are doing their job in evaluating the company from a market perspective.

I built the storage engine because I believed it should exist. The go-to-market function was to design a Kafka replacement for mission-critical systems. Our idea was that if Redpanda crashed, you would stop making money. We needed to build a resilient, highly available, and safe transactional storage engine that people could use to build their businesses on top of.

Source: https://redpanda.com/blog/redpanda-official-jepsen-report-and-analysis

I feel proud of succeeding in that, with the Jepsen report coming out later and proving that we did what we claimed. Onboarding companies like Akamai, the largest electric car company in the world, Zenly (part of Snap), and others has validated our original thesis.

People see us as the present and the future of streaming data, and we are excited about it. The timing was perfect. The world needed something that we created. Our adoption has primarily been organic, with people coming to our website and asking if they can pay us money because they love our product.

That doesn't happen to most of my friends, but it happened to us, and I believe it's because of perfect market timing.

On Choosing Redpanda Over Kafka

Source: https://redpanda.com/blog/redpanda-vs-kafka-faster-safer

At a high level, there are three pillars to Redpanda. The first is operational simplicity. Kafka can be extremely complicated to use, but we have eliminated that complexity by making Redpanda a single file that can be copied to three computers, and we automate the rest.

The second pillar is speed. Our core engineers are experts in low-latency storage systems, and we have designed Redpanda to be fast and performant.

The third pillar is data safety. We have chosen a safe-by-default protocol to ensure that data is not lost. We onboard the complexity of data safety so that our users don't have to worry about it.

Source: https://redpanda.com/blog/redpanda-vs-kafka-performance-benchmark

Redpanda speaks the same protocol as Kafka, so you can use your existing applications with Redpanda without making any changes. This makes it easy to onboard years of code a company has written against Kafka.

In summary, Redpanda delivers operational simplicity, high performance, and data safety.

On Open-Sourcing Redpanda

Source: https://redpanda.com/blog/open-source

When I was looking to build my sales team, five large companies offered to pay me around a million dollars for some POCs. However, it did not feel like the right company I wanted to build. Looking back, I regret not having open-sourced parts of Concord when we had momentum.

It would not make sense for Akamai to open-source their app because better solutions are available today. While Concord still improves in some dimensions, other systems take the lead. I wanted to change how developers think about building real-time data, and I could not have done that if the system was closed source. While I could have made money, it could not have categorically changed how developers think about real-time data. The only way that I was going to be able to move the entire market was by making the code source available.

By and large, people can run Redpanda without paying us money. We track downloads and have a ton of Fortune 1000 companies running Redpanda, and some of them do not pay us, but that is fine, as it is part of the contract and the license. The only restriction is that I did not want Amazon, or other companies copying Amazon's model, to offer Redpanda as a service and capture all of the value while we do all the value creation. So we reserved the right to offer Redpanda as a service.

Source: https://github.com/redpanda-data/redpanda/

I recommend open-source creators look at the Business Source License (BSL) for their projects. Every project is very contextual, and the problem with open-source discussions is that everyone has an opinion, but very few people are experts. Whoever makes the decision, usually the CEO, just needs to become an expert and talk to everyone, whether Elasticsearch, MongoDB, or Materialize, and learn about the context. What is the context that this project became a mandate? Did they have four years of ecosystem development and just need to monetize it?

Grafana is a great example. They spent four or five years building the tech and just building tech. At one point, they decided to monetize, and it was awesome because they had all this penetration. It is a balance between commercialization, adoption, and changing how developers think, which is a challenging thing. Therefore, I cannot make a blanket statement that it makes sense for everyone. It is a contextual issue, and I recommend people talk to experts, such as myself, Ajay Kulkarni at TimescaleDB, or Dev Ittycheria at MongoDB, who orchestrated the licensing for their particular projects. They know the context, and the only thing you can do is learn from their experience and understand what makes sense for your company before making the best decision that you can with experts' opinions.

On Open-Source Engagement

To some extent, it was really a matter of market timing. When we came out with the technology, there was a huge demand for a true alternative. Before Redpanda existed, there was no non-JVM alternative for doing real-time streaming. If you look at the most popular systems for real-time streaming, Kafka was the clear leader, followed by Pulsar. At least, that's what I see. There was no alternative if you weren't a JVM expert or had an aversion to running JVM systems.

So for some projects, you had to onboard the complexity because there was no alternative. When we presented our idea of a single binary, the world was ready for that, and much of our growth has been organic. We focused early on building really technical blog posts within the company to explain how we built the system. Engineers tended to gravitate towards that kind of honesty. We just told them what we knew, and if it applied to what they worked on, they learned and built better systems.

Blog posts were a big thing, and then in the market, some of our people were famous on Twitter or something like that. They weren't famous by any measurable dimension, but people knew about them. They would say, "Oh, this person has worked on really impressive systems." Having that combination of producing really technical blog posts where people learn and having people who were influencers in some dimension helped bootstrap the community. And then, honestly, it's been organic ever since. Our largest source of users is still people Googling Redpanda directly, and I think that's just fine.

Source: https://university.redpanda.com/

Companies tend to focus too much on trying to capture every ounce of value they create. I tend to see that as a flawed strategy. We don't exist in isolation; there's often a database, whether MongoDB, Materialize, ClickHouse, or whatever. We, Redpanda as a product, usually exist in between many other products. The idea is to educate people on how real-time streaming works. Some will become customers, others may not, and it's fine. The market is big enough for us to capture that and educate developers.

A lot of the content is useful now; hopefully, we could do business together in the future. But if not, that's totally fine. Maybe you got something useful out of it. We're just taking a different approach to building. We don't need to extract every ounce of value creation. We just need to extract enough to build a highly sustainable, highly scalable company. And I think there's a difference there. So I tend to see those relationships as more long-term than short-term. That's why we created a ton of learning resources.

On Redpanda's Intelligent Data API

Kafka is many things. Let me break it down for people: Kafka is an API and an implementation. The Kafka protocol became the lingua franca of streaming. Everything speaks Kafka. If you have ClickHouse, it connects to it. If you have Mongo, there's a connector. If you have CockroachDB, it produces CDC (change data capture) into Kafka. Just about every streaming project has a language mapping into Kafka.

In practice, we found that this was only enough to get started. People needed the unification of real-time and historical data through the same API. So we enabled writing data to Redpanda over time while making space for new data. If you just write data, you'll fill the disk at some point, and the computer will crash because the disk is full. Our background process cleans up old data and uploads it to an object store like S3 instead of deleting it. From a developer's perspective, the local storage becomes a cache. This expands the architecture to leverage true cloud-native, scalable storage backends like S3 for long-term storage, while disks on a computer are designed for short-term storage.
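
The "local storage becomes a cache" idea can be sketched in a few lines of Python. This is a toy model, not Redpanda's implementation: the class name TieredLog, its fields, and the in-memory dict standing in for S3 are all assumptions made for illustration, and the offload step runs inline on append rather than as a background process.

```python
from dataclasses import dataclass, field


@dataclass
class TieredLog:
    """Toy append-only log: recent records stay local, older ones move to an archive."""

    local_capacity: int  # max number of records kept on the "local disk"
    local: dict[int, bytes] = field(default_factory=dict)
    archive: dict[int, bytes] = field(default_factory=dict)  # stands in for S3
    next_offset: int = 0

    def append(self, record: bytes) -> int:
        """Write a record, then make room by offloading the oldest local data."""
        offset = self.next_offset
        self.local[offset] = record
        self.next_offset += 1
        self._offload()
        return offset

    def _offload(self) -> None:
        """Move the oldest local records to the archive instead of deleting them."""
        while len(self.local) > self.local_capacity:
            oldest = min(self.local)
            self.archive[oldest] = self.local.pop(oldest)

    def read(self, offset: int) -> bytes:
        """One read call for hot and historical data; local storage acts as a cache."""
        if offset in self.local:
            return self.local[offset]
        return self.archive[offset]  # raises KeyError if the offset was never written
```

The caller of read never needs to know which tier a record lives in, which is the point of unifying real-time and historical data behind the same API: code written against recent offsets keeps working when pointed at offset 0.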

We've effectively unified both dimensions so you can take the same code untouched and evaluate a new machine-learning pipeline with data from five years ago. It's a very powerful concept. We've also released the idea of read replicas, which allows you to build ephemeral clusters to do machine learning, AI, and database hydration without affecting any production workload. Your client library code connects straight into Redpanda, and you can read from tiered storage, which is not as real-time as your live cluster but is pretty close, within minutes. This idea of being able to read replicas from tiered storage is incredibly powerful for disaster recovery use cases. It gives developers the flexibility to evolve more iteratively and push data into Redpanda, upload it into S3, and consume all the data to build a view of the data.

Source: https://redpanda.com/blog/wasm-architecture

WebAssembly and JavaScript are important for server-side applications like Redpanda. WebAssembly is a sandboxed runtime and an intermediary language representation that allows you to teach Redpanda different tricks at runtime. It's like the Transformers combiners, where you can program in your favorite language and teach Redpanda new tricks. This is the power of WebAssembly. By the time this podcast airs, we'll have the first UX in the industry that allows you to ship JavaScript to the server side to do server-side JavaScript filtering. This makes for an incredibly interactive experience that will enable developers to debug and be more productive.
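
As a rough illustration of what server-side filtering buys you, here is a small Python sketch. It is only a conceptual stand-in: Redpanda runs the shipped code in a WebAssembly sandbox, whereas this example uses a plain Python callable, and the names serve_fetch and Record are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical record shape for this sketch: {"key": ..., "value": ...}
Record = dict


def serve_fetch(
    partition: Iterable[Record],
    predicate: Callable[[Record], bool],
) -> Iterator[Record]:
    """Broker side: run the shipped predicate next to the data and yield only matches."""
    for record in partition:
        if predicate(record):
            yield record
```

The consumer ships the predicate once and receives only the records it asked for, instead of pulling every record over the network and discarding most of them client-side.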

On Modern Streaming Architecture

Source: https://redpanda.com/blog/what-is-streaming-data-use-cases-tools

I see different levels of sophistication when it comes to streaming architectures, and I believe the future of architecture lies in control plane databases. This means that the log, such as the Redpanda service, which doesn't have to be Redpanda specifically, becomes the source of truth for your business, and everything else becomes a cache.

An immutable log of events, such as Alex logging into the website, clicking a UI, adding an item to his cart, and making a purchase with two items and his Visa card, can reconstruct the entire history. Many databases already adopt this architecture, including MemgraphDB, Materialize, RisingWave, Timeplus, and Deephaven. These databases use the log as a shared service in the infrastructure, and the data is materialized or cached.
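
The "log is the source of truth, everything else is a cache" idea can be sketched as a replay function. This is a minimal Python illustration under stated assumptions: the event kinds, the Event and UserView types, and the replay function are all hypothetical names, and the log is assumed to be a list already sorted by offset.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One immutable entry in the log (hypothetical shape)."""

    offset: int
    user: str
    kind: str  # "login", "add_to_cart", or "purchase"
    item: Optional[str] = None


@dataclass
class UserView:
    """Materialized state derived from the log; safe to throw away and rebuild."""

    logged_in: bool = False
    cart: list[str] = field(default_factory=list)
    purchased: list[str] = field(default_factory=list)


def replay(log: list[Event], upto: Optional[int] = None) -> dict[str, UserView]:
    """Rebuild per-user state by folding over the log in offset order.

    Passing `upto` stops after that offset, reconstructing the state
    as it was at an earlier point in history.
    """
    views: dict[str, UserView] = {}
    for event in log:
        if upto is not None and event.offset > upto:
            break
        view = views.setdefault(event.user, UserView())
        if event.kind == "login":
            view.logged_in = True
        elif event.kind == "add_to_cart" and event.item is not None:
            view.cart.append(event.item)
        elif event.kind == "purchase":
            view.purchased.extend(view.cart)
            view.cart.clear()
    return views
```

Because the view is a pure function of the log, a downstream database can lose its state entirely and rebuild it by replaying from offset 0, or answer "what did the cart look like before the purchase?" by replaying to an earlier offset.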

As developers consume more content, this architecture will become more and more common. The pressure from consumers on systems to provide real-time responses is driving this trend. Redpanda becomes the source of truth and the base on which other sophisticated systems are built. However, we must still focus on the lowest level, such as the storage engine, to build higher-level systems. This is the long-term view of architecture and where the innovation space lies.

On Hiring and Building Culture

One of the best things we've done at this company is finding a group of people excited to work together and build something cool. It's hard work, computationally difficult even, but people tend to be attracted to hard problems. Because we're infrastructure based, there's a lot of responsibility around data integrity, high availability, consistency, and semantics. It's challenging, and the surface area seems endless. Every day we strive to be a little better.

One important takeaway is to focus on the human equation. Expect people to be tactically excellent at their job, but also make sure they're fundamentally good humans to partner with. We partnered with a psychologist to design a set of personality questions that extract signals about a person's character. We ask questions like whether they're comfortable working with a Latino CEO or whether they're racist or sexist. We try to keep bad people from infecting our culture. Every person who joins the company is filtered against a base set of strong signals that we've identified as core to our company identity.

Culture is really what people do when they show up to work. Even under external business pressure to add new features or expand, we don't let those pressures force us to lower our hiring standards. We'd rather not build something than bring in someone who will damage what we've built. Our company is still tender and requires care and obsession to grow. We expect the people we bring in to be excellent because those already here are excellent. We want everyone to feel proud of their peers and challenged to grow.

It's important to set expectations and demand excellence from hiring managers. They need to bring in people they're proud of, who they would feel comfortable bringing to talk to a customer. If a person isn't an excellent fit, we'll walk away from hiring them. There are different ways of ensuring performance, and we'll explore those, but we won't compromise on our hiring standards.

On Finding Customers

For us, the opposite has been true. It hasn't been fine, but it has been the easy part.

We went from being a relatively young storage company to hosting the largest known Kafka workloads in the world. We have customers who are pushing 14 gigabytes per second, sustained at 10 gigabytes per second. To be honest, we didn't think about that condition.

We've been lucky to have enough demand to be able to choose the partners we work with. We've been picky about the companies we partner with, sometimes telling them that we're not the right fit for them at this stage. We want to make them successful and personally care about their success. If we engage with them, we want them to succeed. Otherwise, it wastes time and can develop negative relationships with the product.

We chose to do the hard things, such as solving the hardest Kafka workloads in the world. If we can solve those, it'll likely work for the belly of the market. And it worked out okay.

When customers cross the gigabyte-per-second threshold, they call us and say, "I should probably pay you guys." They're pushing absurd scales like a hundred thousand partitions. They love the product and want to pay for it.

That's how we chose our partners. And that was our process for figuring out who to partner with.

On Fundraising

Source: https://medium.com/lightspeed-venture-partners/why-lightspeed-is-leading-redpandas-100-million-series-c-553ffe38d6e

I have less advice to offer in this area because we've been fortunate enough not to fundraise very hard, as we've been preempted most of the time. People care about traction, and we've been lucky to work with companies and logos that attract the right investors. We were fortunate to have excellent board members, including Arif Janmohamed at Lightspeed, Dave Munichiello at Google Ventures, and Semil Shah from Haystack. They're all good people who challenge me in respectful ways.

If you can show traction, then funding becomes much easier. For companies struggling to gain traction, I recommend designing a process to extract signals. In my opinion, over-indexing on the human part of the equation is crucial. Expect tactical excellence from people, and find a good partner who will challenge you in the right ways.

I realize I'm lucky to have such excellent board members and investors, and I highly recommend them. They've been super positive for me and are also good humans. However, I acknowledge that not every founder has this experience, so take my advice with a grain of salt.

On Advice to Minorities

I believe the advice given to individuals depends on their location in the world. Therefore, the guidance provided may vary. In my experience, having interviewed hundreds, if not thousands, of people in this company alone, it's disheartening to see entire countries relegated to trivial engineering tasks.

Why should someone be limited to only providing support or working on less critical projects? If someone is talented, why can't they be a core engineer? Sometimes, there are legitimate reasons, such as insufficient quality of education at the university level, that prevent the creation of certain skills.

It doesn't mean opportunities shouldn't be given. It just means we need to acknowledge where the person is starting from. Sometimes, from a company perspective, we may feel that someone doesn't possess the mental tooling to operate at the level we require. In those cases, we have open positions and must be truthful with ourselves.

As a minority myself, I understand the feeling of not getting the chance to work on challenging projects. Focusing on minorities in the US, which is different from minorities in other countries, I created a scholarship even before we raised a large amount of money because I wanted to make the world a better place in my own way.

I don't need to offer a hundred scholarships. I just want to improve the life of one person in the world, and I don't care about recognition. This isn't a scalable program. It isn't meant to teach a legion of people. It's just meant to improve the life of one person in the world, period.

We take all their intellectual property and return it to them permanently. We don't use it as a company. My lawyers drafted a contract that ensures that. We offer them money and mentorship and expect nothing in return. They don't have to build any projects related to Redpanda, and I don't care what they build. I just want to offer one hour of mentorship.

This program isn't designed to replace the Recurse Center or any other initiative. Instead, it's intended to complement many existing programs worldwide. I believe we can significantly impact someone's life in this way and open their eyes to the possibility that they, too, can make a difference in the world. We had the money, the talent, and the desire to make the world a better place, and as the CEO, I decided to make it happen. Our engineers loved the idea and wanted to be a part of it. We're culturally aligned, and this is the kind of company we want to build.

In summary, we can help underserved communities approach difficult problems. We'll create a safe space for them to fail and offer them some money and mentorship. It's okay if they fail on the weekends or do nothing at all. That's on me for choosing a bad candidate. The people we choose are dedicated and willing to work nights and weekends or half a week at their current jobs.

This program is twofold: to help minorities dream bigger and work on harder problems and to open up the network of some of the world's best engineers. As these individuals progress in their careers, they may have the opportunity to work for the best companies and startups in the world, thanks to the connections we provide.

Show Notes

(01:55) Alex reflected on his upbringing as an immigrant moving from Colombia to the US at 14.
(07:06) Alex recalled his undergraduate experience at NYU’s Polytechnic School of Engineering, where he studied Computer Science and did research in cryptography.
(16:40) Alex went over his first job working as a software engineer at FactSet Research Systems.
(20:13) Alex walked through his time as the first employee and the first engineer at YieldMo.
(24:30) Alex talked about his hiring philosophy for engineers who care about their craft.
(28:03) Alex touched on the backstory behind the creation of Concord, with Shinji Kim and Robert Blafford, while working at YieldMo.
(32:26) Alex shared lessons learned from his first-time founder experience with Concord.
(35:22) Alex went over his two years at Akamai as a Platform Infrastructure Engineer after the Concord acquisition.
(40:01) Alex introduced his work on SMF, an RPC framework designed for microsecond tail latency.
(43:41) Alex shared the story behind the founding of Redpanda Data, which builds a high-performance, Apache Kafka-compatible data streaming platform for mission-critical workloads.
(47:19) Alex walked through the major benefits of choosing Redpanda over Kafka.
(51:03) Alex explained his decision to open-source Redpanda in November 2020 under the Source Available License BSL.
(56:08) Alex mentioned successful tactics his team employed to increase adoption of and contributions to the open-source library.
(01:01:13) Alex unpacked the design of Redpanda's Intelligent Data API.
(01:08:55) Alex provided his perspective on the modern streaming data architecture.
(01:13:24) Alex shared valuable hiring lessons to attract the right people who are excited about Redpanda’s mission.
(01:18:30) Alex talked about his experience choosing customers for Redpanda.
(01:20:33) Alex shared fundraising advice to founders who are seeking the right investors for their startups.
(01:23:23) Alex gave advice to a smart, driven minority who aspires to work on ambitious, technically deep, and challenging problems.
(01:28:18) Closing segment.

Alex's Contact Info

Redpanda's Resources

Website | Twitter | LinkedIn | Slack | GitHub | Contributing Doc
About Redpanda | Platform Capabilities | Customers
Docs | Redpanda University
Reports and Guides | Benchmarks
Hack The Planet Scholarship

Mentioned Content

Blog Posts

Redpanda raison d'etre (Feb 2019)
Thread-per-core buffer management for a modern Kafka-API storage system (Sep 2020)
Redpanda is now free and Source Available (Nov 2020)
Redpanda creates Redpanda, the Intelligent Data API Platform, backed by $15.5M initial funding from Lightspeed Venture Partners and GV (Jan 2021)
The Intelligent Data API (Jan 2021)
Redpanda Wasm engine architecture (June 2021)
We raised an additional $50M to drive the future of streaming data. Join us! (Feb 2022)
Redpanda gives Kafka a Run for Its Money (InfoWorld, May 2022)
Alex Gallego Builds Redpanda To Simplify And Unify Real-Time Streaming Data (Forbes, June 2022)

Talks

Distributed Stream Processing over thousands of Datacenters (GeeCON, Aug 2017)
How to Build the Fastest RPC (Nov 2017)
Co-designing Raft + thread-per-core execution model for the Kafka-API (Dec 2021)

People

First, my team is unreal. The builders at Redpanda, who actually make the clock tick. lucky to partner w/ them every day - thank you.

working on large problems w/ speed & focus have come in part from partnering with @arifj @davemuni @semil - ppl with deep conviction in us

1/n https://t.co/OyegPpfg4m
— 🕺💃🤟 Alexander Gallego (@emaxerrno) June 27, 2023

Notes

My conversation with Alex was recorded back in August 2022. Since then, I recommend checking out these resources:

The $100M Series C funding announcement
The revamped Redpanda Cloud
This guide for developers on streaming data
Customer case studies with Lacework, Exein, and SmartLunch
Resources on the advantages of Redpanda over Apache Kafka (cost of ownership comparison, data sovereignty, and this holistic comparison)

a ~couple of years ago we bought Kowl (now Redpanda Console) -> it was so good, we rebuilt our entire cloud experience on it to be the snappiest, privacy-preserving experience for data streams https://t.co/BhfVHD8kyS

Now backed by a larger team and is dope.

check out the gif!
— 🕺💃🤟 Alexander Gallego (@emaxerrno) June 29, 2023

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts.

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.