The 126th episode of Datacast is my conversation with Bob Van Luijt, the CEO and co-founder of Weaviate, the business created around the open-source vector database Weaviate.
Our wide-ranging conversation touches on his teenage years building software for fun, his education in jazz and music composition, his consultancy agency Kubrickology, his TEDx talk on digital technology through the lens of language, the founding story of Weaviate and the rise of vector search engines, the business model around open-source software, the AI-first database ecosystem, lessons learned from hiring and fundraising, and much more.
Please enjoy my conversation with Bob!
Listen to the show on (1) Spotify, (2) Google, (3) Deezer, (4) RadioPublic, and (5) iHeartRadio
Key Takeaways
Here are the highlights from my conversation with Bob:
On His Upbringing
My interest in technology began at an early age. I was born in '85, and when I was growing up, my family had a personal computer and early access to the internet. I remember my dad bringing home what I think was an IBM computer when I was very young, and it had QBasic on it.
Growing up in Holland, I had access to programming books for children written by a Dutch author. I remember getting a QBasic book and learning how to write basic lines of code, like "What's your name?" and "Hi!" with the variable for name. Seeing that I could make the machine do things and create stuff was magical.
As I got older, there was a boom in people moving online around 2000. Many people needed websites and e-commerce sites, even if they were not as sophisticated as today. I got my first serious gig at 15 when I offered to build a website for a business owner who sold toothbrushes and other items at gas stations.
I would work on his website after school while my friends unpacked the toothbrushes. It was the first time I had to register with the Chamber of Commerce and work on the web. I have been working on software for a long time and still enjoy making things with computers.
On Building Software For Fun
There's something interesting to say about my early experiences with software development. If we talk about using the word "business," it somehow assumes that I had some sort of business savviness at that age, but I absolutely did not. I was just doing it to make money and because I enjoyed it. Those were the two main reasons.
When I was a little bit older, around 17 years old, I started studying and got another opportunity. I worked at a company with a huge warehouse where a lot of food was transported to Holland and Germany, among other places. Back then, all the drivers had to write down their hours and driving times by hand.
The company bought a machine to automate this process, and I wrote the software for it. I enjoyed doing that very much and learned a lot from the experience. However, I didn't yet understand the value that I was creating for the company. A dozen people were reading these forms every week, but now I just sent them an Excel file and said, "Okay, you don't have to do that anymore."
At this point, I was just making software for fun and because I enjoyed creating things. It was only later that I realized the business side of software development was also very interesting.
On Going From Music to Software
I didn't pursue a music career. When I was 17, I had to decide what to study. Although computer science and related fields interested me, I wanted to create things more. I had some talent for writing software and making music and even received grants to study jazz and composition.
As a composer, I started incorporating software into my work, writing in certain languages, and performing with a band. Later, I even composed music that only utilized software. This taught me the importance of finishing what I started, which was valuable even though I wasn't yet focused on the business side of things.
After finishing my studies, I began a master's degree and continued to write software to make money. However, I realized that my passion lay in software development, so I decided to fully pursue it. My time studying music, particularly at Berklee in Boston, was formative and exciting, especially as technology and software started to emerge.
Looking back, I see this was a time of great innovation and excitement. Ultimately, I love creating things for others to enjoy or use, whether music or software.
On Creating control(human, data, sound)
Back in the early 2010s, many wearables were introduced, including rings, Fitbits, and EEG headsets that entered the market with SDKs. I wondered if I could do something with such an SDK. At a Dutch Design Week conference, I presented my idea: I hired a modern dancer to wear a headband, and I recorded all the information from the EEG scan. I had 16 synthesizers playing over the EEG scan information and software written around that, which I called the composition. The composition was no longer in musical notation; it existed purely in software. The dancer danced based on what he heard, and the software drove the composition.
You could hear similar structures every time we played the composition, but each performance was different because it was driven by that live information. The goal was to play around with variables I was getting from different sources, in this case from a dancer. We made a film out of that, and it was shown at festivals and other venues worldwide.
The consultancy firm I started was named after Stanley Kubrick because I'm a big fan of his work. Stanley Kubrick had a type of rigor in how he worked, and his methodology was called Kubrickology. So I thought, "Hey, that's a cool name for a company." That's how I started my journey as a consultant, which began as a software engineer but later grew into more consultancy work. I started working for enterprises, and they had questions about architecture platforms. Platforms became the buzzword back then, so I was also doing a lot of work in that area.
On Being Creative with Software
I wasn't really into trend-seeking or those kinds of things, but the point I was trying to make - and this is still a hobby horse - is that technology, especially software, is amazing for being creative and making stuff. Especially back then, people didn't see it like that. It was more like if you wanted to support the banking system or something, then work in technology, and you could do something there. But all these startups started to appear, and people were making things that were even more on the creative spectrum.
I wanted to highlight the creativity of what we can do with technology. About a year ago, I even gave a TEDx talk about that topic, where I went one step further. The technology allows us to be creative and make new things.
I was trying to celebrate that we could do that, which I'm still doing today and still trying to support. For example, last March, I was at South by Southwest in Austin, where I liked to watch all the creative coders who write music and those kinds of things while writing code on the screen. That's an amazing scene, but even being creative in how we build business models/solutions and the things we can do with machine learning is just as important.
This should not be confused with things that people think of as being artistic. For example, now we have Stable Diffusion. People see an image and say, "Oh, that is the creative part." But that's not what I mean at all. I mean human creativity to build things that can be artistic at one end of the creative spectrum, but can also be businesses, new startups, new ideas - those kinds of things.
People take the risk and make the leap of turning a creative idea into something practical. And today is just the time of software. That's just the time we live in.
On Merging Atoms and Bits
The information we process and store is initially taken from the physical world, the world of atoms, and is then translated into bits and bytes. We then manipulate this information, and something else emerges that can be used again in the physical world.
For example, I could go to a restaurant and eat sushi. This is physical stuff that goes into a machine when I pay for it and becomes a transaction that lives in the digital world. We can then use this transaction for other purposes. From a financial perspective, the restaurant owner may use the cash to buy new fish, making it physical again. However, this idea should not be confused with the metaverse concept, which is entirely digital.
Another example of this is a dating app, where, in the past, people would have met entirely in the physical world. Now, people use digital platforms like Tinder to meet and date before possibly meeting in person.
My argument is that we are moving more things through the digital world and can do more in the digital world than in the physical world. This is an exciting development, although whether it is ultimately good or not remains to be seen.
On His TEDx Talk
I received an invitation to give a TEDx talk, and fortunately, I had the time for it. Although they gave me a general subject, I was free to choose any topic that interested me. Language is one of my hobbies and a topic I'm passionate about, not just from a literary perspective but also how we learn and use language to communicate.
During my TEDx talk, I explored the idea that if we can use language to write software, and language has finite means but infinite possibilities, then the software we write also has infinite possibilities. As society and language change, so does the software we write. Anything that can be thought of and expressed in language can be made into software.
I also discussed the potential for digital technology to bring fantasy worlds to life. In the past, we relied on our imagination to picture dragons and other mythical creatures described in books. But with CGI and other digital technologies, we can now see those creatures on screen and make them seem real. As we continue to refine these technologies, we may one day be able to fully perceive them as reality.
This transformation takes time and happens gradually, but with each step, we see it becoming more and more real. We are slowly making our dreams a reality by using language to describe what we want to see and translating it into software.
On The Founding Story of Weaviate
Weaviate went through three important stages.
The first stage occurred when I was working as a consultant and attending a conference as part of the Google Developer Expert Program. Back then, cloud services were very new, and Google was announcing that they were going from "mobile first" to "AI first." At the same time, I was introduced to word embeddings, specifically word2vec, and the first thing I tried to do with Weaviate was marry the idea of the semantic web with machine learning.
The second big step came when I was working as a freelancer and could use Weaviate for something related to IoT. We discovered that many companies with APIs would output descriptions of what an API endpoint meant, but different vendors would use different terms to output it. We tried attaching Weaviate to existing databases and realized that vector search engines would be big. However, existing databases or search engines might not be the ideal databases to work with embeddings.
This led to the founding of Semi Technologies, an abbreviation for Semantic Machine Insights. Initially, we focused on NLP, but now our solution covers a wide spectrum, and the product is still called Weaviate. The first version of Weaviate was released in January 2020, and we have been improving it ever since. We are helping people understand the potential of these technologies in their software stack and making Weaviate better for them.
On Choosing A Co-Founder
This is a good question, but it's not easy to answer. You should ask Etienne why he joined. For founders, painting a picture of what the product can do by showing, not telling, works well.
Earlier in this conversation, I used the word "magical" to describe the machine's ability to perform semantic queries on an ANN index. Seeing the results was like magic, and dreaming about the potential of bringing that to the market was exciting. This enthusiasm is what drew in early co-founders.
There was no intentional strategy, just a pragmatic approach driven by a passion for the space and the product.
On Building An Open-Source Business
To answer the question, let's go back in history a bit. When I was around 25 or 26 years old, I started to learn and understand how we create value, and I became extremely interested in disruption theory. I wanted to know what it was, where it came from, and how it worked. I discovered how people were using digital technology to solve business problems, and I found myself super interested in business from the perspective of making stuff and making a business.
Besides software and music, a third thing entered my life: business models. I could marry business models to software, and open source was a natural fit for me. I don't think there's anything wrong with closed source versus open source, but building a business based on open source principles was the easiest for me.
Open source builds transparency; if you start with open source early, you're transparent about what you're doing. We're creating complex software, and let's be honest, it's heavy lifting from a software engineering perspective. Plus, we're entering a new niche, vector search, and everything that comes from that. Being able to openly talk about that and show what we're working on is something that open source enables us to do.
Open-source software comes with an open-source license, and if you have a problem with the technology, it's your problem, which is fair because you're getting the software for free. But if a business uses infrastructure software like a database, it probably doesn't want to run that in production itself. Bigger companies want service level agreements (SLAs), guarantees, and specific levels of support, while startups just want to have SaaS. Open-source software enables us to run it for them, making their problems our problems.
The beautiful thing about open source is that you do something together. You have a community around open-source software, and people might use your software for free, but they give you insights and ideas, create issues if there's a problem, help you, and talk about what you're doing. A financial transaction happens when somebody wants to run their startup on your software or use it as core infrastructure in a bigger company.
The lesson learned is that closed-source companies allowed people to pirate their software, knowing that those people might become software buyers later. Open source enables us to do the same thing from a business perspective. If you can't afford it or don't need support, by all means, use it. But if you need support or SLAs, a financial transaction happens. When it comes to open-source business models, we're just getting started.
On Vector Search Engines
Weaviate is a vector search engine, which means it's a database of vector embeddings. It's a specific niche that should not be confused with data warehouses or time-series databases.
So, what does a vector search engine do? It works with machine learning models that output vector embeddings. These embeddings are a multi-dimensional representation of data. Data items that are similar to each other live near each other in the same space, and dissimilar items live farther apart. These spaces often have hundreds or even thousands of dimensions.
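To make "near each other in the same space" concrete, here is a minimal sketch using cosine similarity, one common closeness measure for embeddings. The three-dimensional vectors are made up for illustration; they do not come from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings"; real models output
# hundreds or thousands of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "truck": [0.1, 0.2, 0.95],
}

cat_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
cat_truck = cosine_similarity(embeddings["cat"], embeddings["truck"])

# Related concepts score higher than unrelated ones.
print(cat_kitten > cat_truck)
```

With these toy values, "cat" and "kitten" point in nearly the same direction, while "truck" points elsewhere.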
Using these vectors, we can build new types of search engines, such as semantic search engines and recommendation systems. Big companies like Google and Netflix use these vectors to build their search engines. Thanks to platforms like Hugging Face and APIs from OpenAI and Cohere, it's easy to get access to these models.
But if we want to build something based on enterprise search, similar to what we're used to from solutions like Solr or Elastic, but with machine learning first, that's where the vector search engine comes into play.
The vector search engine is designed specifically to work with machine learning embeddings and to scale them. Vectors need a different type of index than traditional text search: approximate nearest neighbor (ANN) indexes rather than inverted indexes. Therefore, we believe there's room in the market for a new type of database that's good at that. The vector search engine solves this and provides the UX of existing search engines but with machine learning first.
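The difference between the two index types can be sketched as follows. This is an illustrative toy, not how Weaviate or any search engine is implemented: the documents and two-dimensional vectors are hand-written, and the vector search is exact brute force, which is the result that an ANN index approximates at much lower cost on large collections.

```python
from collections import defaultdict
import math

docs = {
    1: "how to repair a flat bicycle tire",
    2: "fixing a punctured bike wheel",
    3: "best sushi restaurants in town",
}

def build_inverted_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents that contain every token of the query."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical embeddings: documents 1 and 2 mean roughly the same thing,
# so their vectors sit close together even though they share few words.
vectors = {1: [0.9, 0.1], 2: [0.85, 0.2], 3: [0.05, 0.95]}

def nearest_neighbors(vecs, query_vec, k):
    """Exact k-nearest-neighbor search by Euclidean distance."""
    return sorted(vecs, key=lambda doc_id: math.dist(vecs[doc_id], query_vec))[:k]

index = build_inverted_index(docs)
keyword_hits = keyword_search(index, "bicycle tire")
vector_hits = nearest_neighbors(vectors, [0.88, 0.15], k=2)
print(keyword_hits, vector_hits)
```

The keyword lookup only finds the document with the exact words, while the vector lookup also surfaces the differently worded document about the same topic.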
On The High-Level Design of Weaviate
The high-level design of Weaviate focuses on the idea of people building search-based applications. To accomplish this, these applications require certain things, such as a great user experience. For example, the database needs to be easy to use.
In addition to storing vector embeddings, these applications need to be able to store data objects. They should also be able to execute traditional queries, not only query the vector index. This means that the power of search is becoming hybrid. For instance, people working with images want to perform high-scale similarity searches through the images.
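As a rough illustration of combining a traditional query with a vector query, the sketch below first applies an exact-match filter on a property and then ranks the remaining objects by vector distance. The object layout and the `filtered_vector_search` helper are assumptions made for this example, not Weaviate's actual data model or query planner.

```python
import math

# Hypothetical data objects: ordinary properties plus a toy 2-d vector.
objects = [
    {"id": "a", "category": "shoes", "vector": [0.9, 0.1]},
    {"id": "b", "category": "shoes", "vector": [0.2, 0.8]},
    {"id": "c", "category": "hats", "vector": [0.95, 0.05]},
]

def filtered_vector_search(items, query_vector, where, k):
    """Keep objects whose properties match `where`, then rank them by distance."""
    candidates = [
        obj for obj in items
        if all(obj.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda obj: math.dist(obj["vector"], query_vector))
    return [obj["id"] for obj in candidates[:k]]

result = filtered_vector_search(objects, [1.0, 0.0], {"category": "shoes"}, k=2)
print(result)
```

Object "c" is the closest vector overall, but it is excluded because it fails the traditional filter.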
We started to learn that there is room for modules around the core database. We have modules that take care of vectorization. For example, if you want to use OpenAI embeddings, you could write a script that retrieves them from OpenAI's API and stores them in Weaviate for later searches. Alternatively, you can use Weaviate's OpenAI module, which only requires filling in your OpenAI API key.
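The two paths described above, bringing your own vectors versus letting a module vectorize for you, can be sketched with a toy in-memory store. Everything here is hypothetical: `TinyVectorStore` is not Weaviate's client API, and `fake_embedding_api` merely stands in for a call to an embedding provider.

```python
class TinyVectorStore:
    """Toy store that accepts ready-made vectors or creates them via a vectorizer."""

    def __init__(self, vectorizer=None):
        self.vectorizer = vectorizer  # optional callable: text -> list of floats
        self.objects = []

    def add(self, properties, vector=None):
        if vector is None:
            if self.vectorizer is None:
                raise ValueError("no vector given and no vectorizer configured")
            vector = self.vectorizer(properties["text"])
        self.objects.append({"properties": properties, "vector": vector})

def fake_embedding_api(text):
    """Stand-in for an external embedding provider (not a real model)."""
    return [float(len(text)), float(text.count(" "))]

# Path 1: your own script fetches the embedding and stores it explicitly.
manual_store = TinyVectorStore()
manual_store.add({"text": "hello world"}, vector=fake_embedding_api("hello world"))

# Path 2: the store is configured with a vectorizer and embeds on import.
module_store = TinyVectorStore(vectorizer=fake_embedding_api)
module_store.add({"text": "hello world"})

print(manual_store.objects[0]["vector"] == module_store.objects[0]["vector"])
```

Both paths end with the same stored vector; the module path simply moves the vectorization step out of user code.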
Whether you prefer to use HuggingFace or any other solution, Weaviate is designed as an ML-first search engine. It offers everything people need to build great ML-first search applications. This includes the core database, data objects, vector storage, modules, and more. The open-source nature of Weaviate means that most of these insights come from users in the open-source community.
On Requirements For A Production-Ready Database
In Weaviate, you can make graph-like connections because we've adopted GraphQL as an interface. We wanted to provide a simple interface for people to query the vector space, and on the lowest level, as a user, it starts from GraphQL to query the database.
We now also have software drivers (client libraries) that you can use, but at the lowest level, it's GraphQL. GraphQL allows you to make graph connections, which should not be confused with a graph database like Neo4j, where you can do many-to-many relations. Weaviate's graph connections are based on every data object, or node, in the graph having a vector representation you can query through.
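For readers who have not seen such a query, here is a hedged sketch that builds a semantic-search query string in the general shape of Weaviate's GraphQL `Get` function with a `nearText` argument. The class name `Article` and the properties `title` and `summary` are hypothetical, and `nearText` assumes a text vectorization module is enabled; check the Weaviate documentation for the exact syntax of your version.

```python
import json

def build_near_text_query(class_name, concepts, properties, limit=3):
    """Build a GraphQL query string for a semantic search over one class."""
    concept_list = json.dumps(concepts)  # e.g. ["vector databases"]
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concept_list}}}, limit: {limit}) "
        f"{{ {fields} }} "
        "} }"
    )

# "Article", "title", and "summary" are placeholder schema names.
query = build_near_text_query("Article", ["vector databases"], ["title", "summary"])
print(query)
```

The resulting string would be sent to the database's GraphQL endpoint; the client libraries mentioned above wrap this step.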
When it comes to scalability, any database nowadays needs to be scalable, which comes with horizontal scalability, replication, and other complex design decisions. Weaviate's indexes come with these design decisions, which my co-founder Etienne and his team are working on. We've even hired researchers to help with the core technology of these indexes and how we need to scale and build them.
That scalability is the cool thing, but also the complex thing and the uniqueness of it. We've solved the challenge of scaling these ANN indexes and reached our first horizontal scalability milestone. The next big milestone is replication on top of that, which is new because doing it with approximate nearest neighbor (ANN) indexes is new.
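One general way to think about horizontal scalability for vector search is scatter-gather: objects are spread across shards, every shard answers the query locally, and the partial results are merged. The sketch below illustrates that generic pattern with exact search on toy data; it is an assumption-laden illustration and not a description of how Weaviate shards its indexes.

```python
import heapq
import math
import zlib

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(object_id):
    """Pick a shard from a stable hash of the object id."""
    return zlib.crc32(object_id.encode("utf-8")) % NUM_SHARDS

def build_shards(objects):
    shards = [dict() for _ in range(NUM_SHARDS)]
    for object_id, vector in objects.items():
        shards[shard_for(object_id)][object_id] = vector
    return shards

def search_shard(shard, query, k):
    """Local top-k on one shard as (distance, id) pairs."""
    scored = ((math.dist(vector, query), object_id) for object_id, vector in shard.items())
    return heapq.nsmallest(k, scored)

def search_all(shards, query, k):
    """Scatter the query to every shard, then merge the partial top-k lists."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [object_id for _, object_id in heapq.nsmallest(k, partial)]

# Toy 2-d vectors standing in for embeddings.
objects = {f"obj-{i}": [float(i), float(i % 4)] for i in range(20)}
shards = build_shards(objects)
top = search_all(shards, [3.0, 3.0], k=3)
print(top)
```

Because each shard returns its own top k, the merged result matches a search over the full data set when the per-shard search is exact.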
On Use Cases For Weaviate
There are different perspectives to consider when explaining why we can be proud of Weaviate. The first perspective is purely from an end-use case point of view. We began to see new startups and larger businesses building new products on top of the vector search engine's core functionality.
These products included recommendation systems and semantic search systems. People said, "Hey, we can build a whole new business on top of this technology," and they chose Weaviate for that purpose. Alternatively, they said, "Hey, we believe we can create a new product for an existing company based on these technologies." Many startups aim to improve search functionality, such as searching through ticket systems, scientific articles, patient files, and more, using machine learning techniques. They train proprietary machine learning models and then use Weaviate to present the results to the world. Weaviate is used as the core infrastructure for building new products and businesses.
Weaviate's second point of pride is from an engineering perspective. The team is proud when the software scales, is fast, and when customers find new ways to optimize performance. For example, the team improved the performance of a specific type of query when a customer reported it. The team quickly found a solution, resulting in a new release and improved performance.
I remember the day we imported one million data objects a few years ago, and we were thrilled. Now, we're talking about billions of data objects, which is incredible. The combination of top-down and bottom-up approaches, with people building new solutions and businesses on top of Weaviate while the core technology is scaling and performing better, is what makes us most proud. Customers find Weaviate easy to use and enjoy using it, which is a source of pride for us.
On Engaging Open-Source Contributors
First, it's important to clarify that the definition of "contributor" depends on what you are building. For example, on platforms like GitHub or HuggingFace, contributors bring something tangible, like a machine learning model. However, when building a database, contributors take on the form of users who provide feedback on the database, ranging from UX to functionality.
For us, contributors are the most important source of feedback on what they want from the database. We have private Slack channels for heavy users who share everything with us, which is extremely helpful. We acknowledge and thank community members who make an effort to help us with every release.
We engage with our community daily as people take the time to try out our technology and provide feedback. We try to be helpful and engage with the community as much as possible. Even as our company grows, I personally engage with the community because Weaviate would not be what it is today without them.
In terms of communication, we communicate with them like we are now, and we are very grateful for their support. Public discussion is vital because it allows people to openly address issues and mistakes. I constantly look at how other companies build their communities and present themselves in the open-source ecosystem, as I find it extremely interesting and rewarding.
I enjoy meeting thought leaders in the space at conferences, and it's a combination of projects and individuals that I find inspiring.
On Pricing Model
Running a database in production is hard. It doesn't matter which database you use; it's difficult and comes without any guarantees.
You want to do two things: be fair to your users and make an optimal way of running it. To be fair to your users, you can implement a pay-as-you-grow model. In the case of Weaviate, the more you store and query, the more you pay. If you're still developing or have a down month in your business, costs will go down, and if you have a successful month, costs will go up. This way, users only pay if they benefit from Weaviate.
By contrast, a model where you must spin up a machine and keep it running through the month at a flat price may be unfair. With pay-as-you-grow models, the index grows depending on how large the embeddings from your machine-learning model are. Calculating and predicting how much you will be paying for Weaviate per month is easy.
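The arithmetic behind a pay-as-you-grow bill can be sketched like this. The formula and both rates are invented for illustration and are not Weaviate's actual pricing; the point is only that cost follows stored embedding size and query volume.

```python
def monthly_cost(stored_vectors, dimensions, queries,
                 price_per_million_dimensions=0.05,  # hypothetical rate
                 price_per_thousand_queries=0.01):   # hypothetical rate
    """Estimate a usage-based monthly bill from storage and query volume."""
    stored_dimensions = stored_vectors * dimensions
    storage_cost = stored_dimensions / 1_000_000 * price_per_million_dimensions
    query_cost = queries / 1_000 * price_per_thousand_queries
    return round(storage_cost + query_cost, 2)

# A quiet development month versus a busy production month,
# using a hypothetical 384-dimensional embedding model.
quiet_month = monthly_cost(stored_vectors=100_000, dimensions=384, queries=50_000)
busy_month = monthly_cost(stored_vectors=1_000_000, dimensions=384, queries=2_000_000)
print(quiet_month, busy_month)
```

Larger embeddings or more traffic raise the bill, and a slow month lowers it, which is the behavior described above.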
Our current user base and customers enjoy the fact that they're outsourcing these problems to us and paying us for it. And if they're just getting started, they pay less; if they grow, they pay more. It's a fair deal.
On The AI-First Database Ecosystem
One thing that all companies in these four niches have in common is the belief that AI-first or ML-first infrastructure is going to grow. More people want to build solutions that use and integrate machine learning models right from the start, not just as an add-on but at the core level of the infrastructure.
While this trend is still growing, it's growing rapidly, and we believe it will become significantly bigger. However, running machine learning models in production presents new problems that need to be solved, such as inference. This is where embedding providers come in, providing models for users to run or API endpoints for querying the models.
Sometimes, the models themselves need to be tweaked, and that's where neural search frameworks come in, providing tools for data scientists to work with the models. Feature stores serve as data warehouses for machine learning features, storing large amounts of data and features for training and fine-tuning models.
Finally, there are vector search engines, also known as vector databases. These engines use embeddings (or vectors) to search and build applications leveraging the embeddings.
Together, these four players make up the ecosystem driving the growth of AI-first or ML-first infrastructure.
On Partnerships
I'm a big fan of partnerships, especially when it comes to Weaviate. The embedding providers are the easiest example of this. Their business, core knowledge, and the communities they build are integral. Everything around the models, optimizing them, improving them, and so on, is important.
We have modules around Weaviate and try to partner with as many embedding providers as possible. We're saying, "Hey, if you have a model that outputs vectors, like most of them do, you need to be able to use it in Weaviate." Weaviate is embedding agnostic, meaning we don't care where your embeddings come from. We can store them, and you can search through them, whether they come from a tiny 9 to 120-dimensional model or a massive one like OpenAI's DaVinci, with 12,288 dimensions.
The same goes for feature stores. I was on the podcast of Simba from Featureform. They create a feature store, and it's super interesting to investigate how the feature store, the data warehouse for features, works together with a search engine. Last but not least, there are neural search frameworks. We're working with many great folks; everyone is finding their place in that space. It's very exciting.
On Hiring
Hiring is hard. Ensuring that you've hired the right person during the hiring process is a challenge. Although it's difficult, we're fortunate that our hiring process generally goes well, and we can trust our instincts when deciding whether someone is a good fit for our team.
As a fully remote company, it's important to ensure that new hires feel like part of the team and are familiar with our company culture. That's why we've hired someone to oversee the hiring process and the development of our remote culture. Jessica is responsible for hiring and building a company culture that ensures remote employees feel connected to the company.
We've published our core values on our company website to ensure new hires understand the values guiding our company's development. If someone doesn't share our values, we may not work with them, even if they seem like a good fit.
We're in it for the long run and want our new hires to be part of our growth journey. We have an amazing team, and we enjoy what we do. People need to have fun and enjoy what they do because that's when we get the best results from our team.
On Cultural Fit
Open-source technology plays a crucial role in our company, allowing us to be transparent about how we build our software, technology, and business. This transparency attracts people who share our values and want to join our team.
When it comes to hiring, cultural fit is the most important factor. We can usually tell within a minute or two if someone is a good fit for our team. Of course, other factors such as communication skills and technical abilities also play a role, but cultural fit is paramount.
To ensure we hire people who align with our values, we define them clearly with our team and stick to them. We don't lie about our values or try to be something we're not. This is important because different people have different ways of working and finding success, and it's crucial to be honest about what we stand for.
In short, open-source technology and transparency are key to our success, and cultural fit is the most important factor in hiring. By defining and sticking to our values, we ensure that our team is aligned and working towards a common goal.
On Fundraising
Earlier, I mentioned the importance of sticking to your values. With so many different investors, finding ones that align with your company's mission and values is crucial.
When we first started receiving investor inquiries, I was eager to take money from anyone interested. However, with help and mentoring, I learned the importance of finding investors I liked and wanted to work with.
When searching for investors, it's essential to be transparent about your company's progress and goals. This not only applies to investors but also to employees, users, and the community.
It's crucial to find investors that share your vision and values. If you believe in early revenue, find an investor that supports that. If you believe in spending time on building a great product, find an investor who shares that belief. Whether you prefer open-source or closed-source, there's an investor out there for everyone.
In summary, trust your company's story and values and find the right investors to back it up. As Wynton Marsalis said, not everyone will like what you're doing, but as long as you believe in it and find the right backers, keep doing what you love.
On Building A Remote-First Company and An Open-Source Brand
Remote-first is a choice that requires a strong commitment. Its advantage is that it allows hiring talent from anywhere in the world and enables employees to move wherever they desire. However, it does not mean that people do not come together. Teams still have budgets and gather for workshops, lunches, and dinners. Hiring people from anywhere means that we can build a great company with a diverse team, which was especially valuable during the COVID-19 pandemic. Remote-first is a core value that people opt into when they join.
Our startup looks for ambitious people with expertise, such as software engineers who can showcase their skills on GitHub. Everyone in the company should have these tools, including those responsible for culture, design, customer success, and community building. To achieve this, we provide meta blogs on our website where they can showcase what they do and how they do it.
We believe in being transparent about our strategy because execution is what matters. We try to adopt the open-source nature not only in software but also in how we work and operate.
Show Notes
(01:36) Bob shared formative experiences of his upbringing with exposure to technology.
(05:08) Bob discussed his time building software for fun as a teenager.
(07:08) Bob reflected on his education in music and his decision to transition to a career in software development.
(10:52) Bob explained his project control(human, data, sound) while working on his consultancy agency Kubrickology.
(14:25) Bob discussed how using software can enhance our creativity.
(17:28) Bob talked about his fascination with merging physical and digital realms.
(19:41) Bob recalled his TEDx talk that introduced three high-level ideas about why software works well based on its ability to adapt to our language.
(24:57) Bob shared the founding story of Weaviate.
(29:40) Bob talked about his process of choosing his co-founders.
(31:29) Bob unpacked his high-level thinking around creating a business model around the open-source project.
(38:06) Bob defined a vector search engine for the uninitiated.
(40:49) Bob gave a brief overview of the high-level design of Weaviate.
(43:15) Bob talked about Weaviate's production-ready features, such as horizontal scalability and graph-like connections between objects.
(45:45) Bob reviewed the use cases for Weaviate that he is most proud of.
(49:31) Bob emphasized the importance of engaging open-source contributors to generate valuable product feedback.
(55:03) Bob talked about the pricing model for Weaviate Cloud Service.
(57:59) Bob anticipated the evolution of the tooling landscape within the AI-first database ecosystem to support the increasing adoption of unstructured data.
(01:02:31) Bob shared valuable hiring lessons to attract the right people to join Weaviate.
(01:04:36) Bob explained his process of identifying people who align with the cultural values of Weaviate.
(01:08:27) Bob gave fundraising advice to founders who are seeking the right investors for their startups.
(01:12:17) Bob highlighted his thinking around being a remote-first company and building an open-source brand.
(01:16:28) Closing segment.
Bob's Contact Info
Weaviate's Resources
Mentioned Content
People
Sam Ramji (DataStax)
Paul Graham (Y Combinator)
Book
"Hackers and Painters" (by Paul Graham)
Notes
My conversation with Bob was recorded back in late 2022.
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts.
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.