Treeverse announced today the completion of a $23M round to continue building lakeFS – an innovative open source project bringing manageability to large-scale data lakes.
Why Now?
Data lakes of all sizes, serving multiple purposes, are gaining popularity among enterprises facing unprecedentedly large and accelerating data storage needs. To illustrate this trend, according to Gartner, by 2024 large enterprises will triple their unstructured data stored, compared with 2020. In addition, to support the growing volume, the shift away from on-premises deployment continues, with only 20.8% of data forecast to be stored on-premises in 2025, down from 44.2% in 2020 and 55.3% in 2018.
Organizations collect and store data from many sources to drive analytics, train deep machine learning algorithms, or create business and operational insights. While most data teams focus on “What applications should I build or buy to maximize the value of this data?”, they pay limited attention to the tools required to manage data lake integrity – how to ensure consistency and reproducibility, understand dependencies, and guarantee data quality.
What’s the analogy? Several decades ago, source code was fairly easy to manage until the number of developers and the re-use of software elements grew exponentially. The answer came with Git – the open-source version control system. The exact same situation is taking place now with big data. It’s simply growing out of control and becoming more difficult to manage due to the inherent gravity of large sets of data. Therefore, Git-like version control functionality for data lakes is required to support data teams in their code and data management workflows.
Why Did We Invest?
As with all my seed-stage investments, I am often asked, “Why did you choose to invest in this company? What sets this investment opportunity apart from the hundreds you consider each year?”
My answer can be split into two core elements, with the first being what I define as founder-market fit.
It has been a continuing theme that entrepreneurs should have a deep and nuanced understanding of their customers’ needs when they develop a product. It is my experience that good product-market fit is the #1 enabler of a successful early-stage investment. Who is better placed than entrepreneurs who were potential customers themselves to accurately define and build the right product?
Treeverse is a perfect example of a highly experienced team that had to develop a home-grown solution to manage their own data lake and realized that the whole community could benefit from such a platform. Einat Orr and Oz Katz, who founded Treeverse, would have been the customers of lakeFS, had it existed before.
The second factor is the market size of such a product. While Treeverse is a young company, it is obvious that:
- Everyone who manages a data lake can take advantage of, and realize value from, such a tool.
- No other company focuses on this conceptual (and ambitious) approach to data versioning. The market is open for Treeverse and lakeFS to take a leadership position and grow it with a fast-moving roadmap.
Why Open Source?
Treeverse believes that an infrastructure offering like lakeFS should be embraced and supported by the community in order to build long-term trust in the tool. Such long-term trust is critical for lakeFS to become a widely adopted market standard.
Backing a company that chooses to share its technology with the community is not trivial in the early stages. Once a company decides to take the open-source path, its stakeholders understand that the community comes first, and successful monetization will come at a later stage.
This round of financing will also allow Treeverse to add many more capabilities to the lakeFS open-source project while building a hosted offering for customers who prefer a managed service. This approach is similar to that of many other successful companies that chose the open-core go-to-market approach.
What’s Next?
This is just the beginning, and the roadmap is ambitious. One of the areas that Treeverse is looking to tackle is multi-data lake management.
There are many types of data lakes. Some applications use multiple data lakes simultaneously – from native S3 and EBS to different types of databases, structured and unstructured. Managing all these options with an easy-to-use DataOps tool is a significant roadmap mission for Treeverse.
I am sure that the company will keep on innovating in this direction along with the lakeFS community.