Anastasia’s Story
The motivation, as is often the case with new inventions, was frustration. Frustration with scientific applications that cannot rely on database engines and instead build homemade solutions at great cost. Frustration at seeing new, incredibly useful paradigms for data management emerge (machine learning, for example) and finding current technologies inadequate to cope with them. Frustration with countless hours spent writing scripts to load data into the database, or figuring out how to tune the query engine. Frustration with object-relational layers. Frustration with database engines that continue to expect that “all data belongs here”, when data grows so much faster than any one database engine can ingest it. And frustration because the idea of the data warehouse as a single source of truth had failed, yet few seemed to be doing much about it.
“The solution grew gradually in our heads, and with the time to experiment in academia, it became obvious that we were onto something significant. The solution was a combination of ideas taken from multiple domains of computer science, including compilers, functional languages and database research, as well as mathematics”.
How? Let’s disentangle the issues:
- It takes too long to load data. Solution: don’t load data. Instead, design the engine to query at source.
- You have to write scripts and other glue code, e.g. to load or transform data. Solution: don’t write scripts. Instead, build language features that cover these functions.
- It’s hard to tune the database engine. Plus, requirements change all the time, so even if the engine is tuned correctly, tomorrow’s queries will be different from today’s. Solution: don’t tune the database. Let it tune itself based on usage.
- Modern applications have data formats that are rich and complex; not just tables and not easily modeled as tables. Solution: support rich data formats as a core part of the language and internals.
- Modern data transformations are more complex than SELECTs and JOINs. Solution: support operations beyond the classical database algebraic operators, but make sure to find the correct mathematical abstractions so that queries remain “optimizable” and the query language stays declarative.
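To make the first idea in the list concrete, here is a minimal Python sketch of querying a raw file at its source instead of loading it into a database first. It is only an illustration under stated assumptions, not how RAW is implemented: the helpers `scan_csv` and `query_at_source`, and the sample data, are hypothetical names invented for this example.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def scan_csv(path: str) -> Iterator[Row]:
    """Stream rows straight from the file; nothing is copied into a database."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def query_at_source(
    path: str, columns: List[str], predicate: Callable[[Row], bool]
) -> Iterator[Row]:
    """Hypothetical helper: filter and project rows while reading the source file."""
    for row in scan_csv(path):
        if predicate(row):
            yield {c: row[c] for c in columns}


def demo() -> List[Row]:
    # A small raw file standing in for data that lives outside any database.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as f:
        f.write("id,name,score\n1,ada,91\n2,bob,47\n3,eve,78\n")
        path = f.name
    try:
        # Roughly: SELECT name, score FROM file WHERE score > 50
        return list(
            query_at_source(path, ["name", "score"], lambda r: int(r["score"]) > 50)
        )
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the absence of a load step: the file is read lazily, and only the rows and columns the query asks for are materialized. A real engine would add the self-tuning and optimization described in the other items, which this example does not attempt.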
“Conceptually, the solution is not difficult. What is difficult is to build the correct design and theoretical framework for the solution. It’s hard to build a new system that still looks and feels like SQL. But that’s what we accomplished with RAW NoDB, as we now call it, with a great deal of integration between miscellaneous concepts and ideas”.