At RAW Labs, our mission is to allow our users to “query all the data, anywhere”. In pursuing it, we spent a lot of time researching query languages.
Querying all the data
SQL is the obvious candidate to build upon, but unfortunately, as discussed here, SQL was built to query tabular data. Despite the best efforts of standards organizations, retrofitting new data types and formats into SQL is far from ideal.
So the first step towards querying all the data is to define a richer data model: one that supports not only tables of numbers, dates or text, but also, for instance, nested structures. These constructs are what let us truly support the datasets we see in the real world, where JSON, XML and other formats are prevalent.
Querying data anywhere
The next step in our mission, querying data anywhere, requires us to rethink both the query language and how and where it executes. Querying any data, anywhere, means being able to query datasets stored remotely, as well as to discover what those datasets are and what data they contain.
This is really hard to do in current systems. In SQL-based systems, for instance, users never simply query data wherever it lives; instead, tables and schemas are defined (and data is typically loaded) upfront. Even the database engines that can read remote data still require schemas to be defined first.
So the next step in our mission is to provide facilities in the language to discover and query “never seen before” data, no matter where it is located.
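To make "discovering never seen before data" more concrete, here is a minimal sketch of one common technique, schema inference over sample JSON records. It is a generic illustration written in Python under our own assumptions, not how Snapi implements discovery: the function names `infer_type`, `merge` and `infer_schema` are hypothetical, nulls and missing fields are simply treated as "unknown", and conflicting types collapse to `"any"`.

```python
import json
from functools import reduce
from typing import Any, Iterable, Optional


def infer_type(value: Any) -> Any:
    """Return a nested type description for one decoded JSON value (hypothetical helper)."""
    if value is None:
        return None  # a null tells us nothing about the type: treat it as unknown
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # A list gets a single element type covering all of its elements.
        return [reduce(merge, (infer_type(v) for v in value), None)]
    # Otherwise a JSON object: infer each field recursively (nested structure).
    return {k: infer_type(v) for k, v in value.items()}


def merge(a: Optional[Any], b: Optional[Any]) -> Any:
    """Combine two inferred types into one that covers both (assumed widening rules)."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        # Union of fields: a field seen in only one record is kept with its type.
        return {k: merge(a.get(k), b.get(k)) for k in {**a, **b}}
    if isinstance(a, list) and isinstance(b, list):
        return [merge(a[0], b[0])]
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"int", "float"}:
        return "float"  # widen int to float
    return "any"  # incompatible types: give up on precision


def infer_schema(lines: Iterable[str]) -> Any:
    """Infer one schema for a sample of JSON Lines records."""
    return reduce(merge, (infer_type(json.loads(line)) for line in lines), None)


if __name__ == "__main__":
    sample = [
        '{"id": 1, "name": "a", "tags": ["x"]}',
        '{"id": 2.5, "tags": [], "geo": {"lat": 1.0}}',
    ]
    print(infer_schema(sample))
```

Running it on the two sample records prints `{'id': 'float', 'name': 'string', 'tags': ['string'], 'geo': {'lat': 'float'}}`: the schema is discovered from the data itself rather than declared upfront, and it is nested rather than tabular. A real system would also need to track optionality and sample large or remote datasets efficiently.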
And so Snapi is born!
And this is why we ended up creating a new query language at RAW Labs. It is called “Snapi” (it’s snappy!) and you can learn more about it here. Snapi’s flagship features are:
- the ability to query data directly from databases, files or web services, at source;
- it is a declarative language, allowing for powerful optimizations;
- it includes data discovery constructs to query never seen before data whose schema isn’t known;
- it is nonetheless type-safe;
- it is a modern, secure language, with an advanced approach for handling errors that typically happen with dirty data;
- it supports complex data types;
- it is scalable.
Snapi is the result of a long-running academic research project at the EPFL DIAS Lab, with many publications to its name. It has now become a core component of the RAW platform, which you can learn more about here.
As always, stay tuned to learn more, or join us and share your thoughts.