Tutorial 1. First flight with RAW

January 12, 2022
Posted by Georges Lagardère

Please note as of November 2022 we have upgraded our language – see our documentation and quick tour for details. These examples below are using previous version of RQL, and will be refactored shortly.

In this tutorial we are going to learn the basics of RAW. If you are familiar with SQL it should be very straightforward.

Types in RAW

There are two families of data types in RAW: nested types and primitive types.

The nested types are collection and record. They are called nested types because they take inner types and can nest arbitrarily.

The primitive types are strings, numbers and temporal types.

The full list of types is available in the documentation.

For instance: collection(record(a: int, b: collection(string))) is a type that represents a collection of records. Each record has two fields: field a is an int (aka an integer number) and field b is a collection of strings.

First Query

One of the key features of RAW SQL is the ability to query structuers other than relational database tables. It can, for instance read several text formats like CSV, JSON, XML, AVRO or Parquet among others.

For instance, let’s write a first query on a CSV file.

The READ function in RAW will infer the format and structure of the data and we can assign this to the airports collection for later use. airports is a collection of records, just like a relational table is. To project all columns in the file, and return the first 5 rows we can code the following function:

airports := READ( "");

airports5() := {
    SELECT * FROM airports LIMIT 5

This could also be written:

airports5() := {
    SELECT * 
      READ( "") LIMIT 5 

It is often convenient to use an object to hold a collection for instance. This can make the code more readable and easier to maintain, however the underlying query is optimised identically.

Code is encapsulated in functions, this is handy if we want to reuse them locally or make REST API calls using them. By convention, our code files’ extensions are .rql and there are Visual Studio Code extensions available to manage and run them. For this tutorial, all functions are implemented in the tutorial.rql file.

First API

To create and subsequently execute an API endpoint we use basic YAML declarations like the following:

raw: 0.9
endpoint: GET
code: rql
codeFile: tutorial.rql
declaration: airports5
  title: RAW SQL Tutorial  
  description: Tutorial
    - tutorial
format: json
computeClass: normal
enabled: true
  public: true

Executing the above endpoint:

will produce a collection of records like the following:

        AirportID: int,
        Name: string,
        City: string,
        Country: string,
        IATA_FAA: string,
        ICAO: string,
        Latitude: double,
        Longitude: double,
        Altitude: int,
        Timezone: double,
        DST: string,
        TZ: string
    "AirportID": 1,
    "Name": "Goroka",
    "City": "Goroka",
    "Country": "Papua New Guinea",
    "IATA_FAA": "GKA",
    "ICAO": "AYGA",
    "Latitude": -6.081689,
    "Longitude": 145.391881,
    "Altitude": 5282,
    "Timezone": 10,
    "DST": "U",
    "TZ": "Pacific/Port_Moresby"
    "AirportID": 2,
    "Name": "Madang",
    "City": "Madang",
    "Country": "Papua New Guinea",
    "IATA_FAA": "MAG",

We can execute more complex queries using regular SQL language features. For example:

airports := READ("");

airports_per_country() := {
    SELECT Country, COUNT(*) AS Number_Airports
    FROM airports
    GROUP BY Country
    ORDER BY Number_Airports DESC

Which will produce something like this:

        Country: string,
        Number_Airports: long
    "Country": "United States",
    "Number_Airports": 1697
    "Country": "Canada",
    "Number_Airports": 435
    "Country": "Germany",
    "Number_Airports": 321
    "Country": "Australia",
    "Number_Airports": 263
    "Country": "Russia",
    "Number_Airports": 249

Here we see countries with the number of airports per country.

With the exception of the READ keyword, this query looks like normal SQL. The READ keyword, however, means users do not have to create tables and load data into them, or discover the schema. This is done automatically by RAW.

Passing Parameters

Now let’s add a parameter to the query and compute the number of airports for a given country.

airports := READ("");

airports_for_country(country: string) := {
    SELECT Country, COUNT(*) AS Number_Airports
    FROM airports AS a
    WHERE a.Country = country
    GROUP by a.Country

You can note that the function receives a parameter which is of type string and named country, this is super handy to design REST APIs. This parameter will automatically become a parameter in the URL endpoint, for example:

The other notable element is the “airports AS a” clause which is needed here, otherwise, trying to write the following, would not work.

airports_for_country(country: string) := {
    SELECT Country, COUNT(*) AS Number_Airports
    FROM airports
    WHERE airports.Country = country
    GROUP by airports.Country

Here airports in the WHERE clause would refer to the airports variable (which is a collection) and airports.Country would not mean a thing comparable to the country string parameter.

The source code for this tutorial can be found here on GitHub, and more Tutorial links can be found below.

Georges Lagardere Sales Director RAW Labs

Georges Lagardère, VP Customer Experience, RAW Labs.


Get Started for Free

More Useful Links:

Like this? 

Sign up for our mailing list to receive the latest news and blogs

Learn More


Hello world, meet RAW

here we show how to use RAW Data Product Platform to create a simple Hello World output


Tutorial 2. Producing complex data with RAW

Second RAW Tutorial to show how to produce complex data, extensions to Group By, Top level collections and records


Tutorial 3. Managing complex data with RAW

Third Tutorial to show how sot read, manage and produce complex data in RAW