BLOG

Tutorial 1. First flight with RAW

January 12, 2022
   
Tutorial
Posted by Georges Lagardère

In this tutorial we are going to learn the basics of RAW. If you are familiar with SQL it should be very straightforward.

Types in RAW

There are two families of data types in RAW: nested types and primitive types.

The nested types are collection and record. They are called nested types because they take inner types and can nest arbitrarily.

The primitive types are strings, numbers and temporal types.

The full list of types is available in the documentation.

For instance: collection(record(a: int, b: collection(string))) is a type that represents a collection of records. Each record has two fields: field a is an int (aka an integer number) and field b is a collection of strings.

First Query

One of the key features of RAW SQL is the ability to query structuers other than relational database tables. It can, for instance read several text formats like CSV, JSON, XML, AVRO or Parquet among others.

For instance, let’s write a first query on a CSV file.

The READ function in RAW will infer the format and structure of the data and we can assign this to the airports collection for later use. airports is a collection of records, just like a relational table is. To project all columns in the file, and return the first 5 rows we can code the following function:

airports := READ( "https://raw-tutorial.s3.amazonaws.com/airports.csv");

airports5() := {
    SELECT * FROM airports LIMIT 5
}

This could also be written:

airports5() := {
    SELECT * 
    FROM 
      READ( "https://raw-tutorial.s3.amazonaws.com/airports.csv") LIMIT 5 
}

It is often convenient to use an object to hold a collection for instance. This can make the code more readable and easier to maintain, however the underlying query is optimised identically.

Code is encapsulated in functions, this is handy if we want to reuse them locally or make REST API calls using them. By convention, our code files’ extensions are .rql and there are Visual Studio Code extensions available to manage and run them. For this tutorial, all functions are implemented in the tutorial.rql file.

First API

To create and subsequently execute an API endpoint we use basic YAML declarations like the following:

raw: 0.9
endpoint: GET
code: rql
codeFile: tutorial.rql
declaration: airports5
metadata:
  title: RAW SQL Tutorial  
  description: Tutorial
  tags:
    - tutorial
format: json
computeClass: normal
enabled: true
security:
  public: true

Executing the above endpoint:

https://api.raw-labs.com/tutorial/five-airports

will produce a collection of records like the following:

collection(
    record(
        AirportID: int,
        Name: string,
        City: string,
        Country: string,
        IATA_FAA: string,
        ICAO: string,
        Latitude: double,
        Longitude: double,
        Altitude: int,
        Timezone: double,
        DST: string,
        TZ: string
    )
)
[
	{
		"AirportID": 1,
		"Name": "Goroka",
		"City": "Goroka",
		"Country": "Papua New Guinea",
		"IATA_FAA": "GKA",
		"ICAO": "AYGA",
		"Latitude": -6.081689,
		"Longitude": 145.391881,
		"Altitude": 5282,
		"Timezone": 10,
		"DST": "U",
		"TZ": "Pacific/Port_Moresby"
	},
	{
		"AirportID": 2,
		"Name": "Madang",
		"City": "Madang",
		"Country": "Papua New Guinea",
		"IATA_FAA": "MAG",
...

We can execute more complex queries using regular SQL language features. For example:

airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");

airports_per_country() := {
    SELECT Country, COUNT(*) AS Number_Airports
    FROM airports
    GROUP BY Country
    ORDER BY Number_Airports DESC
}

Which will produce something like this:

collection(
    record(
        Country: string,
        Number_Airports: long
    )
)
[
	{
		"Country": "United States",
		"Number_Airports": 1697
	},
	{
		"Country": "Canada",
		"Number_Airports": 435
	},
	{
		"Country": "Germany",
		"Number_Airports": 321
	},
	{
		"Country": "Australia",
		"Number_Airports": 263
	},
	{
		"Country": "Russia",
		"Number_Airports": 249
	},
...

Here we see countries with the number of airports per country.

With the exception of the READ keyword, this query looks like normal SQL. The READ keyword, however, means users do not have to create tables and load data into them, or discover the schema. This is done automatically by RAW.

Passing Parameters

Now let’s add a parameter to the query and compute the number of airports for a given country.

airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");

airports_for_country(country: string) := {
    SELECT Country, COUNT(*) AS Number_Airports
    FROM airports AS a
    WHERE a.Country = country
    GROUP by a.Country
}

You can note that the function receives a parameter which is of type string and named country, this is super handy to design REST APIs. This parameter will automatically become a parameter in the URL endpoint, for example:

https://api.raw-labs.com/tutorial/airports-for-country?country=Germany

The other notable element is the “airports AS a” clause which is needed here, otherwise, trying to write the following, would not work.

// THIS DOESN'T WORK!
airports_for_country(country: string) := {
    SELECT Country, COUNT(*) AS Number_Airports
    FROM airports
    WHERE airports.Country = country
    GROUP by airports.Country
}

Here airports in the WHERE clause would refer to the airports variable (which is a collection) and airports.Country would not mean a thing comparable to the country string parameter.

The source code for this tutorial can be found here on GitHub, and more Tutorial links can be found below.

Georges Lagardere Sales Director RAW Labs

Georges Lagardère, VP Customer Experience, RAW Labs.


Want to learn more?

Learn More

Blog

Hello world, meet RAW

here we show how to use RAW Data Product Platform to create a simple Hello World output

Blog

Tutorial 2. Producing complex data with RAW

Second RAW Tutorial to show how to produce complex data, extensions to Group By, Top level collections and records

Blog

Tutorial 3. Managing complex data with RAW

Third Tutorial to show how sot read, manage and produce complex data in RAW