Tutorial 2. Producing complex data with RAW

Collections in the output field of a SELECT
Let’s start with the following query:
sales := READ("https://raw-tutorial.s3.amazonaws.com/sales.json");
all_sales() := {
SELECT * from sales
}
collection(
record(
country: string,
products: collection(
record(
category: string,
cost: int
)
)
)
)
[
{
"country": "CH",
"products": [
{
"category": "Keyboard",
"cost": 50
},
{
"category": "Keyboard",
"cost": 70
},
{
"category": "Monitor",
"cost": 450
}
]
},
{
"country": "US",
"products": [
...
products
is a collection of records, so it is like a “nested table”. Let’s analyze the following query:
sales := READ("https://raw-tutorial.s3.amazonaws.com/sales.json");
products_cost_per_country() := {
SELECT s.country, (SELECT p.cost FROM s.products AS p) AS products_cost
FROM sales AS s
}
collection(
record(
country: string,
products_cost: collection(int)
)
)
[
{
"country": "CH",
"products_cost": [
50,
70,
450
]
},
{
"country": "US",
"products_cost": [
20,
200
]
}
]
This query returns two rows: one for CH
, another for US
.
The first column is the country, and the second column is a list of the cost of the products in that country.
How does this work?
The inner SELECT
contains (SELECT p.cost FROM s.products AS p)
. We can think of this as a normal query, over a table called s.products
. The output of that query is a table with the cost of each product.
When we compose it in a single query, then the result is a table where the second column contains another table.
To further demonstrate that SELECT
s are just operations over collections of data, let’s add a WHERE filter to the inner SELECT
:
sales := READ("https://raw-tutorial.s3.amazonaws.com/sales.json");
products_cost_over_60_per_country() := {
SELECT s.country, (SELECT p.cost FROM s.products AS p WHERE p.cost > 60) AS products_cost_over_60
FROM sales AS s
}
collection(
record(
country: string,
products_cost_over_60: collection(int)
)
)
[
{
"country": "CH",
"products_cost_over_60": [
70,
450
]
},
{
"country": "US",
"products_cost_over_60": [
200
]
}
]
This query filtered the products in the inner SELECT
for those that cost > 60.
We can even do aggregations:
sales := READ("https://raw-tutorial.s3.amazonaws.com/sales.json");
number_products_cost_over_60_per_country() := {
SELECT s.country, (SELECT COUNT(*) FROM s.products AS p WHERE p.cost > 60) AS number_products_cost_over_60
FROM sales AS s
}
collection(
record(
country: string,
number_products_cost_over_60: long
)
)
[
{
"country": "CH",
"number_products_cost_over_60": 2
},
{
"country": "US",
"number_products_cost_over_60": 1
}
]
This query counts the number of products in each country that cost over 60.
Extensions to GROUP BY
We start by a traditional SQL aggregation:
airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");
airports_per_country() := {
SELECT Country, COUNT(*) AS Number_Airports
FROM airports
GROUP BY Country
ORDER BY Number_Airports DESC
}
collection(
record(
Country: string,
Number_Airports: long
)
)
[
{
"Country": "United States",
"Number_Airports": 1697
},
{
"Country": "Canada",
"Number_Airports": 435
},
{
"Country": "Germany",
"Number_Airports": 321
},
{
"Country": "Australia",
"Number_Airports": 263
},
{
"Country": "Russia",
"Number_Airports": 249
},
{
"Country": "France",
"Number_Airports": 233
},
{
"Country": "China",
"Number_Airports": 219
},
...
This query lists the number of airports per country.
In RAW SQL, however, GROUP BY
produces “groups” that can be queried.
When the GROUP BY
keyword is used, the *
is bound to the group.
To query the entire “group” for a given country – i.e. the airports in each country – we can do:
airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");
airports_per_country_detail() := {
SELECT Country, *
FROM airports
GROUP BY Country
}
collection(
record(
Country: string,
_2: collection(
record(
AirportID: int,
Name: string,
City: string,
Country: string,
IATA_FAA: string,
ICAO: string,
Latitude: double,
Longitude: double,
Altitude: int,
Timezone: double,
DST: string,
TZ: string
)
)
)
)
[
{
"Country": "Austria",
"_2": [
{
"AirportID": 1607,
"Name": "Wiener Neustadt East",
"City": "Wiener Neustadt Ost",
"Country": "Austria",
"IATA_FAA": "",
"ICAO": "LOAN",
"Latitude": 47.843333,
"Longitude": 16.260139,
"Altitude": 896,
"Timezone": 1,
"DST": "E",
"TZ": "Europe/Vienna"
},
{
"AirportID": 1608,
"Name": "Wels",
"City": "Wels",
"Country": "Austria",
"IATA_FAA": "",
"ICAO": "LOLW",
"Latitude": 48.183304,
"Longitude": 14.040861,
"Altitude": 1043,
"Timezone": 1,
"DST": "E",
"TZ": "Europe/Vienna"
},
{
"AirportID": 1609,
"Name": "Graz",
"City": "Graz",
"Country": "Austria",
"IATA_FAA": "GRZ",
"ICAO": "LOWG",
"Latitude": 46.991067,
"Longitude": 15.439628,
"Altitude": 1115,
"Timezone": 1,
"DST": "E",
"TZ": "Europe/Vienna"
},
...
The *
is a “nested table” containing all rows in the group defined by the GROUP BY
clause.
In this example the *
is all the airports in a given Country
, since the query does GROUP BY Country
.
Since *
is a table, we can query it as normally:
airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");
airports_per_country_name_city() := {
SELECT Country, (SELECT Name, City FROM *)
FROM airports
GROUP BY Country
}
collection(
record(
Country: string,
_2: collection(
record(
Name: string,
City: string
)
)
)
)
[
{
"Country": "Austria",
"_2": [
{
"Name": "Wiener Neustadt East",
"City": "Wiener Neustadt Ost"
},
{
"Name": "Wels",
"City": "Wels"
},
{
"Name": "Graz",
"City": "Graz"
},
{
"Name": "Innsbruck",
...
We can even group further:
airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");
airports_per_country_per_city() := {
SELECT Country, (SELECT City, COUNT(*) FROM * GROUP BY City)
FROM airports
GROUP BY Country
}
collection(
record(
Country: string,
_2: collection(
record(
City: string,
_2: long
)
)
)
)
[
{
"Country": "Austria",
"_2": [
{
"City": "Hohenems",
"_2": 1
},
{
"City": "Punitz-Guessing",
"_2": 1
},
{
"City": "Bad Gastein",
"_2": 1
},
{
...
This groups the airports by Country, and then by City.
The COUNT(*)
in the inner SELECT
refers to the groups created by GROUP BY City
.
Top-level Collections
Let’s look in more detail at the output of the following queries:
sales := READ("https://raw-tutorial.s3.amazonaws.com/sales.json");
sales_countries_collection() := {
SELECT country AS name FROM sales
}
collection(
record(
name: string
)
)
[
{
"name": "CH"
},
{
"name": "US"
}
]
This query returns a list of countries. Each row in the output has the column name
. We obtain a collection of two records.
Now the following query :
sales := READ("https://raw-tutorial.s3.amazonaws.com/sales.json");
sales_countries_strings() := {
SELECT country FROM sales
}
collection(string)
[
"CH",
"US"
]
… appears similar but note that AS
alias is not included.
The output is different: each row is a string. There is not record, just a collection of strings.
Top-level Records
The syntax (field1: "One", field2: 1)
is used to create a record with two fields: field1
, a string with value "One"
, and field2
, a integer with value 1.
Collections and Records can be nested in RAW, so the following is a valid query:
airports := READ("https://raw-tutorial.s3.amazonaws.com/airports.csv");
number_of_airports_countries() := {
(
Number_Of_Airports: (SELECT COUNT(*) FROM airports),
Countries: (SELECT DISTINCT Country FROM airports)
)
}
record(
Number_Of_Airports: long,
Countries: collection(string)
)
{
"Number_Of_Airports": 8107,
"Countries": [
"Austria",
"Azerbaijan",
"Belgium",
"Bermuda",
"Bhutan",
"British Indian Ocean Territory",
"Burundi",
"Cameroon",
"Cayman Islands",
"Chile",
"Congo (Kinshasa)",
"Cote d'Ivoire",
"Dominica",
"Dominican Republic",
"Faroe Islands",
"Finland",
"French Guiana",
"Grenada",
"Guernsey",
...
The output of this query is a record with two fields: Number_of_Airports
, a long and Countries
, a collection of strings.
The source code for this tutorial can be found here on GitHub, and more Tutorial links can be found below.

Georges Lagardère, VP Customer Experience, RAW Labs.
- Give RAW a try: Get Started for free!
- Why not follow us on LinkedIn, or Twitter, or join the conversation over at Reddit
- Read our Tutorials and Getting Started docs
- Like code? head on over to GitHub and look at our demo APIs
- Developer? Join us! we are looking for bright minds – at all levels of seniority, in databases, distributed systems, UI/UX.
Learn More

Tutorial 1. First flight with RAW
First Tutorial with RAW where we take reader through a first query, first API and passing parameters to that API

Tutorial 3. Managing complex data with RAW
Third Tutorial to show how sot read, manage and produce complex data in RAW

Hello world, meet RAW
here we show how to use RAW Data Product Platform to create a simple Hello World output