BLOG

816,000 interesting FREE datasets with RAW Labs

March 23, 2022
   
Solution
Posted by Jeremy Posner

We like good, free data

Unless you found this page by mistake, then you’re a data person. And if you’re a data person then you will likely be interested in datasets. And we all love good free stuff too. In fact, according to my quick Google Trends analysis, the searches for ‘Free Dataset’ and ‘Free Data API’ have almost doubled in the last 12 months.

Here we will show how to easily use some great free data with RAW Labs – this example uses data in the form of time series datasets from the St. Louis Federal Reserve. They have a superb resource, called FRED, which you can read more about here or watch their video on how they evolved into fabulous resource of very broad and useful data for all types of use case. And, yes that’s not a typo – there are 816,000 datasets from 108 sources. See the main categories. It’s not all financial data, and not all US-based, so something for everyone.

FRED makes it easy to find, view and download data using their website. If searching for “unemployment”, we can quickly find many relevant datasets, and each one has a unique “series id”, e.g. https://fred.stlouisfed.org/series/LRHUTTTTEZM156S is the time series for “Unemployment in the EURO zone”.

This is fine, and we can download the data, maybe load it into excel, and combine it, aggregate it, etc. But with RAW, we can query, analyse, join and transform the FRED data from their API directly without having to download it first, which means doing things faster.

Using RAW to get FRED data

So let’s set up FRED as a resource in RAW. First thing is to get an account, because you need to use an API Key to access the FRED API. After registering with FRED, go to API Keys page, and create one. Read their T&C’s too – should also say at this point: “This product uses the FRED® API but is not endorsed or certified by the Federal Reserve Bank of St. Louis.

Once you have a key, log into our RAW App (you can get started by registering on our home page), and add the FRED API key to your secrets, which are encrypted:

Data Sources tab for your repository. Click on Secret
Name the Secret, and
copy/paste FRED API key into “value” field

To Develop code with RAW, we recommend using VSCode and have integrated RAW using an extension to make things easier. See how to get started here.

Now let’s write some simple RAW SQL code which will call the FRED API and return the results. Think of this as an API ‘shim’ – a simple connector to the FRED API, which calls it in a 1-1 fashion :

typealias observations_response := 
  record(
    realtime_start: string,
    realtime_end: string,
    observation_start: string,
    observation_end: string,
    units: string,
    output_type: int,
    file_type: string,
    order_by: string,
    sort_order: string,
    count: int,
    offset: int,
    `limit`: int,
    observations: collection(record(
        realtime_start: string,
        realtime_end: string,
        date: string,
        value: string)
        )
    );

observations ( series_id: string, api_key: string ) := {
  READ_JSON[observations_response]("https://api.stlouisfed.org/fred/series/observations", 
      http_args :=[ 
          ( "series_id", series_id),
          ( "api_key", api_key), 
          ( "file_type", "json" )
      ]
  )
};

There are two statements – firstly a typealias for the shape of the data that the FRED API returns, and secondly a function observations which calls the FRED observations endpoint using a READ_JSON statement.

Our function above takes 2 arguments, one which is the series_id, and the other which is the api_key, both required. Because we only want to process JSON here, for this example, we also set file_type. Note also we have chosen to only show a minimal set of parameters, for the sake of brevity, but you can read the full FRED API documentation here, there are many parameter options.

Queries using FRED data

Now we have set up the connector, to use this API it’s very simple, we just write a two line statement like this:

FROM `github://raw-labs/lib/1/public/stlouisfed.org/fred.rql`
    IMPORT observations;

observations(series_id:="POPTOTUSA647NWDB", api_key:= secret("fred.stlouisfed.org"))

… which will return the data in JSON:

{
	"realtime_start": "2022-03-22",
	"realtime_end": "2022-03-22",
	"observation_start": "1600-01-01",
	"observation_end": "9999-12-31",
	"units": "lin",
	"output_type": 1,
	"file_type": "json",
	"order_by": "observation_date",
	"sort_order": "asc",
	"count": 61,
	"offset": 0,
	"limit": 100000,
	"observations": [
		{
			"realtime_start": "2022-03-22",
			"realtime_end": "2022-03-22",
			"date": "1960-01-01",
			"value": "180671000"
		},
		{
			"realtime_start": "2022-03-22",
			"realtime_end": "2022-03-22",
			"date": "1961-01-01",
			"value": "183691000"
		},
		{
                 ... 

So far, so good. Now we can also do more interesting things using RAW, such as write standard SQL queries treating these results as tables. Here are two simple examples:

// return the 3 highest months
SELECT 
  date, value
FROM
  observations(series_id:="POPTOTUSA647NWDB", api_key:= secret("fred.stlouisfed.org")).observations
ORDER BY value desc
LIMIT 3

// US house price index, vs. Interest Rates, monthly
// Join two FRED datasets
SELECT
  a.date,
  a.value as house_price_index,
  b.value as interest_rate
FROM
  observations(series_id:="USSTHPI", api_key:= secret("stlouisfed.org")).observations a, // house prices
  observations(series_id:="INTDSRUSM193N", api_key:= secret("stlouisfed.org")).observations b // int. rates
WHERE a.date = b.date
ORDER BY a.date desc

It’s also easy to combine this FRED data to other sources of data too. The example below uses CO2 Emissions data from another great free data provider, OurWorldInData.Org (OWID). This other dataset lives in GitHub as a CSV file. It’s quite big, and fairly static, so we can cache it in RAW too.

We query the two datasets treating them like tables, even though one is from an API returning JSON, and the other a file on GitHub:

// US CO2 emissions vs. Global CO2 emissions, by year
SELECT 
  b.year, 
  cast(a.value as double) as US_co2_mtons, 
  b.global_co2_mtons,
  100 * cast(a.value as double) / b.global_co2_mtons as US_as_percent
FROM
observations(
  series_id:="EMISSCO2TOTVTTTOUSA", 
  api_key:= secret("stlouisfed.org")).observations a,
(
  SELECT 
    sum(co2) as global_co2_mtons, 
    year
  FROM 
    READ_CSV ("https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv", 
      has_header:=true, 
      cache:= interval "10 days")
  GROUP BY year
) b 
WHERE substring(a.date,1,4) = cast(b.year as string)

… to return data looking like this:

[
	{
		"year": 1980,
		"US_co2_mtons": 4796.440337,
		"global_co2_mtons": 76976.51700000002,
		"US_as_percent": 6.231043601258289
	},
	{
		"year": 1981,
		"US_co2_mtons": 4656.57146,
		"global_co2_mtons": 75014.31099999999,
		"US_as_percent": 6.207577458119959
	},
	{
          ...

Create an API

Finally, we can easily expose any query as an API in RAW. To do this, simply create the query above as a function, and then write a small and simple YAML file which configures this as an endpoint:

raw: 0.9
endpoint: GET
code: rql
codeFile: queries.rql
format: json
computeClass: normal
enabled: true
cacheSizeMB: 20
computeLimitSeconds: 30
declaration: us_vs_global_co2_emissions
metadata:
  title: US vs Global CO2 Emissions, by Year
  description: Data from FRED (US) and OurWorldInData (Global). Dataset from 1980
  tags:
    - emissions
    - environment
security:
  public: true
refreshSeconds: 86400

Push them both to your GitHub repo which is sync’d with RAW, and using the power of our DataOps platform the endpoint is available – go click it 🙂

https://api.raw-labs.com/examples/1/public/emissions/us-vs-global-co2

Wrap Up

We showed how easy it is to use and expose FRED API data using RAW. FRED is just one of thousands of great, free data resources on the web. With RAW, you don’t have to download the data – simply query, join with any other (internal or external) data set, then transform and make available via our Cloud-based DataOps platform. Create APIs, Share Data. Simply.

For more information, get in touch with us today, register, learn and start using!

Jeremy Posner, VP Product & Solutions, RAW Labs.


Want to learn more?

Learn More

Blog

Analysing the News with RAW

Use RAW with Web APIs to produce powerful analysis results on web pages. This example shows analysing web pages in RSS feeds, using metadata extraction and Google's language entity extraction.

Blog

DataOps: Navigate the perfect storm between data agility and data control

How DataOps is your way out of the perfect data storm - balancing data agility with the need for data control

Blog

Tutorial 1. First flight with RAW

First Tutorial with RAW where we take reader through a first query, first API and passing parameters to that API

Success

Thank you for your interest. Expect to hear from us soon.

Error

Email address must contain the @ symbol