BLOG

Resource-Intensive Data Products: “Calculation-as-a-Service”

January 19, 2022
   
Solution
Posted by Jeremy Posner

Introduction & Recap

Some Data Products are simple, but there’s plenty that are more complex, involving calculations, models and other longer-running and resource intensive workloads. Here is an example.

In my previous article I showed how RAW can be used to serve up a Value at Risk (VaR) calculation using a historical simulation method. Our RAW platform can also be used to serve this capability up as an API based Calculation-as-a-Service capability which can be consumed by any programme, BI tool, or even an Excel spreadsheet.

API Endpoints

The following endpoints are calculating VaR on an index representing a portfolio of stocks, with 5 years of prices and at a 99% confidence level.

(Note, these may take a few seconds to run, as there’s a lot of data behind to crunch, and results may not be cached depending on when you run this. If you get a time-out message “Please Try later” it’s just a http 202 message, wait a few seconds then refresh as the query is running in the background)

Dow Jones 30 stocks
https://api.raw-labs.com/demos/1/public/value-at-risk/index-var?index=^DJI


Nasdaq 100 stocks
https://api.raw-labs.com/demos/1/public/value-at-risk/index-var?index=^IXIC


S&P 500 stocks
https://api.raw-labs.com/demos/1/public/value-at-risk/index-var?index=^GSPC


Multiple parameters are supported... index, confidence and horizon 
https://api.raw-labs.com/demos/1/public/value-at-risk/index-var?index=^IXIC&confidence=95

RAW API = SQL + YAML + GIT

With RAW, API endpoints are just SQL + YAML + GIT. If you know these three, then it’s super simple:

  1. Create a query using our superset of SQL (see docs). RAW has added functional additions and the ability to handle complex data, natively, in queries & data structures.
  2. Configure the query execution using a YAML file. Set the computation class (normal, large, extra-large), security, metadata, cache and other items simply.
  3. Commit to your GitHub repo. And that’s it…. Execute on our secure and scalable PaaS infrastructure.

For first principles and mechanics around this you can check out our Hello world, meet RAW post, and, as usual everything is on GitHub here.

Query

The query magic is in the code file: var.rql – where the endpoint executes the ‘index_var’ function below, passing the arguments through, I have added some sensible defaults for demo purposes too.

index_var(index: string := "^DJI", confidence: double := 99.0, 
          horizon: interval := interval "5 years" ) :=
{
    SELECT 
      Sector, 
      historical_var((SELECT Symbol as symbol, Weight * 10000 as assets 
                      FROM PARTITION), confidence, horizon ) as VaR
    FROM 
      READ("https://www.dropbox.com/s/0f84bajm36nqfla/index_weightings.csv?dl=1", 
           cache := interval "5 days")
    WHERE Index = index
    GROUP BY Sector
    ORDER BY VaR DESC
}

The query above partitions the stocks in the index by their sector, and calls the ‘historical_var’ function for each partition:

The Index weightings (for Nasdaq, Dow Jones and S&P 500) live in a dropbox file for this example, but this could be a database, or even another API, as they are all directly queryable in RAW.

This function allows you to specify an index, but you could modify this endpoint to pass in a list of stocks and their asset-weighting within the portfolio.

The historical_var calculation itself is where VaR is calculated for each stock. It takes in a collection of records, each with a symbol (e.g. “JPM”) and an amount of that asset in the portfolio (e.g. 500 shares, or for an index it might be 0.34 representing the weight of that stock in the index).

historical_var(
    portfolio: collection(record(symbol: string, assets: double)), 
    confidence: double := 99.0,                // some default, 99% confidence
    timespan: interval := interval "1 year"    // some default, 1 years history
)
... (see the rest on github)

For the closing prices, we use a function that returns close price history for a collection of stocks and a timespan ( 5 years, for instance). Our data comes from a file on dropbox but this can easily be changed to use your own market data source.

close_price_history(symbols: collection(string), timespan: interval) := {

SELECT 
  symbol, 
  TO_DATE(date, "yyyy-MM-dd") AS Date,
  close, 
  change, 
  changePercent
FROM 
  READ("https://www.dropbox.com/s/cy6dymeypfrgijb/5yr_us_stocks.csv?dl=1", 
      cache := interval "5 days")
WHERE 
  symbol in symbols and 
  TO_DATE(date, "yyyy-MM-dd") >= current_date() - timespan
  ORDER BY TO_DATE(date, "yyyy-MM-dd") ASC
}

Config

With RAW, the endpoint is determined by the YAML file. In this case the file contains simple statements:

raw: 0.9
endpoint: GET
metadata:
  title: Index VAR calculation
  description: Returns the VaR (Value at Risk) for a portfolio of assets described by an index (either ^DJI, ^GSPC or ^IXIC currently) representing 1 million shares by index weight, using historical simulation method
  tags:
    - finance
    - VaR
code: rql
codeFile: var.rql
declaration: index_var
format: json
security:
  public: true
computeClass: normal
enabled: true
refreshSeconds: 86400

Seeing as this could be a fairly large calculation, we might want to set the computeClass to a large cluster for this endpoint, vs. a normal cluster for APIs that operate on smaller datasets. This feature to allocate the right resources to each endpoint can make the best use of resources and potentially save money too.

For the purposes of this demo, I have set the cache refreshSeconds to 1 day (86400 seconds)

With RAW, each query execution will be paired with a relevant resource group to match appropriately to its needs. We have price plan options including a free tier, pro, business, and enterprise options.

RAW Catalog

Logon to RAW via https://app.raw-labs.com/, once authenticated click on ‘Catalog’. In the “demos” tree (left pane) you will see the endpoint structure mirrors the GitHub structure. Navigate down to the ‘index-var’ endpoint. Clicking on the endpoint will reveal the catalog entry for this endpoint, using the YAML code above and the information in the RQL query file:

RAW Catalog showing endpoints on the left, with focus on our Index VAR Calculation

Wrap Up

Here we show how RAW can create a Calculation-as-a-Service API, a resource-intensive Data Product, using just a SQL + YAML + GIT approach. Our platform takes care of the infrastructure and provides execution capabilities to match with the workload in hand.

Jeremy Posner, VP Product & Solutions, RAW Labs.


Want to learn more?

Learn More

WHITE PAPERS

RAW Data Product Platform

Read more about our RAW Data Product Platform, including challenges it is designed to solve, and components of the solution

Blog

Hello world, meet RAW

here we show how to use RAW Data Product Platform to create a simple Hello World output

Blog

Tutorial 1. First flight with RAW

First Tutorial with RAW where we take reader through a first query, first API and passing parameters to that API

Success

Thank you for your interest. Expect to hear from us soon.

Error

Email address must contain the @ symbol