Powerful new SEO & SEM Data Products with RAW

August 2, 2022
   
Posted by Jeremy Posner

Intro

We can use RAW for all types of analysis as we’ve seen from previous blogs. Here’s an example of how we can create powerful new data products using SEO/SEM data.

In this example we query data from the dataforseo.com API. DataForSEO offers many APIs for different purposes, but we will use the Google Historical Search Volume API so that we can view two brands and see how searches for them compare on a monthly basis.

We will first show how to create a single core data product, which is essentially the data returned from the dataforseo.com API. Then we will create a second data product that builds on the first and allows comparative analysis of brands/keywords over a time series.

Of course you can then combine this data with any other SEO/SEM data or logs – we will get to this in later blogs. As with all RAW data products, the data is queried live, up to the minute. No batches. No waiting. No ETL. No staging areas. Caching is taken care of too. It’s easy and fast to get going, so try RAW today.

Let’s go…


First Up – DataForSEO core API product

Below we have set up a simple structure that mimics the data from their API. There are two parts. The first is a typealias, which describes the structure of the output returned by the API:

// typealias: the structure of the data returned by DataForSEO's API

typealias hist_search_vol_result :=
 record(
  version: string,
  status_code: int,
  status_message: string,
  time: string,
  cost: double,
  tasks_count: int,
  tasks_error: int,
  tasks: collection(record(
      id: string,
      status_code: int,
      status_message: string,
      time: string,
      cost: double,
      result_count: int,
      path: collection(string),
      data: record(
        api: string,
        function: string,
        se_type: string,
        keywords: collection(string),
        language_name: string nullable,
        location_code: int nullable,
        include_serp_info: bool),
      result: collection(record(
          se_type: string,
          location_code: int,
          language_code: string,
          items_count: int,
          items: collection(record(
              se_type: string,
              keyword: string,
              location_code: int,
              language_code: string,
              search_partners: bool,
              keyword_info: record(
                se_type: string,
                last_updated_time: string,
                competition: double nullable,
                cpc: double nullable,
                search_volume: int nullable,
                categories: collection(int) nullable,
                monthly_searches: collection(record(
                    year: int,
                    month: int,
                    search_volume: int))),
              keyword_properties: record(
                se_type: string,
                core_keyword: string nullable,
                keyword_difficulty: int nullable),
              impressions_info: record(
                se_type: string,
                last_updated_time: string,
                bid: int nullable,
                match_type: string nullable,
                ad_position_min: double nullable,
                ad_position_max: int nullable,
                ad_position_average: double nullable,
                cpc_min: double nullable,
                cpc_max: double nullable,
                cpc_average: double nullable,
                daily_impressions_min: double nullable,
                daily_impressions_max: double nullable,
                daily_impressions_average: double nullable,
                daily_clicks_min: double nullable,
                daily_clicks_max: double nullable,
                daily_clicks_average: double nullable,
                daily_cost_min: double nullable,
                daily_cost_max: double nullable,
                daily_cost_average: double nullable),
              serp_info: record(
                se_type: string,
                check_url: string,
                serp_item_types: collection(string),
                se_results_count: double nullable,
                last_updated_time: string,
                previous_updated_time: string
              ) nullable
            )
          )
        ) nullable
      ) nullable
    )
  )
 );
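
To make the shape of this structure more concrete, here is a minimal sketch in Python (not RQL, and purely for illustration) that walks a response shaped like the typealias and flattens it into one row per keyword and month. The function name monthly_rows and the sample payload are hypothetical, and the search volumes are invented. The sketch skips the parts the typealias marks as nullable (the result collection and its elements).

```python
from typing import Any, Dict, Iterator, Tuple

Row = Tuple[str, int, int, int]  # (keyword, year, month, search_volume)


def monthly_rows(response: Dict[str, Any]) -> Iterator[Row]:
    """Yield one row per keyword and month from a response shaped like
    hist_search_vol_result (hypothetical helper, not part of RAW)."""
    for task in response["tasks"]:
        # `result` is nullable in the typealias, so treat null as empty.
        for result in task.get("result") or []:
            if result is None:  # each element of `result` is nullable too
                continue
            for item in result["items"]:
                for m in item["keyword_info"]["monthly_searches"]:
                    yield (item["keyword"], m["year"], m["month"], m["search_volume"])


# Heavily trimmed, made-up payload: only the fields the sketch reads.
sample = {
    "tasks": [
        {
            "result": [
                {
                    "items": [
                        {
                            "keyword": "tesla",
                            "keyword_info": {
                                "monthly_searches": [
                                    {"year": 2022, "month": 6, "search_volume": 100},
                                    {"year": 2022, "month": 5, "search_volume": 90},
                                ]
                            },
                        }
                    ]
                }
            ]
        },
        {"result": None},  # a task with no result
    ]
}

rows = list(monthly_rows(sample))
```

A real response carries many more fields per item; they are left out here because the comparison later in this post only needs the monthly searches.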

This is followed by a simple function that uses the typealias in an HTTP POST request. We have to create the JSON to submit in the body, based on the parameters passed to the function:

// Historical Search Volume function

historical_search_volume(
    key: string, 
    keywords: string, 
    language_name: string null := null,
    language_code: string null := null,
    location_name: string null := null,
    location_code: int null := null,
    tag: string null := null,
    include_serp_info: bool:=false
  ):= {

    // form the body to post
    body:=print_json(
      (
        keywords: split(keywords,","), 
        language_name: language_name, 
        language_code: language_code, 
        location_name: location_name, 
        location_code: location_code, 
        include_serp_info: include_serp_info,
        tag: tag
      )
    );
    
    READ_JSON[hist_search_vol_result](
      "https://api.dataforseo.com/v3/dataforseo_labs/google/historical_search_volume/live", 
      http_method := "post", 
      http_headers := 
        [
          ("Authorization", "Basic " + key ),
          ("x-raw-output-format", "json"),
          ("Content-Type", "application/json")
        ], 
      http_body_string :="[" + body + "]"
    )
};
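
For readers less familiar with RQL, the request that this function assembles can be sketched in Python. This is an illustration under stated assumptions, not the library's implementation: the helper names build_body, build_headers and encode_credentials are hypothetical, only some of the parameters are shown, and unset parameters are serialised as JSON null on the assumption that this mirrors what print_json does. HTTP Basic authentication conventionally carries the base64 encoding of "login:password", so the sketch assumes that key already holds that encoded value.

```python
import base64
import json
from typing import Dict, Optional


def build_body(
    keywords: str,
    language_code: Optional[str] = None,
    location_name: Optional[str] = None,
    include_serp_info: bool = False,
) -> str:
    """Build the JSON body: a one-element array holding a single task."""
    task = {
        "keywords": keywords.split(","),  # "tesla,rivian" -> ["tesla", "rivian"]
        "language_code": language_code,   # None becomes JSON null (assumption)
        "location_name": location_name,
        "include_serp_info": include_serp_info,
    }
    # The RQL code wraps the body in "[" ... "]"; a one-element list does the same.
    return json.dumps([task])


def build_headers(key: str) -> Dict[str, str]:
    """Headers for the POST; `key` is assumed to be already base64-encoded."""
    return {
        "Authorization": "Basic " + key,
        "Content-Type": "application/json",
    }


def encode_credentials(login: str, password: str) -> str:
    """Standard Basic-auth encoding of login:password."""
    return base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")


body = build_body("tesla,rivian", language_code="en")
headers = build_headers(encode_credentials("user", "pass"))
```

The body and headers can then be handed to any HTTP client. Keeping the key out of the code, as the RAW secrets system does in the next step, is the safer choice.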

Now we have our library function (you can find this and other library functions on GitHub). Next up is to create a simple data product (API) that uses it.

The following short piece of code will import the library code, and call the library function. It makes use of our secrets system to store the API key safely too:

FROM `github://raw-labs/lib/1/public/dataforseo.com/query.rql`
    IMPORT historical_search_volume;

search_vol_hist(
    keywords: string, 
    location_name: string nullable := null,
    location_code: int nullable := null,
    language_name: string nullable := null,
    language_code: string nullable := null,
    include_serp_info: bool nullable :=false
) := {

  // pick up the key, and pass through to the dataforseo shim
  historical_search_volume( 
    key:=secret("dataforseo.com"), 
    keywords:=keywords,
    location_name:=location_name,
    location_code:=location_code,
    language_name:=language_name,
    language_code:=language_code,
    include_serp_info:=include_serp_info
  )

}

You can find this and other example data products in our GitHub demos.

Here’s the YAML file that we configure to call the above query, which will essentially create the API. It also contains tests. Notice that we have also taken an HTTP POST request and exposed it as a GET request:

raw: 0.9
endpoint: GET
code: rql
codeFile: seo.rql
format: json
computeClass: normal
enabled: true
cacheSizeMB: 10
computeLimitSeconds: 10
declaration: search_vol_hist
metadata:
  title: Historical Search Volume for Google Searches
  description: Endpoint returns Historical Searches for Keywords on Google, by Month.
  tags:
    - seo
    - dataforseo
    - google search
security:
  public: true
refreshSeconds: 3600
tests:
  - name: test1
    description: "simple test, one keyword and language code supplied"
    arguments: 
      - key: keywords
        value: "tesla"
      - key: language_code
        value: "en"

This file can be found here on GitHub

Using our new SEO data product

Finally, you can query the API using this URL:

https://api.raw-labs.com/examples/1/public/SEO/historical-search-vol?keywords=tesla&language_code=en&location_name=United%20States

Replace the parameters with your own, of course. Note that this API returns JSON. If your browser doesn’t render it readably, there are some nice browser extensions that format JSON (for example, for Chrome).
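
If you would rather build this URL in code, the following Python sketch shows one way to do it. The helper name endpoint_url is hypothetical. The parameter names are the ones used in the URL above, and spaces are percent-encoded as %20 to match it.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.raw-labs.com/examples/1/public/SEO/historical-search-vol"


def endpoint_url(
    keywords: List[str],
    language_code: str = "en",
    location_name: Optional[str] = None,
) -> str:
    """Build the GET URL for the historical search volume endpoint."""
    params = {
        "keywords": ",".join(keywords),  # the endpoint takes a comma-separated list
        "language_code": language_code,
    }
    if location_name is not None:
        params["location_name"] = location_name
    # quote_via=quote encodes spaces as %20 (the default would use "+");
    # safe="," keeps the separating commas readable.
    return BASE_URL + "?" + urlencode(params, safe=",", quote_via=quote)


url = endpoint_url(["tesla"], language_code="en", location_name="United States")
```

The resulting string can be passed to any HTTP client, and the response parsed as JSON.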


Creating our new data product for Comparative Historical Brand Searches

Now let’s take the data product we created above and reuse it to create another. We could also combine it with data from elsewhere, be it an API from Google (Ads, Search Console), WordPress, or any other useful source of SEO or log data from your own website (more on those in later blogs). To keep it simple, though, let’s just reuse the API above to create a second one that compares the historical search volumes over time for a pair of keywords. These might be brands, for instance ‘coca-cola’ and ‘pepsi’.

The code below will reuse the same typealias, and query the API above passing in parameters to retrieve a result. You can read the code below, with more commentary after:

FROM `github://raw-labs/lib/1/public/dataforseo.com/query.rql`
  IMPORT typealias hist_search_vol_result;


historical_search_volume_compare( keyword1: string, keyword2: string, location_name: string, language_code: string nullable := "en") := {

  // uses an existing endpoint
  // takes two time-series of search volumes
  // compares them month by month

  items:=
    SELECT 
      cfirst(cfirst(r.tasks).result).items
    FROM
      READ_JSON[hist_search_vol_result]("https://api.raw-labs.com/examples/1/public/SEO/historical-search-vol",
      http_args := [
        ("keywords", mkstring([keyword1,keyword2],",") ), 
        ("location_name", location_name),
        ("language_code", language_code)
      ] ) as r;


  res1:=
    SELECT i.keyword_info.monthly_searches
    FROM cfirst(items) as i
    WHERE i.keyword = keyword1;

  res2:=
    SELECT i.keyword_info.monthly_searches
    FROM cfirst(items) as i
    WHERE i.keyword = keyword2;


  SELECT 
    CAST (m1.year*100 + m1.month as string) as month,
    m1.search_volume + m2.search_volume as total_search_vol, 
    m1.search_volume as keyword1_search_vol,
    m2.search_volume as keyword2_search_vol,
    CAST(m1.search_volume as float) / 
      CAST(m2.search_volume as float) as keyword1_vs_keyword2_search_vol
  FROM 
    cfirst(res1) as m1, 
    cfirst(res2) as m2
  WHERE 
    m1.year = m2.year AND 
    m1.month = m2.month

};  

The result from the dataforseo.com API is a complex and deeply nested JSON data structure. The query navigates down that structure using CFIRST() calls (which take the first value of a collection) to extract the monthly searches for each keyword. Lastly, we use a simple SQL statement to join the two parts of the result set by month, and return a new, simpler data structure.
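
The month-by-month join at the end of the query can be sketched in Python as follows. This illustrates the logic only and is not how RAW executes it. The function name compare_monthly is hypothetical, the sample volumes are invented, and returning None for the ratio when the second keyword has zero searches is a choice of this sketch rather than something the RQL code specifies.

```python
from typing import Any, Dict, List

Month = Dict[str, int]  # {"year": ..., "month": ..., "search_volume": ...}


def compare_monthly(series1: List[Month], series2: List[Month]) -> List[Dict[str, Any]]:
    """Inner-join two monthly series on (year, month) and compare volumes."""
    # Index the second series by (year, month) for constant-time lookups.
    by_month = {(m["year"], m["month"]): m["search_volume"] for m in series2}
    rows = []
    for m in series1:
        key = (m["year"], m["month"])
        if key not in by_month:
            continue  # inner join: keep only months present in both series
        v1, v2 = m["search_volume"], by_month[key]
        rows.append({
            "month": str(m["year"] * 100 + m["month"]),  # e.g. "202206"
            "total_search_vol": v1 + v2,
            "keyword1_search_vol": v1,
            "keyword2_search_vol": v2,
            # Sketch-only choice: avoid dividing by zero.
            "keyword1_vs_keyword2_search_vol": v1 / v2 if v2 else None,
        })
    return rows


# Made-up volumes, for illustration only.
series_a = [
    {"year": 2022, "month": 6, "search_volume": 300},
    {"year": 2022, "month": 5, "search_volume": 200},
]
series_b = [
    {"year": 2022, "month": 6, "search_volume": 100},
    {"year": 2022, "month": 4, "search_volume": 50},
]
comparison = compare_monthly(series_a, series_b)
```

Only June 2022 appears in both sample series, so the comparison contains a single row for that month.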

You can also see that RAW is a composable language, meaning that data and functions are interchangeable. In the FROM clauses above we query the output of a function, CFIRST(), as if it were a table, because it returns a collection. This makes RAW a very powerful language for building data products!

Creating the endpoint

Last up is to configure a YAML file to create the endpoint for this code. We can also add some tests here; see my other blog on testing:

raw: 0.9
endpoint: GET
code: rql
codeFile: historical-search-vol-compare.rql
format: json
computeClass: normal
enabled: true
cacheSizeMB: 10
computeLimitSeconds: 10
declaration: historical_search_volume_compare
metadata:
  title: Historical Search Volume Comparison for Google Searches on two keywords
  description: Endpoint returns Historical Comparison by Month for Google Searches for a pair of keywords
  tags:
    - seo
    - dataforseo
    - google search
security:
  public: true
refreshSeconds: 3600
tests:
  - name: test1
    description: "simple test, two keywords and location name"
    arguments: 
      - key: keyword1
        value: "tesla"
      - key: keyword2
        value: "rivian"
      - key: location_name
        value: "United States"

Endpoint for the new data product

To get our new endpoint, we simply commit both of these files to GitHub, and our endpoint is ready.

You can now see this endpoint in our public demos catalogue. Here you will see all the metadata, and code for the endpoint. You can copy the invocation, and even download a spreadsheet with the API invocation embedded. The Catalogue is also fully searchable, and there are many other examples to try out too.

If you would rather just click in your browser, feel free to try these, or use your own parameters:

PEUGEOT vs RENAULT, in FRANCE:

https://api.raw-labs.com/examples/1/public/SEO/historical-search-vol-compare?keyword1=peugeot&keyword2=renault&location_name=France&language_code=fr

TESLA vs RIVIAN, in USA:

https://api.raw-labs.com/examples/1/public/SEO/historical-search-vol-compare?keyword1=tesla&keyword2=rivian&location_name=United%20States


Wrap Up

That’s it for this blog post. Hopefully you can see that it’s easy to use RAW to create data products for SEO from existing APIs and data, and that combining them to create value-added products of your own is straightforward. You can then share them with colleagues and customers using our secure API key mechanism.

We are building the fastest API creation and data sharing solution. For more information, get in touch with us today, or simply register and start using RAW!

Jeremy Posner, VP Product & Solutions, RAW Labs.

