
RSS News Feed Filtering with RAW

April 28, 2022
   
Posted by Georges Lagardère

RSS still feeds the news

We read here and there that RSS is dead. It was launched more than 20 years ago, yet it is still around: The New York Times, Wikipedia and CNN are still pushing news over their RSS feeds.

RSS is also used for other topics such as food safety information: the US government combines several sources into a single feed for product recalls and other alerts. This page has an RSS feed link.

In this post we will show how easy it is to search for terms in this feed. If you open the feed link, you will see an RSS document like the following:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Food Safety</title>
    <description>An RSS feed of combined Food Safety information.</description>
    <link>https://www.cdc.gov/foodsafety</link>
    <atom:link href="https://www2c.cdc.gov/podcasts/createrss.asp?c=146" rel="self" type="application/rss+xml" />
    <image>
      <title></title>
      <url></url>
      <link>https://www.cdc.gov/foodsafety</link>
      <width></width>
      <height></height>
    </image>
    <language>en-US</language>
    <webMaster>podcasts@cdc.gov</webMaster>
    <category>Health Marketing</category>
    <item>
      <title>FSIS Issues Public Health Alert for Ground Beef Products  Due to Possible E. coli O26 Contamination</title>
      <description>WASHINGTON April 27, 2022, - The U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS) is issuing a public health alert due to concerns that specific ground beef products...
</description>
      <link>https://tools.cdc.gov/podcasts/download.asp?m=316422&amp;c=729049</link>
      <guid isPermaLink="true">https://www.fsis.usda.gov/recalls-alerts/fsis-issues-public-health-alert-ground-beef-products-due-possible-e-coli-o26</guid>
      <pubDate>Wed, 27 Apr 2022 12:00:00 EST</pubDate>
      <category>Food Safety</category>
    </item>

...

Using RAW to process news data

RAW can read and process XML, just like JSON, and has support for relational and log file structures too. You can read more about how we handle structures in our docs.

From a RAW perspective, the structure above is composed of a channel, which is a record, and within it lies item, which is a collection of records. The data structure of item looks like this:

`item`: collection(
    record(
        `title`: string,
        `description`: string,
        `link`: string,
        `guid`: record(
            `@isPermaLink`: bool,
            `#text`: collection(string)
        ),
        `pubDate`: string,
        `category`: string
    )
)
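
For readers who think more easily in a general-purpose language, here is a minimal Python sketch of the same shape: one channel record containing a collection of item records. It is not how RAW reads the feed. It uses only the standard library, and the embedded sample is a trimmed copy of the excerpt above; the names `SAMPLE_RSS` and `read_items` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the feed excerpt above (a single item).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Food Safety</title>
    <item>
      <title>FSIS Issues Public Health Alert for Ground Beef Products</title>
      <link>https://tools.cdc.gov/podcasts/download.asp?m=316422&amp;c=729049</link>
      <guid isPermaLink="true">https://www.fsis.usda.gov/recalls-alerts/fsis-issues-public-health-alert-ground-beef-products-due-possible-e-coli-o26</guid>
      <pubDate>Wed, 27 Apr 2022 12:00:00 EST</pubDate>
      <category>Food Safety</category>
    </item>
  </channel>
</rss>
"""


def read_items(rss_text):
    """Return the channel's items as a list of dicts (a collection of records)."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")  # the channel is a single record
    items = []
    for item in channel.findall("item"):  # item repeats, so it is a collection
        guid = item.find("guid")
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # guid carries an attribute plus text, hence a nested record
            "guid": None if guid is None else {
                "isPermaLink": guid.get("isPermaLink") == "true",
                "text": guid.text,
            },
            "pubDate": item.findtext("pubDate"),
            "category": item.findtext("category"),
        })
    return items


items = read_items(SAMPLE_RSS)
print(len(items), items[0]["guid"]["isPermaLink"])
```

Note that the XML parser decodes `&amp;` in the link to a plain `&`, which is why the query result further down shows the link with `&c=`.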

We are interested in filtering on the title to search for certain terms, which could be brands or product names for instance. This can be achieved with the RAW query function below, called feed, which extracts the title, publication date and link of every item whose title contains the search term passed to the function:

feed(search:string) := {
  SELECT 
    title, pubDate, link 
  FROM 
    READ("https://www2c.cdc.gov/podcasts/createrss.asp?c=146").channel.item 
  WHERE LOWER(title) LIKE CONCAT("%",LOWER(search),"%")
}

In this RSS structure, remember that channel is a record and item is a collection of records, so by appending .channel.item to the READ() call, the SELECT statement operates over that collection as if it were a table in standard SQL.

At the time of writing, calling feed("Ferrero") returns:

[
	{
	"title": "Ferrero Voluntarily Recalls Kinder® Happy Moments Chocolate Assortment and Kinder® Mix Chocolate Treats Basket Because of Possible Health Risk and Advises Consumers to Dispose of Certain Kinder Products Not Intended for U.S. Distribution Due to Recall of Products Made in Belgium",
	"pubDate": "Tue, 12 Apr 2022 00:00:00 EST",
	"link": "https://tools.cdc.gov/podcasts/download.asp?m=316422&c=719848"
	}
]
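
To make the filter logic concrete outside RAW, the following is a rough Python equivalent of the case-insensitive title match, offered as a sketch rather than as the way RAW evaluates the query. The sample feed is embedded and its titles are abbreviated; `SAMPLE_RSS` and `search_titles` are illustrative names, not part of RAW.

```python
import xml.etree.ElementTree as ET

# Two abbreviated items modelled on the feed; titles are shortened for the example.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>Ferrero Voluntarily Recalls Kinder Happy Moments Chocolate Assortment</title>
      <link>https://tools.cdc.gov/podcasts/download.asp?m=316422&amp;c=719848</link>
      <pubDate>Tue, 12 Apr 2022 00:00:00 EST</pubDate>
    </item>
    <item>
      <title>FSIS Issues Public Health Alert for Ground Beef Products</title>
      <link>https://tools.cdc.gov/podcasts/download.asp?m=316422&amp;c=729049</link>
      <pubDate>Wed, 27 Apr 2022 12:00:00 EST</pubDate>
    </item>
  </channel>
</rss>
"""


def search_titles(rss_text, search):
    """Return title, pubDate and link of items whose title contains `search`."""
    needle = search.lower()  # lower-case both sides, like LOWER(...) in the query
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        if needle in title.lower():  # plain substring test, like LIKE "%term%"
            matches.append({
                "title": title,
                "pubDate": item.findtext("pubDate"),
                "link": item.findtext("link"),
            })
    return matches


for match in search_titles(SAMPLE_RSS, "ferrero"):
    print(match["pubDate"], match["title"])
```

One difference worth noting: the Python substring test treats the search term literally, whereas in a SQL-style LIKE pattern the characters % and _ inside the term would normally act as wildcards.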

Creating an API

As you can see, it is simple to create a function to filter data from an RSS feed.

To turn this into a reusable API endpoint, you can follow this development guide. It is very simple and only takes a few minutes. Once the endpoint is available, you can call it from any application to monitor this RSS feed for specific topics.

At RAW Labs, we are currently building a feature to make this even easier to use, including simple scheduling of jobs: for instance, it will be possible to call the endpoint at regular intervals, such as every day, to check for alerts or updates.

When a query over the RSS feed returns a result, it will be possible to send an email, a text message or a Slack notification through another API call. This will be shown in another post as soon as the feature is released. Stay tuned!
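
The scheduling and notification feature is not described in detail here, so the following is only a client-side sketch of the idea, under assumptions of our own: an application calls the endpoint on each run, receives a list of results shaped like the JSON above, and wants to notify only about items it has not seen before. The function names, the use of the link as a unique key and the message format are all hypothetical.

```python
def new_alerts(results, seen_links):
    """Return the results not seen in earlier runs and remember their links.

    `results` is a list of dicts with "title", "pubDate" and "link" keys,
    as returned by the feed query; `seen_links` is a set kept between runs.
    Assumption: the link identifies an item uniquely.
    """
    fresh = []
    for result in results:
        if result["link"] not in seen_links:
            seen_links.add(result["link"])
            fresh.append(result)
    return fresh


def format_notification(alert):
    """Build a one-line message suitable for an email, text or chat message."""
    return f'{alert["pubDate"]}: {alert["title"]} ({alert["link"]})'


# Simulate two scheduled runs that receive the same result from the endpoint.
results = [{
    "title": "Ferrero Voluntarily Recalls Kinder Happy Moments Chocolate Assortment",
    "pubDate": "Tue, 12 Apr 2022 00:00:00 EST",
    "link": "https://tools.cdc.gov/podcasts/download.asp?m=316422&c=719848",
}]
seen = set()
first_run = [format_notification(a) for a in new_alerts(results, seen)]
second_run = [format_notification(a) for a in new_alerts(results, seen)]
print(len(first_run), len(second_run))
```

Keeping the set of seen links between runs is what stops a daily job from sending the same recall notice every day for as long as it stays in the feed.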

For more information, get in touch with us today, or register, learn and start using RAW!

Georges Lagardère, VP Customer Experience, RAW Labs.

