Product Traceability: a Manufacturing Use Case

January 24, 2022
Posted by Jeremy Posner


A major European manufacturing corporation challenged us to help them provide a product traceability API on top of their machine data. They needed an API that user interfaces could connect to, so that a search by serial number would show the series of events, machines and processes that a particular part encountered.

The company has many factories and hundreds of machines from different vendors. Each machine performs a process on components and records details including date/time, location, machine, serial numbers and other metrics collected from the machine.

Why was this a problem?

There are millions of machine-generated files, sitting in a data lake on AWS in S3 buckets, with each machine emitting files and pushing them to cloud storage.

The complexity arises from the different file formats (multiple JSON and XML variants) that the machines use. For instance, some machines are many years old, others are brand new, and firmware upgrades can change the format of the data.
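To make the format problem concrete, here is a minimal Python sketch of mapping two differently shaped machine files onto one common record. It is an illustration only, not the customer's data or how the RAW platform works: the field names (`serialNumber`, `sn`, `machineId`, `ts`, `dateTime`), the `ALIASES` table and the `normalize` function are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical aliases: each machine or firmware version may name
# the same field differently.
ALIASES = {
    "serial": ("serial", "serialNumber", "sn"),
    "machine": ("machine", "machineId"),
    "timestamp": ("timestamp", "ts", "dateTime"),
}


def _flat_fields(raw: str) -> dict:
    """Parse a JSON object or a flat XML element into a plain dict."""
    text = raw.lstrip()
    if text.startswith("<"):
        root = ET.fromstring(text)
        fields = dict(root.attrib)
        # Direct child elements are treated as fields too.
        fields.update({child.tag: (child.text or "").strip() for child in root})
        return fields
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def normalize(raw: str) -> dict:
    """Map one machine file onto the common record shape."""
    fields = _flat_fields(raw)
    record = {}
    for target, names in ALIASES.items():
        value = next((fields[n] for n in names if n in fields), None)
        if value is None:
            raise ValueError(f"missing field: {target}")
        record[target] = str(value)
    record["timestamp"] = datetime.fromisoformat(record["timestamp"])
    return record


json_file = '{"serialNumber": "SN-001", "machineId": "press-7", "ts": "2022-01-24T10:15:00"}'
xml_file = '<event sn="SN-001" machine="weld-2"><dateTime>2022-01-24T11:30:00</dateTime></event>'
records = [normalize(json_file), normalize(xml_file)]
```

Every new vendor format or firmware change means another alias or parsing branch in code like this, which is why hand-maintained pipelines over such data tend to be brittle.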

Their current solution was ETL-based. However, due to the fluidity of the file formats and data quality issues, the data pipelines frequently broke. Each failure had to be investigated and remediated before processing could be restarted, including the failed backlog.

They also needed to perform integrated analytics alongside other enterprise data, e.g. from an ERP system, or using reference values as lookup data, and they were unable to perform historical data reporting.

Machine-generated data accessibility using the RAW Data Product Platform

We implemented a solution to the four major needs above, and a set of extended functionalities covering the following:

  • Accessing the data as APIs, via Business Intelligence tools, Excel and other UIs
  • Searching by serial number
  • Returning latest state of any part
  • Returning historical information and changes to that part as it progressed through the manufacturing process
  • Handling changes to data structures gracefully over time
  • Finding and dealing with data quality exceptions without involving IT
  • Integrating with other data sources (ERP, database) using SQL syntax for wider analytics
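As a rough illustration of the serial-number search, latest-state and history capabilities listed above, the following Python sketch keeps events in memory, indexed by serial number. It is not the RAW platform's API: the `TraceIndex` class, its methods and the sample events are hypothetical, and a real deployment would query the files in the data lake rather than an in-memory dictionary.

```python
from collections import defaultdict
from datetime import datetime


class TraceIndex:
    """Hypothetical in-memory index of machine events, keyed by serial number."""

    def __init__(self):
        self._events = defaultdict(list)

    def add(self, event: dict) -> None:
        self._events[event["serial"]].append(event)

    def history(self, serial: str) -> list:
        """All events for a part, oldest first."""
        return sorted(self._events.get(serial, []), key=lambda e: e["timestamp"])

    def latest(self, serial: str):
        """Most recent event for a part, or None if the serial is unknown."""
        events = self.history(serial)
        return events[-1] if events else None


index = TraceIndex()
# Events can arrive out of order, so history() sorts by timestamp.
index.add({"serial": "SN-001", "machine": "weld-2",
           "timestamp": datetime(2022, 1, 24, 11, 30)})
index.add({"serial": "SN-001", "machine": "press-7",
           "timestamp": datetime(2022, 1, 24, 10, 15)})

route = [e["machine"] for e in index.history("SN-001")]
current = index.latest("SN-001")
```

Here `route` lists the machines the part passed through in time order, and `current` is its most recent recorded event.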


Our RAW Data Product Platform is a great choice for building APIs that access complex, heterogeneous, machine-generated data at scale. Its many unique features enable solutions to these types of problems to be implemented faster and more simply. For more details, you can download our solution brief or the white paper on our platform architecture, plus see the links below.

Jeremy Posner, VP Product & Solutions, RAW Labs.

