DJ Patil defined a data product as “a product that facilitates an end goal through the use of data”. But what are the many characteristics of a good Data Product? It’s a new space and we can learn a lot from other industries with very mature products.
Choose key characteristics that are important to you, and then quantitatively measure different options for delivering them. Here are four characteristics that are important to our customers, and why they chose an API approach. For them, a Data Product should be:
- Specified
- Usable
- Fit-for-purpose
- Supported
… as judged by the intended audience, i.e. those to whom the value proposition appeals.
Let’s look at these in a little more depth, and see where APIs satisfy these well.
A Specified Data Product
An API can reflect your business, and most businesses are complex. Data structures are often complex too. A good API supports and reflects the business, and it handles complex structures well, which is often difficult in tables and columns.
Specification is not just about data structure; it’s also about the way things are named. APIs allow you to use consistent nomenclature, no matter what the database calls a field. There are also specification standards for APIs. These allow interoperability and give you tooling options, so you’re no longer tied down to that old mainframe, or even that new cloud provider.
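As a rough illustration of the naming point, here is a minimal Python sketch of an API layer that publishes consistent field names regardless of what the database calls them. The column names, the API field names and the `to_api_record` helper are all made up for the example.

```python
# Hypothetical mapping from physical column names to the names the API publishes.
FIELD_NAMES = {
    "CUST_NM": "customer_name",
    "CTRY_CD": "country_code",
    "ORD_AMT_GBP": "order_amount_gbp",
}


def to_api_record(db_row: dict) -> dict:
    """Rename database columns to the API's published field names.

    Columns that are not in the mapping are left out, so internal
    fields do not leak into the contract by accident.
    """
    return {api: db_row[col] for col, api in FIELD_NAMES.items() if col in db_row}


# Example row as it might come back from the database (invented values).
row = {"CUST_NM": "Acme Ltd", "CTRY_CD": "GB", "ORD_AMT_GBP": 120.5, "ETL_BATCH_ID": 9912}
record = to_api_record(row)
```

If the underlying table is later renamed or moved, only the mapping changes; consumers keep seeing `customer_name` and `country_code`.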
API docs have standards and are easy to generate along with the API. Documentation is critical to being able to specify correctly, and a good API has lots of great docs: ones that show how to use it, with nice examples, test pages, etc., for a smooth onboarding process.
An API is a data contract plus a lot more. It communicates with people. It reduces ambiguity. It isolates changes in the underlying data stack, so you can keep evolving in a decentralised (and data mesh friendly) world.
A Usable Data Product
Usability of any product is key. Just ask Apple. When your product is data, ask yourself if it’s optimised for use by your users. Better still, ask them: Are they Excel-junkies? Data Scientists? BI tool users? Can they write SQL?
An API can serve data in different formats. JSON is common, but for tabular data a CSV is sometimes easier, and at other times neither is ideal as-is, e.g. for large datasets you might want to return compressed data. APIs can be read directly by many tools, from Excel to Tableau and everything in between.
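To make the format point concrete, here is a small sketch using only the Python standard library. It serialises the same rows as JSON or CSV and optionally gzip-compresses the result. The `render` function and its parameters are assumptions for illustration, not part of any particular API framework.

```python
import csv
import gzip
import io
import json


def render(rows: list[dict], fmt: str = "json", compress: bool = False) -> bytes:
    """Serialise the same rows as JSON or CSV, optionally gzip-compressed."""
    if fmt == "json":
        text = json.dumps(rows)
    elif fmt == "csv":
        buffer = io.StringIO()
        # Column order follows the keys of the first row.
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        text = buffer.getvalue()
    else:
        raise ValueError(f"unsupported format: {fmt}")
    payload = text.encode("utf-8")
    # Compression is worth considering for large result sets.
    return gzip.compress(payload) if compress else payload


rows = [{"country_code": "GB", "orders": 3}, {"country_code": "FR", "orders": 5}]
as_json = render(rows)                      # for applications
as_csv = render(rows, fmt="csv")            # for spreadsheet users
as_gzip = render(rows, compress=True)       # for large downloads
```

In a real service the format would typically be chosen from a query parameter or the request's `Accept` header, but the idea is the same: one data product, several outputs.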
Still using messy file transfers? APIs give users the data on demand, rather than whenever a file happens to be dropped into a folder, and reduce operational burden along the way.
Whilst we are on usability, it is worth noting that both users and computers can consume APIs, so you don’t always need two different mechanisms: having one API with two different outputs may work well.
Lastly, API docs, if they are human and machine-readable, can be shared just like any other metadata to be made searchable by your users in an API Catalogue.
A Fit-for-Purpose Data Product
The API Product Owner should state the API’s purpose and its intended audience. Is the API intended for developers? Is it for hobby-coders? Or just for machines? One of our clients has users who are not sophisticated, so the API is simple. Another has users who are professional developers, and so the API adds many more features.
Either way, as APIs are contracts, a good API has well-specified inputs and outputs and is therefore testable, with many mature software products that can integrate into your DevOps processes. You can test against APIs to ensure your build isn’t broken. That works nicely in a Data Mesh environment where multiple groups publish their APIs for each other.
The other side to being data contracts is that the quality of the data can be measured against the specification via enhanced testing harnesses: not just testing structure and return codes, but also testing values. For instance, you can test that an ISO country code has a correct value, a check which can be performed via another API.
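A value-level check of this kind might look something like the sketch below. The small set of country codes is only an illustrative stand-in for the reference API mentioned above, and `check_record` is a hypothetical helper rather than a feature of any specific testing tool.

```python
# Illustrative subset of ISO 3166-1 alpha-2 codes; a real check would load
# the full list or call a reference-data API instead.
KNOWN_COUNTRY_CODES = {"GB", "FR", "DE", "US", "JP"}


def check_record(record: dict) -> list[str]:
    """Return the problems found in one API record (an empty list means it passed)."""
    problems = []
    code = record.get("country_code")
    if not isinstance(code, str):
        # Structural check: the field must be present and be a string.
        problems.append("country_code missing or not a string")
    elif code not in KNOWN_COUNTRY_CODES:
        # Value check: the code must be one the reference list recognises.
        problems.append(f"country_code {code!r} is not a known ISO code")
    return problems


good = check_record({"country_code": "GB"})
bad_value = check_record({"country_code": "UK"})
bad_structure = check_record({})
```

Run against a sample of real responses in your build pipeline, a check like this measures data quality as well as structure.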
A Supported Data Product
Like any good product, a Data Product is only as effective as its support. A key job of the API Product Owner is to ensure this supportability. A good start is to automate API documentation as part of the delivery of the API itself.
APIs can support versioning; there are a number of well-trodden paths for version management, including versioning in the path or in a header. It is often the case that multiple versions of an API are supported at one time. An API can announce itself as ‘deprecated’, which allows graceful end-of-life support as a new version rolls in.
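Here is a minimal sketch of both versioning styles and a deprecation notice. The header names `X-API-Version` and `X-API-Deprecated`, the version sets and both functions are assumptions made for the example; use whatever convention your consumers expect.

```python
import re

SUPPORTED_VERSIONS = {"1", "2"}   # hypothetical: versions currently served
DEPRECATED_VERSIONS = {"1"}       # hypothetical: versions being retired
DEFAULT_VERSION = "2"


def resolve_version(path: str, headers: dict) -> str:
    """Pick the API version from the path (/v2/...) or, failing that, a header."""
    match = re.match(r"^/v(\d+)/", path)
    if match:
        version = match.group(1)
    else:
        # "X-API-Version" is an assumed header name, not a standard.
        version = headers.get("X-API-Version", DEFAULT_VERSION)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: {version}")
    return version


def response_headers(version: str) -> dict:
    """Build response headers, flagging versions that are on their way out."""
    headers = {"X-API-Version": version}
    if version in DEPRECATED_VERSIONS:
        # Assumed header name and value, used here only to show the idea.
        headers["X-API-Deprecated"] = "true"
    return headers


from_path = resolve_version("/v1/orders", {})
from_header = resolve_version("/orders", {"X-API-Version": "2"})
```

Clients still calling V1 keep working, but every response tells them it is time to move.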
You can measure API consumption: just look at the logs and you can see who is still on V1 when everyone else is on V2. Furthermore, it is possible to see which parameters, features or scopes are being used, and also what’s not being used. That is not so easy with a file, and it is also hard in most databases.
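As a sketch of what that log analysis could look like, the snippet below groups clients by API version and counts parameter usage. The log entries, client names and field layout are invented for the example; real access logs would need parsing first.

```python
from collections import Counter

# Hypothetical, already-parsed access-log entries.
LOG_ENTRIES = [
    {"client": "finance-team", "path": "/v1/orders", "params": ["from_date"]},
    {"client": "bi-dashboard", "path": "/v2/orders", "params": ["from_date", "format"]},
    {"client": "finance-team", "path": "/v1/orders", "params": []},
    {"client": "data-science", "path": "/v2/customers", "params": ["format"]},
]


def version_of(path: str) -> str:
    """Extract the version segment from a path such as /v1/orders."""
    return path.split("/")[1]


# Who is calling which version?
clients_by_version: dict[str, set[str]] = {}
for entry in LOG_ENTRIES:
    clients_by_version.setdefault(version_of(entry["path"]), set()).add(entry["client"])

# Which parameters are actually being used?
param_usage = Counter(p for entry in LOG_ENTRIES for p in entry["params"])
```

Here `clients_by_version["v1"]` shows who still needs to migrate, and any parameter missing from `param_usage` is a candidate for retirement.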
Finally, because APIs are database-independent, you are free to move your underlying data stack: perhaps it’s out of support, you are moving Cloud vendor, or you have found something cheaper or a better set of technologies to deliver your API. We know one and you can get started here! 🙂