DataOps: Navigate the perfect storm between data agility and data control
We live in a modern business world where agility is seen as one of the key ingredients to rising to the top of the competitive pack. Notable business successes, and failures, have come down to this one factor. It is one thing to spot trends and plan for a changing world, but quite another to execute on that – to respond faster to new business opportunities, changes in the market, customer needs, pandemics, or geopolitical events.
Agile organisations are the ones best placed to adapt and out-manoeuvre their competition, and as businesses become ever more data-driven, this means being more “agile with data”. But there is no absolute reference for what this means in practice, since “data” is far too broad a concept. For perspectives on the web, try googling “agile data delivery”, “agile data modelling”, “agile data governance”, or “agile data science” – you will find many great articles and views.
Being agile means different things to different people, but at its core it is the ability to create and respond quickly to change. In the data space, that means being able to source, produce, model, govern, process, analyse and use/act on data faster.
Many organisations mistake the need to be “agile with data” for empowering users to do whatever they want with data to grow the business. And as the range of tools and technologies for working with ever more types of data grows, it becomes easier to use – and abuse – data: to make mistakes, violate company rules and policies, or even break the law.
This unfettered data behaviour can be seen in SMBs, start-ups and scale-ups, and even some larger organisations – until, one day, an event occurs which causes them to think again. Typical scenarios include:
- Company floats, or is bought by a public and/or regulated company
- Company is affected by new regulations due to external change in rules
- Company has an audit where issues are found due to data processing inadequacies
- Company has a data breach
- Company enters (or wants to enter) into a new market, and that market is regulated
Regulations and the data explosion make a perfect storm
In any of these situations (and there are more), there are suddenly extra requirements: explain where data comes from, how it is processed and used, by whom and for what; prove the numbers are right; show that proper processes are in place and that policies are being adhered to. In short, demonstrate data control and governance – greater data maturity. In many instances none of these controls were in place, and so everyone scrambles around, defending, deflecting and diverting from the main job of growing the business.
This problem is only going to get bigger. Over the last 10-15 years, we have seen two seismic shifts happen at the same time. Firstly, an explosion of regulations that affect data (GDPR, CCPA, BCBS 239, HIPAA, SOX, … a seemingly endless and growing list, with more coming for big tech too). Secondly, data estates themselves have become far more complex:
- More data (volume, velocity) and more types of data (variety, veracity) than ever, growing exponentially
- Data in more places (on-prem, public/private cloud, SaaS vendors) and in more physical devices
- An explosion of technology choices in ways to store, process and access data – the so-called ‘modern data stack’ – of which many varieties exist
- Data sourced, used, shared in many more ways and for more purposes than ever
So what’s the answer? How do you keep your agility with data while exercising control in this complex and heavily scrutinised data world? Can you have your cake and eat it?
Without regurgitating a DataOps definition here, adopting a DataOps approach can help with BOTH the control AND the agility concerns. DataOps provides tools, processes and structures to deal with the data explosion and the control environment. But you need to think differently about your data too. DataOps should be backed by a Data as a Product approach: data becomes a first-class citizen of your business, meaning there is proper planning, management, ownership, control, feedback, measurement and audit, with 100% focus on your users.
Whether you are implementing a modern data stack using a data lake, data lakehouse, data fabric, data mesh or the next incarnation in data architecture, the same basic tenets hold true. You need to manage data as a product, exercise control and governance at the same time as allowing increasing business agility. DataOps can then help deliver faster and better.
At RAW we build our product around a DataOps philosophy, Data as a Product, and Data as Code too, where the data and the code exhibit a duality, i.e. can be shared, reused, managed and consumed together. Our users create their data products as APIs with well-defined interfaces and controls.
The result is hugely accelerated delivery times, much faster iterations, and complete control and audit of the whole data product delivery cycle – sourcing, transforming, and serving up bundles of data and code.
Let’s get technical
RAW takes inspiration from software development where large groups of developers work together to produce a single software artefact. And just like in software development we use tools like Git and approaches like Continuous Integration to manage RAW code.
How does this work? RAW does not read “view definitions” from its own metadata tables, as SQL engines traditionally do. Instead, it reads them from Git, and every Git commit transitions all APIs to the new version, atomically. All your code evolves at once; it can be tested in a separate branch against test databases/systems, and you can use whatever CI/CD methodology best fits your needs. As a user, you push code to a Git repo, and RAW retrieves it and updates its state, atomically. Besides solving consistency and enabling a DataOps approach to testing and deployment, this also enables new use cases, such as sharing code and APIs in GitHub repositories. We can finally build, test, deploy and share data products that we trust.
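The key idea above – all API definitions moving to a new Git commit as one unit – can be sketched in a few lines of Python. This is a toy illustration of atomic version transitions, not RAW's actual implementation; the names (`ApiCatalog`, `activate_commit`, `resolve`) are hypothetical.

```python
import threading

class ApiCatalog:
    """Toy sketch of Git-as-metadata: all API definitions belonging to one
    commit are swapped in together, so a reader never sees a mix of two
    versions. Illustrative only, not RAW's internals."""

    def __init__(self):
        self._lock = threading.Lock()
        self._commit = None  # SHA of the currently active Git commit
        self._apis = {}      # endpoint -> source code at that commit

    def activate_commit(self, sha, definitions):
        # In a real system, `definitions` would be read from the Git tree
        # at `sha`. Replacing the whole mapping under one lock makes the
        # switch atomic: every API moves to the new version at once.
        with self._lock:
            self._apis = dict(definitions)
            self._commit = sha

    def resolve(self, endpoint):
        # Return the active commit and the code that serves this endpoint.
        with self._lock:
            return self._commit, self._apis.get(endpoint)

catalog = ApiCatalog()
catalog.activate_commit("a1b2c3", {"/sales": "SELECT * FROM orders"})
# A later push: both endpoints become visible in the same transition.
catalog.activate_commit("d4e5f6", {"/sales": "SELECT * FROM orders_v2",
                                   "/customers": "SELECT * FROM customers"})
commit, code = catalog.resolve("/sales")
```

After the second `activate_commit`, every lookup reflects commit `d4e5f6`; there is no window in which `/sales` serves the old query while `/customers` already exists.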
Jeremy Posner, VP Product & Solutions, RAW Labs.