Avoid the Adventurist Trap: Don’t Take Data Quality for Granted
This is the first part of a blog post series where we’re going to talk about data quality management and the problems that come with taking data quality for granted. We’ll explore the all-too-common “set it and forget it” culture that pervades our industry, why this mindset is so common in digital analytics, and how it can be avoided by evolving your data-driven culture and data quality assurance processes, hopefully winning some hearts and minds along the way.
Why is the “Set It and Forget It” Culture So Common?
The answer to this question is remarkably simple. The solution, however, is a complex one that challenges human behavior at its very core. Yes, I just related data quality management to the foundations of human behavior, so you’d better strap yourself in.
The answer is that repairing, maintaining, and cleaning something is not an adventure. It’s just work, and most of us in this industry are adventurers and innovators, not the sort of people who are happy with repetitive and mundane tasks.
Anyone who has been in our industry for a little while will know that we possess some of the greatest and most diverse talent in business today, ranging from highly qualified and precise scientists, to creative and logical developers, to inspiring and strategic data storytellers. We’re constantly taking voyages into data to discover new insights, conducting rigorous scientific research to prove our theories, and constructing stories and strategies from data that are no less than pieces of art. We’re a community of scientists, artists, and creators, so the last thing we’re interested in is repairing, maintaining, and cleaning our systems to preserve data integrity, because it’s boring and uninspiring.
The Real Solution
The real solution is not new, but it is complex: data governance automation.
We’re so close to the bleeding edge of technology in our industry that these days we throw around terms like machine learning and artificial intelligence as buzzwords, when five years ago they were still the stuff of science fiction. So we know all too well, in the backs of our minds, that automating data governance is possible and could relieve us of this mundane task.
But we can’t simply leapfrog from our current situation to industry-wide automation; we need to go through a period of adoption where we continue to develop our data-driven culture and embed our data analytics tools into our organizations.
So, Where Do We Begin?
Data quality begins not at the data collection stage but at the very first stage of the discovery process, where we pose or clarify the question we want to answer. Without knowing what question we want to answer, how do we know what data to collect?
Data governance begins here because, if we collect the wrong data, it doesn’t matter whether it’s high quality: its value is greatly diminished, perhaps to nothing, and it will likely mislead us.
You may already be familiar with this stage of the process, where we develop a Solution Design Reference (SDR) and decide what data is required to answer the key business questions. I’d argue that the “set it and forget it” culture is the most dangerous at this stage of the process because it underpins what data is collected and, therefore, what interpretations can be drawn. Crucially, the questions you were trying to answer when you first developed your solution design reference are likely not the same as the questions you’re trying to answer today, so why are you still collecting the same data?
You should do regular reviews of your SDR, pruning data that is obsolete and cultivating new data to answer new business questions. You should regularly harvest your insights, enrich the soil of your data collection, plant new seeds, and nurture and tend to them until they flower into new stories that answer the burning questions about your customers.
However, most people in our industry have separation anxiety with data and are afraid to stop collecting a piece of data for fear of breaking the trend or needing it again in the future. This is dangerous and hinders the progress of your analytics program. If that fear is your only reason for keeping a piece of data, that is, if the data is no longer useful in answering a business question, then get rid of it. Otherwise, you’re simply clogging up your well-oiled machine with unnecessary parts that require maintenance and end up costing you money and time.
Your carefully cultivated harvest will also wither and die over time if left unattended, so it’s really important to conduct regular data quality audits.
What’s your reaction when you read or hear the words “regular data quality audits”? Excitement? Inspiration? A new adventure and new insights to discover? Are you going to boldly go where no analyst has gone before?
Give Your Data a Regular Service
Your reaction is probably the same one you have to servicing your car, cleaning your house, washing yourself, and brushing your teeth. But these are necessities that prevent the buildup of dirt, and they have a magical effect that you know deep down: extending lifespan.
Whether it’s your car through maintenance or your teeth through cleaning, extending lifespan is extremely important. However, we’re conditioned as a society to simply discard old things and buy something new. These days, we can even buy new teeth and discard the ones gifted to us by nature.
Data is no different and, at Blast, we regularly work with customers who want to rip out their current implementation and start again because it’s too expensive to fix what they already have. Ripping it out and starting again is often unavoidable when things have gone too far and an implementation has been collecting poor quality data for a long time. But if regular data quality audits had been performed from the start, that maintenance would have happened along the way and the lifespan of their data would’ve been massively extended.
So Why Do We React This Way to Regular Data Quality Audits in Particular?
The top reason is that they’re manual. In fact, they’re extremely manual! Even with the automated technology currently available, it takes time to configure it and review the results. Sure, you can configure rules to alert you automatically when issues occur, but those rules themselves need maintenance and need to be kept up to date with the evolution of your data collection strategy, and now we’re maintaining something that’s designed to help us maintain something else.
It’s mundane and, if given the choice, I bet you don’t want to do it. You’d rather pay someone else to do it. Or, better yet, automate the process, because trawling through hundreds of pages on your website and comparing data against your SDR is boring and takes a long time. Even describing it, you understand fundamentally that this is a task designed for a machine.
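To make the “task designed for a machine” point concrete, here is a minimal Python sketch of the comparison step. It’s only an illustration under some loud assumptions: the SDR format (a variable name mapped to its allowed values), the audit_pages function, and the sample pages are all hypothetical, and a real audit would first have to crawl your site and capture what each page actually sends.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    page: str
    variable: str
    problem: str


def audit_pages(sdr, pages):
    """Compare the variables observed on each page against the SDR.

    sdr:   variable name -> set of allowed values, or None when any
           non-empty value is acceptable (a hypothetical SDR format).
    pages: page URL -> dict of variable name -> collected value.
    """
    findings = []
    for page, observed in pages.items():
        # Every variable the SDR asks for should be present and valid.
        for variable, allowed in sdr.items():
            value = observed.get(variable)
            if value is None or value == "":
                findings.append(Finding(page, variable, "missing"))
            elif allowed is not None and value not in allowed:
                findings.append(
                    Finding(page, variable, f"unexpected value {value!r}")
                )
        # Anything collected but absent from the SDR is a pruning candidate.
        for variable in sorted(observed.keys() - sdr.keys()):
            findings.append(Finding(page, variable, "not in SDR"))
    return findings


# Made-up sample data, for illustration only.
sdr = {"page_type": {"home", "product", "checkout"}, "currency": None}
pages = {
    "/": {"page_type": "home", "currency": "USD"},
    "/shoes": {"page_type": "prodcut", "legacy_id": "42"},
}

for finding in audit_pages(sdr, pages):
    print(finding)
```

With this sample data, the home page passes cleanly, while the second page is flagged three times: a misspelled page type, a missing currency, and a variable the SDR doesn’t know about. Notice that the sdr dictionary is exactly the kind of rule set that has to be kept in step with your data collection strategy, which is the maintenance burden described above.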
There’s no denying it’s also difficult, and that’s why the task often falls on the implementation specialist within your organization because they have the skills needed to do that job. It’s also why human error occurs so often during this process because mistakes are easier to make when performing a difficult task.
All of this adds up to high effort and cost with no clear value to the business, and that last point is crucial.
There’s No Clear Value to the Business
If you’ve read this far, you now understand the value of maintaining something. However, I’m challenging a social paradigm that pervades not only our industry but also our current existence on this planet, and doing so undermines our wasteful consumer culture.
The value of maintenance should be immediately clear to the business: extending the lifespan of something reduces costs and increases profits, provided that the cost of maintenance doesn’t exceed the cost of a new implementation plus the hidden cost of discarding your existing implementation.
And there it is: the cost of maintenance is higher.
The Cost of Maintenance
The cost of maintenance is only higher than the value of maintenance because, as humans, we find it difficult to quantify the value. We’re experts at quantifying cost and minimizing cost to increase profits, but so often we find it hard to prove the value of something. For example, it’s easy for us to collect data on the number of hours someone spent on performing data quality audits and translate this into a financial cost based on that employee’s salary.
However, it’s difficult for us to collect data on the value of the data quality audit because value is often the avoidance of cost. For example, the audit may have prevented a data quality issue that would have taken many hours to fix. Since the issue never actually occurred, the hours were never spent fixing it, and so that cost, which is actually value in disguise, cannot be attributed to the audit.
Furthermore, the implementation specialist gets value out of performing the audit: they gain a deeper understanding of the implementation and the data than they had before, and this may lead them to new insights or to upgrading the implementation to answer a new business question.
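One rough way to put a number on that hidden value is to estimate the cost of the issues an audit is likely to head off and set it against the hours the audit took. The Python sketch below does that arithmetic. Every figure in it is made up for illustration, and the function names are hypothetical; the probabilities in particular are guesses you would have to defend, which is precisely why this side of the ledger is so much harder to fill in than the cost side.

```python
def audit_cost(hours_spent, hourly_rate):
    """The easy part: hours spent on the audit times what an hour costs."""
    return hours_spent * hourly_rate


def expected_avoided_cost(issues, hourly_rate):
    """The hard part: cost that never appears because the issue never occurs.

    issues: list of (estimated probability the issue would have occurred
            without the audit, estimated hours to fix it) pairs.
    """
    return sum(prob * hours * hourly_rate for prob, hours in issues)


# Illustrative, made-up figures.
hourly_rate = 50
cost = audit_cost(hours_spent=8, hourly_rate=hourly_rate)

# Two hypothetical issues the audit guards against.
issues = [(0.5, 40), (0.25, 16)]
value = expected_avoided_cost(issues, hourly_rate)

print(f"Audit cost:            {cost}")
print(f"Expected avoided cost: {value}")
print(f"Net value of audit:    {value - cost}")
```

With these numbers, the audit costs 400 and the expected avoided cost comes to 1,200, so the audit is worth roughly three times what it cost. Change the guessed probabilities and the answer moves, but at least the value is now on the table next to the cost.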
These problems arise in part because organizations lack a culture of data literacy and data governance. The net result is that, even if you present this data in a meaningful way, it may not make sense to others in your business.
When you don’t have a culture of data literacy and data governance, your organization won’t care about data until either it’s too late or bad decisions have already been made. By then, the cost of cleaning up the mess is much higher and easily quantifiable, at which point it’s cheaper to rip it out and start again. No one talks about the value that is being discarded.
In the next parts of this series on data quality, we’re going to explore the cost of the “set it and forget it” culture. Stay tuned.