How much Data Quality is good enough?

Michelle Knight

As a software tester, I have seen businesses struggle to match quality with budgeted time and money. Of course, everyone wants 100%, but can every possible data input be considered and every business contingency predicted? A developer friend of mine has seen managers, other developers, and business analysts get lost among the trees of Data Quality, focusing on minutiae.

Based on experience, requirements (as guided by Data Governance), reproducibility, and risk coverage seem to be the fundamentals for gauging good-enough Data Quality: enough to minimize data challenges and give Data Quality a green light.

What do you think?

William McKnight

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

This is an interesting question that I field often. This may seem like a non-answer, but I say to bring data quality to a tolerable standard as defined by Data Governance representing business need. Data quality has a cost and a point of diminishing returns so the technical team needs to be in communication with Data Governance offering what can be done and what levels data quality can be brought to. The standard can be set by scoring the data quality, indicating its compliance with some of the more important data quality rules. The scoring can likewise be correlated to project success, and therefore business success. It's not a perfect science, but once you get in this practice, it's immensely valuable.
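One way to picture the scoring described above is a weighted pass rate over a handful of rules. This is only a minimal sketch: the rule names, weights, sample records, and the 0.9 threshold are all hypothetical stand-ins for whatever Data Governance would actually define.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    weight: float                  # importance of the rule (assumed to come from governance)
    check: Callable[[dict], bool]  # returns True when a record complies


def quality_score(records: list[dict], rules: list[Rule]) -> float:
    """Weighted share of rule checks passed across all records (0.0 to 1.0)."""
    possible = sum(rule.weight for rule in rules) * len(records)
    if possible == 0:
        return 0.0
    earned = sum(
        rule.weight for record in records for rule in rules if rule.check(record)
    )
    return earned / possible


def age_in_range(record: dict) -> bool:
    age = record.get("age")
    return isinstance(age, int) and 0 <= age <= 120


# Hypothetical rules and data, for illustration only.
rules = [
    Rule("email present", 3.0, lambda r: bool(r.get("email"))),
    Rule("age in range", 1.0, age_in_range),
]
records = [
    {"email": "a@example.com", "age": 30},  # passes both rules
    {"email": "", "age": 200},              # fails both rules
]

TOLERABLE = 0.9  # hypothetical standard agreed with the business
score = quality_score(records, rules)
meets_standard = score >= TOLERABLE
```

Tracking the score over time, rather than a single snapshot, is what would let it be compared against project outcomes.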

Michelle Knight

RE: How much Data Quality is good enough?
(in response to William McKnight)

Thank you, William,

I agree that Data Governance needs to play a role in defining Data Quality standards, especially when the requirements are more subjective than objective. Data Quality needs to be specified. Plus standards are data too and need the same Data Quality vetting. On both sides, business needs to make sure the standards are clear and written well so Data Quality can be measured and achieved technically. IT needs to understand the business requirements as they are specified and not construct what the technician thinks it should be in IT's bubble.

 

I think the scoring you talk about refers to acceptable risk, which can correlate to project success. This is where reproducibility and risk coverage come into the picture. The more efficiently business standards can be reproduced and important requirements validated, the lower the risk. Not every possible need may be covered, but the most comprehensive coverage is best.

Michelle Knight

Freelance Production Assistant

Freelance Data, Technology and Science Writer

Merrill Albert

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

I think people need to be realistic and know what to expect.  I am always hesitant when people talk percentage.  For instance, if someone says customer data is 80% good, that's probably too broad.  If phone number is 100% good and address is 60% good, you're just going to worry unnecessarily hearing 80% if you need confidence in phone number.  You need to know what is wrong and what risk you can accept.
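The point about broad percentages can be shown with a few lines of arithmetic. The sketch below uses made-up records and deliberately simple checks (ten digits for a phone number, non-empty text for an address); none of it comes from a real system.

```python
def field_quality(records, checks):
    """Return {field: share of records whose value passes that field's check}."""
    return {
        field: sum(1 for r in records if check(r.get(field))) / len(records)
        for field, check in checks.items()
    }


# Hypothetical checks, chosen only to make the example concrete.
checks = {
    "phone": lambda v: isinstance(v, str) and sum(c.isdigit() for c in v) == 10,
    "address": lambda v: isinstance(v, str) and v.strip() != "",
}

# Five made-up customer records: every phone is usable, two addresses are blank.
records = [
    {"phone": "781-555-0101", "address": "1 Main St"},
    {"phone": "781-555-0102", "address": ""},
    {"phone": "781-555-0103", "address": "3 Elm Ave"},
    {"phone": "781-555-0104", "address": ""},
    {"phone": "781-555-0105", "address": "5 Oak Rd"},
]

per_field = field_quality(records, checks)
overall = sum(per_field.values()) / len(per_field)
# per_field is phone 100% and address 60%; overall averages to about 80%,
# which hides the fact that phone numbers can be fully trusted.
```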

Michelle Knight

RE: How much Data Quality is good enough?
(in response to Merrill Albert)

Hi Merrill,

I can understand why you would be hesitant to talk percentages. In some situations, though, they can be helpful. When I was contracting as a software tester for Intel, I ran a basic acceptance test for a medical IT device. I kept running into a VPN error that occurred intermittently and was difficult to reproduce. Managers and staff panicked.

Through a meeting with development, QA, and management, we did a more systematic analysis to understand how often it occurred. As testers, we came back to the team and asked what risk percentage was OK. We concluded that the issue occurred too infrequently to matter and could be mitigated in other ways. In this case, the risk was considered acceptable. Sometimes I wonder whether having a number up front would have saved time and effort, or at least led to a less worrying meeting.

I agree that people need to know what is wrong and what risks to accept. I think it is helpful to define that ahead of time in order to have good-enough Data Quality. But it is situation dependent. Some scoring system for acceptable risk may be helpful, at least to start.

 

William McKnight

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

We want to build into the scoring all the nuance needed to make it applicable to its various usages. It can change over time. For phone number, this may mean starting with just whether it is there and moving on to whether it's valid.
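A staged check like that might look roughly as follows. The three levels and the ten-digit validity rule are assumptions made for the sketch; a real rule set would depend on the countries and formats the business actually handles.

```python
import re


def phone_quality_level(value) -> int:
    """0 = missing, 1 = present but not valid, 2 = present and valid.

    "Valid" here is a hypothetical rule: exactly ten digits once
    punctuation and spaces are stripped.
    """
    if value is None or not str(value).strip():
        return 0
    digits = re.sub(r"\D", "", str(value))
    return 2 if len(digits) == 10 else 1


# Early on, the standard might only require level 1 (it is there);
# later it can be raised to level 2 (it is valid).
required_level = 1
acceptable = phone_quality_level("call the front desk") >= required_level
```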

Eric Dodson

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

I had an idea to standardize on a Data Quality Label, much like the Nutrition Facts label that the FDA requires on food in the USA. A standard Data Quality Label could list the metadata and data types along with various Data Quality dimensions (according to industry-standard definitions) and provide scores or percentages for the applicable dimensions, much like food labels list ingredients and tell how many calories, vitamins, etc.

Then, whether you have a database or a data set that will be consumed by a downstream system, posted internally, or posted on an open data portal, the consumer can review the human-readable label, or their extract or AI programs can review the label in a machine-readable format, and decide whether the quality is good enough to consume the data.

In this way, the data providers do not necessarily always have to decide if the quality is good enough. Yet in some cases, if the data providers are held to quality standards, then when reviewing the generated label before providing or publishing, they would know when they would first need to go back and correct data quality issues before providing or publishing.
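As a rough illustration of the machine-readable side of such a label, here is a sketch in which the label is plain JSON and the consumer compares it against its own minimums. The label layout, field names, dimensions, and scores are all invented for the example; no standard format is implied.

```python
import json

# Hypothetical label a data provider might publish alongside a data set.
label = {
    "dataset": "customers",
    "fields": {
        "phone": {"type": "string", "completeness": 1.0, "validity": 1.0},
        "address": {"type": "string", "completeness": 0.9, "validity": 0.6},
    },
}
label_json = json.dumps(label)


def unmet_requirements(label_json: str, requirements: dict) -> list:
    """Compare a label with a consumer's minimum scores.

    requirements maps field -> {dimension: minimum score}. Returns a list of
    (field, dimension, actual, minimum) tuples for everything that falls short;
    an empty list means the data is good enough for this consumer.
    """
    fields = json.loads(label_json)["fields"]
    failures = []
    for field, dimensions in requirements.items():
        scores = fields.get(field, {})
        for dimension, minimum in dimensions.items():
            actual = scores.get(dimension)
            if actual is None or actual < minimum:
                failures.append((field, dimension, actual, minimum))
    return failures


# A consumer that only needs phone numbers accepts the data set;
# one that needs reliable addresses does not.
phone_only = unmet_requirements(label_json, {"phone": {"validity": 0.95}})
needs_address = unmet_requirements(label_json, {"address": {"validity": 0.8}})
```

The same label yields different decisions for different consumers, which is the point of leaving the "good enough" call to whoever uses the data.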

Edited By:
Eric Dodson[All DATAVERSITY Members] @ Dec 20, 2019 - 01:03 PM (America/Eastern)

Jeff Albro

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

I think that is entirely for the business owners or data stewards to decide.

They need to be shown what the issue is, how big it is, what the impact is, and how to fix it.  Building scripts to score and monitor the data quality is a good idea to help with that discussion.  

The harder part is when the business wants good data, but doesn't want to fix it, or the business area that needs to fix it is different than the business area that needs better data.  Then my job is to run it up the chain of command.  Specific examples are a huge help with that.

 

This only applies to source or business definition data quality issues.  If the problem is ETL you need to fix your internal processes.  If most of your data quality issues are ETL then you have a real issue on your hands that will need technical leadership to fix.

-Jeff

Michelle Knight

RE: How much Data Quality is good enough?
(in response to Jeff Albro)

I agree with Jeff, that business owners and data stewards need to be in the conversation. Yes, business needs to be shown what the issue is, how big it is, and what the impact is. I like the idea of building scoring scripts to monitor data quality. I also agree that specific examples are a huge help.

However, I recommend, as William McKnight did in the second post of this thread, that Data Quality standards be defined by Data Governance.

In my experience, a business sometimes wants good data but is unclear about the requirements and standards around that data's collection, maintenance, and use. That can lead to a business area feeling less gung ho about fixing it.

Data Governance comes in here and helps businesses prioritize and break things down to get good data. Also, with good Data Governance, the chain of command is already in place, which frees up executive resources for other decisions.

Michelle

Harald C Smith

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

It's a very contextual issue too. Different issues come into play at different points within data flows and information supply chains, and it will vary by industry.

While it would be great to have perfect quality for incoming data, there's an aspect of "right-sizing". If you're looking at DQ in an Emergency Room, sure it would be nice to have the patient's name and address, but you're not going to stop treatment to wait for it and ensure it's of the right quality.  For online orders, you definitely want to ensure that you've collected valid shipping addresses and credit cards - if not you'll have unhappy customers who will go elsewhere.

When you are matching data from multiple systems, "good enough" may not be good enough. I certainly don't want to see my financial accounts or customer loyalty programs linked to someone else's. But undermatching data is also an inhibitor. For customers, it can be an annoyance. For the business, you may have missed out on fully understanding your B2B relationships - and it's weakening your relationship with them.

And as we now feed data into downstream analytics and AI/ML systems, what happens with those errors and issues we ignored before? Do they get propagated and extended via black box algorithms into poor business decisions?

There's an ongoing, iterative need to minimally evaluate critical and key data elements that drive the business and ensure they have the "right" data quality at the right points in their lifecycle.

Harald Smith
Director, Product Marketing
p (781) 730-3119 

Raymond Barnes

RE: How much Data Quality is good enough?
(in response to Harald C Smith)

Great reply post Harry. And to answer your question: "And as we now feed data into downstream analytics and AI/ML systems, what happens with those errors and issues we ignored before? Do they get propagated and extended via black box algorithms into poor business decisions?"

The answer is YES. Here's an excellent demonstration of how (see Lessons Learned #2, Testing and Monitoring … (Do it!)): https://learning.acm.org/techtalks/mlproduct

Michelle Knight

RE: How much Data Quality is good enough?
(in response to Harald C Smith)

Hi Harald,

I think you are onto something with the context. Data Quality has to be fit for purpose, and some purposes matter more at a given time than others. In the case of the Emergency Room, the treatment is most important at the time of critical injury. But once the patient is stabilized, the patient's family and emergency contacts need to be notified, if the person has put down an emergency contact. So finding out the patient's name and address, at that point, would be necessary. For online orders, you would want to ping a customer if a card originally entered in the system had expired by the time a purchase is made. So good-enough data quality not only needs to be fit for purpose but also fit for the best purpose at the best time.

 

Michelle

 

Kasu Sista

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

Very interesting question. In my experience, data quality can be only as good as what can be measured. For example, when we look at provider data in healthcare the quality measures vary greatly depending on the type of provider. So the company (payer, provider etc.), has to deal with varying degrees of quality and manage risk accordingly. To be specific, internally generated provider data can be up to 95% good, while third-party supplied provider data may be a lot lower. Behavioral health provider data can be as low as 25% because it is difficult to gather that data.

Business will deal with the best data they can get.  Bottom line is that they are looking for data that they can trust. Quality, whether data or otherwise has to be a partnership based on trust. 

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: How much Data Quality is good enough?
(in response to Raymond Barnes)

Bad data quality = Bad insights!

A recent Frost & Sullivan survey of 1,636 IT decision makers around the world confirms that assumption. We found that 63 percent of companies use AI and machine learning today, and 72 percent plan to up their investment over the next two years.*
There’s just one problem — most companies won’t achieve desired outcomes from AI if their data quality isn’t, well, quality. And, chances are, it isn’t.

 

Why AI craves good data
At its core, AI uses advanced algorithms and machine learning to better capture, process and act on information. Whether you’re using it in the contact centre to improve customer and agent experience, on the production floor to optimize productivity and streamline your supply chain, or in the back office to speed decision making and drive innovation, AI needs good information, like our bodies need good calories, to operate at optimal levels.
Source: Top End User Priorities in Digital Transformation, Global, 2019.
Edited By:
Ray Diaz, CBIP, CDP, CSM, ICP-ATF[All DATAVERSITY Members] @ Mar 13, 2020 - 02:04 PM (America/Eastern)

Frank Cerwin

RE: How much Data Quality is good enough?
(in response to Harald C Smith)

It depends on the quality required for the business purpose the data will be applied to.  Just like some products come in "Good", "Better", "Best" quality options, the same can be applied to data.  Your business requirements should specify the level of quality required.  For example, when we had latitude and longitude coordinates to 4 decimal places of precision, that was good enough for marketing analytical purposes.  However, when the same data was applied to the new business purpose of a smartphone store locator app, that level of precision was not fine enough to get the customer to the front of the store (5 decimal places of precision are required to get you to the front door).  So, in answer to your question, data quality is good enough when it satisfies the business purpose to which it is applied.
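The difference between 4 and 5 decimal places can be estimated with simple arithmetic. The sketch below assumes roughly 111,320 metres per degree of latitude, a common approximation; the function name and constant are introduced only for this example.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate; varies slightly with latitude


def resolution_m(decimal_places: int, latitude_deg: float = 0.0):
    """Approximate ground distance covered by one unit in the last decimal place.

    Returns (north_south_m, east_west_m). East-west distance shrinks with the
    cosine of the latitude because meridians converge toward the poles.
    """
    step_degrees = 10 ** -decimal_places
    north_south = step_degrees * METERS_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


four_places = resolution_m(4)  # roughly 11 m north-south
five_places = resolution_m(5)  # roughly 1.1 m north-south
```

At about 11 metres, a 4-decimal coordinate can land on the wrong side of a building; at about 1 metre, a 5-decimal coordinate is fine enough to pick out an entrance.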

Michelle Knight

RE: How much Data Quality is good enough?
(in response to Frank Cerwin)

Frank, it makes sense that Data Quality is tied to the business purpose, and I like your example of latitude and longitude. It also ties in with Harald's post about context. It seems as if Data Quality is tied into an ontology: good sets of data that can be used for a purpose. Given that ontologies, business purposes, and contexts change, the Data Quality that was good ten years ago may not be good today. In that case, what do you do to change that Data Quality standard to work better with current situations, and how do you plan for this eventuality?

Frank Cerwin

RE: How much Data Quality is good enough?
(in response to Michelle Knight)

Michelle, Quality is always determined by the consumer.  No matter whether we're talking about data, cars, or any other products.  This is one of the key principles of ISO8000 that I learned in earning an ISO8000 Master Data Quality Manager certification.  So, you must know the requirements of the data consumer to determine if the suppliers' data will satisfy that consumer.  Therefore, in answer to your question, those responsible for data architecture/management must have a close relationship with business areas as well as IT applications in order to see the future requirements for data based on the purpose for which it will be applied.  If this relationship does not exist then you continue to be an order-taker (hoping the data order is correct) and always at least one step behind.   This type of relationship with the business is possible if you can discuss data in business terms and your quality metrics are truly aligned to business outcomes.