Data Quality, The 800 Pound Gorilla

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

Data Quality, The 800 Pound Gorilla

"Where does an 800-pound gorilla sit?" The answer: "Anywhere he wants to."

Data Quality is the 800-pound Gorilla in the room that everyone ignores!

 

I wanted to share this excerpt from my Technical Talk as a reminder to our organizations. Your feedback and dialog are welcome.

Much Business Intelligence reporting, Analytics, and Data Visualization contains hidden data errors because the sources are fraught with anomalies and inconsistencies.

Information quality is crucial to making decisions that positively impact business performance. The commitment to data and information quality must be intentional and given the same level of consideration we place on people, processes, and technologies. Ignoring data quality just feeds the Data Gorilla and results in chaos.

Poor-quality data has an adverse impact on the health of an organization. If quality issues are not identified and corrected regularly, substandard data can contaminate all downstream systems. The impacts include increased costs, rework and workarounds, more customer service issues, incorrect forecasts and analytical reporting, and poor decisions that can endanger customer relationships.

As data increases exponentially, organizations usually focus on the volume, velocity, and variety of the data but tend to overlook its veracity, or “trustworthiness.” Bad data can take the form of (a rough check for each is sketched after this list):

  • Missing data: Null fields that should contain data.
  • Erroneous or inaccurate data: Data that has not been entered or maintained correctly, including misspellings, typos, transpositions, and variations in spelling, naming, or formatting.
  • Inappropriate data: Data entered in the wrong field.
  • Duplicate data: A single set of data that occupies more than one record in the same database or across multiple databases.
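
As a rough illustration only (assuming a customer extract loaded into a pandas DataFrame; the file and column names are invented for this sketch), checks for these four kinds of bad data might look like:

    import pandas as pd

    # Hypothetical customer extract; names are made up purely to
    # illustrate the four kinds of bad data listed above.
    customers = pd.read_csv("customers.csv")

    # Missing data: null fields that should contain data
    missing_email = customers["email"].isna().sum()

    # Erroneous or inaccurate data: values that do not match an expected pattern
    bad_zip = (~customers["zip"].astype(str).str.match(r"^\d{5}(-\d{4})?$")).sum()

    # Inappropriate data: e.g., digits appearing in a name field
    digits_in_name = customers["last_name"].astype(str).str.contains(r"\d").sum()

    # Duplicate data: the same customer occupying more than one record
    dupes = customers.duplicated(subset=["first_name", "last_name", "dob"]).sum()

    print(f"missing emails: {missing_email}, bad zips: {bad_zip}, "
          f"digits in names: {digits_in_name}, duplicates: {dupes}")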

Digital data provides the building blocks of all reports, analytics, business decisions, and, ultimately, successful business performance. These foundational elements require careful analysis, planning, and execution. Your data needs to be managed and maintained over the entire data lifecycle; it cannot just be “create data and forget it.” Your digital data will never be perfect, but the consequences of bad data are huge.

Most organizations do not fund any program to design and ensure data quality as an intentional, systematic, and sustained process. According to TDWI’s Data Quality Survey, almost half of all companies have no plans for managing data quality, even though DAMA International’s “body of knowledge” wheel on data management places Data Governance at the center and Data Quality as one of the surrounding Knowledge Areas.

Since data is the lifeblood of your business transactions and the basis for your knowledge of the organization and its customers, it's vital to assess your data quality.

Justin Hauck

RE: Data Quality, The 800 Pound Gorilla
(in response to Ray Diaz, CBIP, CDP, CSM, ICP-ATF)

I would be interested in hearing how others have established the concept of data quality.

 

We are wrapping up the first year of standing up Data Governance in our organization. We began with a pilot project with the specific goal of cleaning up data so that we would have trusted data for meeting new accounting standard requirements. We worked with our SMEs to develop business definitions, business rules, and testing rules (among many other items) so that we could write a SQL query testing the data. From there, we developed a report that summarized the percent of quality data we had, establishing a baseline for comparison and measuring progress as we continued adding more data elements to the inventory and tests.
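
For readers wondering what such a test looks like in code, here is a minimal sketch of the idea, not Justin's actual rules: the database, table, and business rule are hypothetical, and SQLite is used only to keep the example self-contained.

    import sqlite3

    conn = sqlite3.connect("finance.db")  # hypothetical database

    # Hypothetical business rule: every contract must have a start date
    # and a positive contract value.
    total = conn.execute("SELECT COUNT(*) FROM contracts").fetchone()[0]
    passing = conn.execute(
        "SELECT COUNT(*) FROM contracts "
        "WHERE start_date IS NOT NULL AND contract_value > 0"
    ).fetchone()[0]

    # Percent of quality data, used as the baseline for later comparison.
    pct_quality = 100.0 * passing / total if total else 0.0
    print(f"Baseline quality for contracts: {pct_quality:.1f}% "
          f"({passing} of {total} rows pass)")
    conn.close()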

 

We will develop a standard for future reporting so that we can compare our quality standards from one business area to another, but we are not sure what that reporting format will look like at the moment.

Merrill Albert

RE: Data Quality, The 800 Pound Gorilla
(in response to Justin Hauck)

For me, the big thing is making sure that the business defines what quality means to them. They're the ones who know things like a customer has to have these 5 attributes to be useful to them, or that they're in an industry where a certain piece of data has to be there, etc. Too often, the business assumes IT will do it, or IT just does it thinking they understand the rules. There are technical quality rules you might want to look at too, like date fields all in the same format, but it's those quality rules the business needs that are key.

In reporting on quality, make sure your metrics make sense. I've seen companies want to create a dashboard that states what percentage of their data is good. Does that make any sense? If the entire database is 98% good, that number is useless to you when you happen to be working in the 2% that's bad. You need to tell people exactly what's wrong and what they need to do about it. You also have to fix the source of the problem so it doesn't happen again, and report on fixing it. Communication is key.
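
To make that concrete, a per-rule breakdown, rather than a single "percent good" number, might look something like the sketch below (the data set, fields, and rules are invented for illustration):

    import pandas as pd

    orders = pd.read_csv("orders.csv")          # hypothetical data set
    customers = pd.read_csv("customers.csv")    # hypothetical reference data

    # Each rule names the records it catches, so the report says exactly
    # what is wrong and which rows need fixing, not just an overall percent.
    rules = {
        "missing ship_date": orders["ship_date"].isna(),
        "negative quantity": orders["quantity"] < 0,
        "unknown customer_id": ~orders["customer_id"].isin(customers["customer_id"]),
    }

    for name, failed in rules.items():
        print(f"{name}: {failed.sum()} rows, e.g. ids {orders.index[failed][:5].tolist()}")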

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Data Quality, The 800 Pound Gorilla
(in response to Justin Hauck)

Justin, writing SQL to perform data profiling and report on data quality is tough to automate against a large set of tables.

Doing manual data assessments and reporting the results to the business is labor intensive and usually only a snapshot in time.

I don't think you can implement an ongoing, viable Data Quality process as part of a governance program without a Data Quality toolset that:

  • Stores the data quality rules for each data domain
  • Can profile large numbers of tables and data structures
  • Runs automated, scheduled data profiling that outputs scorecards (see the sketch after this list)
  • Lets you calculate and report the cost of the data quality issues
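
To illustrate the profiling and scorecard idea, here is a rough pandas-based sketch with invented table names and rules; a real toolset would also persist the rules, schedule the runs, and handle many more data structures.

    import pandas as pd

    # Hypothetical rule store: table name -> {rule name -> check function}
    RULES = {
        "customers": {
            "email present": lambda df: df["email"].notna(),
            "no duplicate ids": lambda df: ~df["customer_id"].duplicated(),
        },
        "orders": {
            "amount positive": lambda df: df["amount"] > 0,
        },
    }

    COST_PER_BAD_ROW = 5.0  # assumed average cost of one bad record

    def profile(tables):
        """Run every rule against every table and return a scorecard."""
        rows = []
        for table, checks in RULES.items():
            df = tables[table]
            for rule, check in checks.items():
                passed = check(df)
                failures = int((~passed).sum())
                rows.append({
                    "table": table,
                    "rule": rule,
                    "pct_pass": round(100.0 * passed.mean(), 1),
                    "failures": failures,
                    "est_cost": failures * COST_PER_BAD_ROW,
                })
        return pd.DataFrame(rows)

    # A scheduler (cron, Airflow, etc.) would call profile() on every load
    # and publish the resulting scorecard to the governance team.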

Then your process has to consider how to report on where the data falls down and present the options to fix the quality issues:

  • Remediate the applications or integrations causing the issues
  • Perform data cleansing of the data sets
  • Migrate the data to a new data store that is cleansed

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Data Quality, The 800 Pound Gorilla
(in response to Merrill Albert)

Agreed, you need to present the facts of the issues and their impacts, and come with suggestions to fix and improve the data quality.

Absolutely, communication is key. Be aware that discussions of data quality need to be handled delicately, since the business owns the operational systems that cause most of the quality issues, and they can take the news, let's say, defensively and personally.

William McKnight

RE: Data Quality, The 800 Pound Gorilla
(in response to Justin Hauck)

I find it effective to actually put a data quality score on a database or data set. The score is relative to 100 and various potential violations in the data are prorated to make 100. This score is informative to the build team as incentive to improve DQ, but it's also informative as a first glance at the data by the users who need to know how much they can trust the data. Perhaps you've seen report "seals of approval". This is like that, only for the underlying data.
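
A back-of-the-envelope sketch of that kind of scoring (the weights and violation categories here are made up, not William's actual method):

    # Each violation category gets a share of the 100 points (the proration),
    # and that share is scaled down by how often the violation occurs.
    weights = {"completeness": 40, "validity": 30, "uniqueness": 30}   # sums to 100
    violation_rates = {"completeness": 0.02, "validity": 0.10, "uniqueness": 0.00}

    score = sum(w * (1 - violation_rates[name]) for name, w in weights.items())
    print(f"Data quality score: {score:.1f} / 100")   # 96.2 with these rates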

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Data Quality, The 800 Pound Gorilla
(in response to William McKnight)

Excellent and agreed. The certified seal helps build trustworthiness with the business.

Peter Eales

RE: Data Quality, The 800 Pound Gorilla
(in response to Ray Diaz, CBIP, CDP, CSM, ICP-ATF)

There is a really good international standard for data quality, ISO 8000.

Happy to help

Kind regards

 

Peter Eales

Industrial data quality expert

+44 (0)7789 881281


United Kingdom
