Swimming in a sea of bad data

HEATHER STACCO

Swimming in a sea of bad data

Our organization has grown through acquisition over the last 20 years, and as each new business unit came onto the platform, data quality issues and potential duplication came along with it.  We are finally taking steps towards quality and standardization, and I'm overwhelmed, to say the least.  Our first focus is MDM and Governance so that we can define what quality data actually is.  Has anyone had a similar experience and would like to share do's and don'ts?  I'm considering pushing for a consultant to come in and help us stand something up, or at least direct us in how best to approach this, and could use some suggestions.  Thank you in advance for any input.

Merrill Albert

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

This is a common problem, Heather.  Start with data governance so you have the right people identified to help you in other efforts, like MDM and DQ.  People need to tell you what good looks like, and then you can technically analyze the data to identify problems.  As you say, a consultant might be the way to go.  Sometimes that external voice can say things that employees don't want to say because they have a history with a particular data set.

HEATHER STACCO

RE: Swimming in a sea of bad data
(in response to Merrill Albert)

Thank you, thank you, thank you for replying.  I've been working diligently on filling out the skeleton of the overarching data governance policy.  So far it covers Purpose, Data Roles and Responsibilities, and the first data sets we've identified to move forward with.  Now I want to go down the road of defining the mini policies we need to work on first.  The webinar the other day gave me some direction on what needs work first, and I think that is defining quality and audit.  Does this sound like I'm at least knocking on the right door?  Do you have any experiences where you've been at that door yourself, and if so, how did you move forward?  Any experience/knowledge I can draw from I'll soak up like a sponge.

Thanks again- Heather

Walter Howard

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

You might want to go ahead and engage some DQ vendors, e.g. Informatica, IBM.  They can guide you through the steps to cleanse your data (and sell you some software too!)

Ideally, you already have some of these tools onsite.

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

Heather,

You may want to back into your Data Governance program by focusing on where Data Quality falls down, to prove the need for the organization to commit to governance.

Consider performing a Data Assessment by profiling and reporting on the results of some important datasets. Start with SQL and Excel, but then sell the need for a toolset that can make a difference in improving data quality for a broader set of data.
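As a rough illustration of what a first profiling pass might report, here is a minimal Python sketch. The `profile_column` helper and the sample `customers` rows are hypothetical and not taken from any real system; the same counts could just as easily come from SQL `COUNT` and `GROUP BY` queries or an Excel pivot table.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profile statistics for one column of a list of dicts."""
    values = [row.get(column) for row in rows]
    # Treat None and blank or whitespace-only strings as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        # Values that occur more than once are candidates for duplicate review.
        "duplicated_values": sorted((v for v, n in counts.items() if n > 1), key=str),
    }


# Hypothetical sample data for illustration only.
customers = [
    {"customer_id": "C001", "email": "ann@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": "C003", "email": "ann@example.com"},
    {"customer_id": "C003", "email": None},
]

for col in ("customer_id", "email"):
    print(col, profile_column(customers, col))
```

Even a report this small (missing counts, distinct counts, repeated keys) gives people something concrete to react to before any tool purchase is discussed.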

People need to taste and touch the issues with data before they will accept the methods to deal with them. Also, reporting on Data Quality issues really needs a toolset if the data is going to be fixed, since manual reporting is too difficult to operationalize.

Then you can start to introduce the structure of Data Stewards or Owners, Policies, and Councils. 

Edited By:
Ray Diaz, CBIP, CDMP, CSM[All DATAVERSITY Members] @ May 09, 2019 - 12:38 PM (America/Eastern)
Ray Diaz, CBIP, CDMP, CSM[All DATAVERSITY Members] @ May 16, 2019 - 11:05 AM (America/Eastern)

Aaron Fuller

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

Heather, my rule is "If you move it, you have to prove it." In other words, any time data is acquired from a source system and any time it is moved to a downstream database (DW, ODS, marts, etc.) there need to be companion processes that test the data content and structure to ensure that it meets business standards. That means you have to have a full inventory of what data you have and then a full set of rules as to what good enough means for each data element and data set. Obviously this can't all be done in one fell swoop. It has to be done in iterations over the course of years with a standard approach for defining, implementing and tracking data quality measures. I like to think of it as building the data warehouse for the data warehouse.
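One way to picture such a companion process is the minimal Python sketch below. It assumes the source extract and the loaded target are both available as lists of dicts; the `prove_move` function, the column names, and the sample rows are hypothetical, and a real implementation would carry many more business rules per data element.

```python
def prove_move(source_rows, target_rows, required_columns, key):
    """Compare a source extract with its loaded target and list any problems."""
    problems = []

    # Structure: every target row must carry the required columns.
    for i, row in enumerate(target_rows):
        missing = [c for c in required_columns if c not in row]
        if missing:
            problems.append(f"target row {i} missing columns: {missing}")

    # Content: row counts must match between the two sides.
    if len(source_rows) != len(target_rows):
        problems.append(f"row count {len(source_rows)} -> {len(target_rows)}")

    # Content: every source key must still exist after the move.
    source_keys = {r[key] for r in source_rows}
    target_keys = {r.get(key) for r in target_rows}
    lost = sorted(source_keys - target_keys)
    if lost:
        problems.append(f"keys lost in move: {lost}")

    return problems


# Hypothetical sample data for illustration only.
source = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 5.5},
    {"id": 3, "amount": 7.0},
]
target = [
    {"id": 1, "amount": 10.0},
    {"id": 2},
]

for problem in prove_move(source, target, ["id", "amount"], "id"):
    print(problem)
```

An empty list means the move was "proved" against these particular checks. Keeping the returned problems over time is one possible starting point for the tracking of data quality measures described above.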

Feel free to reach out if you'd like to talk about consulting help, and good luck!

Aaron Fuller, CBIP

Founder & Principal Consultant

Superior Data Strategies 

517-803-0714

superiordatastrategies.com


William McKnight

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

Acquisition time always presents a great opportunity to tackle data quality. Before you start combining systems, you want to be sure they contain data that is fit for purpose. My suggestion is to enlist data stewards by major subject area of the business and find out the shortcomings in how they're using data today. Problems can be fixed on entry or, as Aaron suggests, when the data is moved. You can change data, hold it out for manual review, or just flag it somehow in an alert. I wouldn't recommend aiming a tool at your systems and seeing what it reveals. This will probably reveal a lot but gives you no sense of priority. The priority should come from the stewards. MDM is probably an architectural component that you can use as a single collection point and refinement center for master data.
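The three responses mentioned above (change the data, hold it for manual review, or flag it) can be sketched as a small rule router. This is a minimal Python illustration under stated assumptions: the `route_record` function, the rule names, and the sample record are all hypothetical, and in practice the rules and their priorities would come from the stewards.

```python
def route_record(record, rules):
    """Apply each rule in order; return (cleaned_record, status, flags)."""
    cleaned = dict(record)
    status = "accepted"
    flags = []
    for rule in rules:
        if rule["check"](cleaned):
            continue  # Rule passed, nothing to do.
        if rule["action"] == "change":
            cleaned = rule["fix"](cleaned)      # Correct the value automatically.
        elif rule["action"] == "hold":
            status = "held_for_review"          # Keep it out until a person reviews it.
        elif rule["action"] == "flag":
            flags.append(rule["name"])          # Let it through, but raise an alert.
    return cleaned, status, flags


# Hypothetical rules for illustration only.
rules = [
    {
        "name": "lowercase_state",
        "check": lambda r: r["state"].isupper(),
        "action": "change",
        "fix": lambda r: {**r, "state": r["state"].upper()},
    },
    {"name": "missing_tax_id", "check": lambda r: bool(r.get("tax_id")), "action": "hold"},
    {"name": "missing_phone", "check": lambda r: bool(r.get("phone")), "action": "flag"},
]

record = {"name": "Acme", "state": "mi", "tax_id": "", "phone": ""}
cleaned, status, flags = route_record(record, rules)
print(cleaned, status, flags)
```

The same routing could run at the point of entry or as part of a downstream load; only where it is called from changes.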

Charles Harbour

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

Hi Heather,

By no means are you alone.  Reading between the lines of what you've written, I would guess that the reason you're in this situation is that the company sees Data Management as an expense, not an asset.  By starving the department, the leadership team has made it worse over the years.  Not to say there haven't been improvements, but I have to believe you're at about a -2 on the maturity scale. 

In order to justify the budget needed for consultants (or even your own staff to work on improvements instead of simply keeping the lights on), my suggestion is to focus on the risk, and quantify the cost of when things go wrong.  One of my previous employers was reluctant to spend a million dollars on a data quality improvement project - they ultimately paid one of our customers 30 million dollars in a lawsuit directly related to that data quality issue.

As for a practical approach, it sounds like you're approaching the problem in the correct manner - focus on the key elements of the organization and then work your way out.  A pro tip - learn and incorporate best practices in ABC (Audit, Balance and Control).  That makes it easier to pinpoint where things go wrong, and gives you that confidence that you're looking for (the 'if you move it, prove it').
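To make the ABC idea more concrete, here is a minimal Python sketch under one common reading of the term: audit means logging every step and its outcome, balance means reconciling row counts and control totals between source and target, and control means stopping the flow when a step does not balance. The function names, the `amount` field, and the sample rows are hypothetical.

```python
from datetime import datetime, timezone

audit_log = []  # Audit: a record of every load step and its outcome.


def balance(source_rows, target_rows, amount_field):
    """Balance: compare row counts and a control total between two sides."""
    source_total = round(sum(r[amount_field] for r in source_rows), 2)
    target_total = round(sum(r[amount_field] for r in target_rows), 2)
    return {
        "rows_match": len(source_rows) == len(target_rows),
        "totals_match": source_total == target_total,
        "source_total": source_total,
        "target_total": target_total,
    }


def run_step(name, source_rows, target_rows, amount_field):
    """Run one balanced step, log it, and halt if it is out of balance."""
    result = balance(source_rows, target_rows, amount_field)
    ok = result["rows_match"] and result["totals_match"]
    audit_log.append(
        {"step": name, "at": datetime.now(timezone.utc).isoformat(), "ok": ok, **result}
    )
    # Control: stop here rather than pass unbalanced data downstream.
    if not ok:
        raise ValueError(f"step {name!r} is out of balance: {result}")
    return result


# Hypothetical sample data for illustration only.
invoices = [{"amount": 100.0}, {"amount": 250.5}]
run_step("stage_invoices", invoices, list(invoices), "amount")

try:
    run_step("load_invoices", invoices, invoices[:1], "amount")
except ValueError as err:
    print(err)

for entry in audit_log:
    print(entry["step"], entry["ok"])
```

Because each step is logged with its own counts and totals, the audit log shows which hop first went out of balance, which is what makes it easier to pinpoint where things go wrong.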

Another suggestion would be to take the time to document what you have, in simple/conceptual pictures.  This makes it easier for the management folks to have some insight into the challenges and dependencies.  You can use that as the basis for your gap analysis and roadmap, so that they will better understand that this is going to take time, that there are things that need to be done in a proper order, and that there are bite-sized pieces to getting it done.

Good luck!

CH

william burkett

RE: Swimming in a sea of bad data
(in response to HEATHER STACCO)

I'm very late to the party here, but if Data Quality is the issue, I'd recommend Danette McGilvray's book "Executing Data Quality Projects".  I'm a tough audience and this is one of the few data-related books I'd recommend. (And I think Ms. McGilvray is a contributor to this forum, somewhere.)