Data Lineage - How detailed are your data lineage views. Do you include staging tables and views? We are trying to determine how detailed we should get with our lineage. Should we skip staging and show only source application? Should we show aggregated views as a part of the lineage?
I'm the only one in my organization working on reporting data ETL and modeling, so take this with a grain of salt.
If you have the overall structure / flow of data documented, a good business glossary, and good naming practices I think you can avoid documenting the field level lineage. So I document that data is copied into the "staging" schema, then prepared in the "prep" schema and fact tables are combined in the "report" schema. Anyone with knowledge of the business could look at the table & view names and figure out what is going on really quickly.
Your other option is a linage tool that does everything automatically.
I take data lineage to the data element level and include all hops the data takes. However you decide to do it, you have to think about the reason you're doing it. This is not an academic exercise. You're doing it so that if you have a problem, you can easily trace back to find out where the problem occurred.
Thanks for your input. In this case we are not looking to solve an issue with the report but to help the report user verify the source and possible enhancements.
Thanks. We are trying to show the report user the source of the data for validation and also possible enhancement possibilities.
It sounds like this report is being created at the request of a user or user group? I wonder if asking the user or group of users to sketch a quick lineage view wireframe on a piece of paper or whatever is on hand would shed some light on your question of what to put in your lineage view. I used to do that as a program assistant for Olin College of Engineering and found that it was a way to brainstorm on what the client needed.
Also, I would be upfront about the amount of time and budget available for constructing a lineage view and look at tools already available.
Jeff brought up a good point, in the case of normalized data. If
the overall data structures and documents are good along with a
glossary and naming process then the amount of documentation can be
minimal. Merrill has a different approach that works. Both
responses suggest that what you are dealing with is primarily a
high-level of a business question of which detailed data lineage
views and other capabilities are one piece.
[login to unmask email]
Freelance Production Assistant
Freelance Data, Technology and Science Writer