Decision Tree for determining what to build

Carol McGrath

I am interested in understanding whether anyone has developed a decision tree to determine what type of analytics product a team should produce.

For example, if you are a data warehouse team, your default is to build a DW for business users.

If you are a data scientist, your default is a model.

Has anyone been able to come up with a method that takes a business problem and, based on key inputs, enables a team to determine what output should be produced for the business?

 

Edited By:
Carol McGrath[All DATAVERSITY Members] @ Mar 08, 2019 - 05:32 PM (Pacific/Wellington)

William McKnight

RE: Decision Tree for determining what to build
(in response to Carol McGrath)

I have not done exactly that, but you are right in implying that there is no "one size fits all" and that many platforms are applicable to an enterprise today. The data warehouse is still an important shared artifact, ensuring efficiency of effort and consistency in data usage across some of the analytical needs of the organization. The MDM hub does some of the same things for the organization's master lists. The data lake tends to give data scientists what they need by giving them everything possible, and great data scientists can use it all. There is very likely a need for analytic structures outside the data warehouse for several reasons, not the least of which is that many data warehouses today are too bulked up and not agile enough, and many analytic needs are held to a different standard than the warehouse could ever reach. There are also reasons to create marts off a warehouse, although I tend to think that this is frequently done because of poor warehouse design, when a well-designed warehouse could be the better option to meet the need.
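As a rough illustration only, the platform roles described above could be written down as a tiny rule set. This is a sketch under assumptions, not an established method: the input names (`is_master_data`, `exploratory_data_science`, `shared_across_teams`, `warehouse_meets_standard`) and the order of the rules are hypothetical, and a real team would need to choose its own inputs and priorities.

```python
from dataclasses import dataclass


@dataclass
class Need:
    """Hypothetical inputs describing one business problem."""
    is_master_data: bool            # a master list (e.g., customers, products)?
    exploratory_data_science: bool  # do data scientists need broad, raw data?
    shared_across_teams: bool       # must many teams reuse consistent data?
    warehouse_meets_standard: bool  # can the warehouse meet the agility/latency standard?


def suggest_platform(need: Need) -> str:
    """Return a candidate platform; the rule order is an assumption."""
    if need.is_master_data:
        return "MDM hub"
    if need.exploratory_data_science:
        return "data lake"
    if need.shared_across_teams and need.warehouse_meets_standard:
        return "data warehouse"
    # Needs the warehouse cannot serve at the required standard.
    return "analytic structure outside the warehouse"
```

For example, `suggest_platform(Need(False, False, True, True))` returns `"data warehouse"`, while the same need with `warehouse_meets_standard=False` falls through to an analytic structure outside the warehouse.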

I will be giving a webinar on Thursday talking about these points: https://www.dataversity.net/mar-14-adv-webinar-databases-vs-hadoop-vs-cloud-storage/. 

Carol McGrath

RE: Decision Tree for determining what to build
(in response to William McKnight)

Thanks, William. I have the webinar slides and video. I am still very keen to understand how other companies are viewing their business intelligence and data science activities in light of businesses requiring speed of insight: near-real-time datasets built super fast. DWs are traditionally more complicated to build, rely on overnight batch updates, etc.

Has anyone made any decisions on when to use a DW vs building a real-time data store or just simply ingesting the data and allowing end users to build analytics in any way they like?

If a business is typically a DW shop, how has it transitioned to the modern platforms and analytics, or has it not done so? Do companies have separate teams that support different methodologies?

So many questions!

William McKnight

RE: Decision Tree for determining what to build
(in response to Carol McGrath)

Hi Carol,

While it's true that some DWs are more complicated to build (I hear "3 months to get any new data in it" a lot) and that they are largely overnight batch, etc., the solution may be to fix those items with a modernization process. I've seen a lot of throwing the baby out with the bathwater from organizations that seem to equate the data warehouse with its problems and go with an alternative (e.g., Hadoop) that may fix a problem out of the gate, when adding some process around the warehouse and/or doing ETL a different way for a new feed (e.g., real-time streaming) may be the best approach.

Others are doing things in all sorts of ways, and they tend to put the best front on whatever they're doing, but I see the true effectiveness of their actions, sometimes only down the line, and there are not many out there I would want to follow. Most organizations are at maturity level 1 or 2 with data, and all need to be at level 3 to survive the next few years.

Some warehouses are too far down a legacy path and need to stay in a contain mode, but I advise not going there too quickly. Think outside the box and you might see more value there. Also, when adding around a legacy warehouse that is in contain mode, it's not out of the question that another data warehouse should be built, to a higher standard (e.g., cloud, real-time, agile, etc.) of course! The DW idea is a strong one.

Carol McGrath

RE: Decision Tree for determining what to build
(in response to William McKnight)

Thanks, William, for your response.