Suggestions for data pipelines between domains

Cathryn Crane

Suggestions for data pipelines between domains

I would love some suggestions for tools to assist with the ETL or ELT process. I pull new records into a Dev environment, a SQL VM in Azure in one domain, and I need to massage that data and push it to our Prod SQL Server in a different domain. Traditionally I have done this with SSIS, but I know there are much better solutions now. I am playing with data pipelines but struggling with messaging when the pipeline errors out. I'm thinking about a NoSQL solution to stage all the data and massage it before pushing to the Prod SQL Server, but I'm not sure that's really the right way to go either. Normally I would move Dev to Stage to Prod all in SQL Server.

 

Things need to go in a particular order, and most of the logic is in stored procedures right now. The amount of data I need to pull and push is in the thousands, not the millions, so it's not huge.

Does anyone have any suggestions on the strategy of doing this with better technology?
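
As a tool-neutral illustration of the two requirements above (steps that must run in a fixed order, and a clear message when the pipeline errors out), here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the step names, the run_pipeline helper, and the notify callback are hypothetical, and in practice each step would call a stored procedure and notify would send an email or chat alert.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair. In a real pipeline the action would
# execute a stored procedure; here it is any zero-argument callable.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], notify: Callable[[str], None]) -> bool:
    """Run steps in order, stop at the first failure, and report where it stopped."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            done = ", ".join(completed) or "none"
            notify(f"Pipeline stopped at step '{name}': {exc}. Completed steps: {done}.")
            return False
        completed.append(name)
    notify(f"Pipeline succeeded: {len(completed)} steps completed.")
    return True


def fail_transform() -> None:
    # Hypothetical failure used only to demonstrate the error message.
    raise ValueError("bad date in row 17")


messages: List[str] = []
ok = run_pipeline(
    [
        ("extract_new_records", lambda: None),
        ("transform_records", fail_transform),
        ("push_to_prod", lambda: None),
    ],
    notify=messages.append,
)
print(ok, messages)
```

The point of the sketch is that the failure message names the step that broke and the steps that already ran, which is usually the information missing when a pipeline "just errors out".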

Michelle Knight

RE: Suggestions for data pipelines between domains
(in response to Cathryn Crane)

I would caution against using a NoSQL solution to stage and massage all the data, because of the difficulties this DZone article mentions. Have you looked at other data integration software tools (see Solutions Review), or considered a data virtualization solution? My hands-on experience with such tools has been limited, though.

Ravindra Punuru

RE: Suggestions for data pipelines between domains
(in response to Cathryn Crane)

Diyotta would be a perfect fit for your use case.

Diyotta is a data integration technology built on an ELT architecture. Using Diyotta you can perform ELT workloads on cloud data warehouses, Hadoop or any MPP style data architectures. Visit https://www.diyotta.com for more info.

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Suggestions for data pipelines between domains
(in response to Cathryn Crane)

I am currently looking at Fivetran Data Pipeline (https://fivetran.com/). Most cloud environments don't let you attach directly to the data source and extract. Replication lets the deltas of the data you need land in a raw zone, where you can profile, fix, and prepare them before moving them to a trusted zone.

I like data replication and landing raw data, then ELT transforming into zones (schemas).
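
To make the raw-zone and trusted-zone idea concrete, here is a rough sketch, not how Fivetran or any other product works internally. It uses Python's built-in sqlite3 module as a stand-in for the source and target databases, and the table names (records, raw_records, trusted_records), the id-based watermark, and the cleanup rule are all assumptions chosen for illustration.

```python
import sqlite3


def load_deltas(source: sqlite3.Connection, target: sqlite3.Connection, watermark: int) -> int:
    """Copy source rows with id > watermark into the raw zone; return the new watermark."""
    rows = source.execute(
        "SELECT id, name FROM records WHERE id > ? ORDER BY id", (watermark,)
    ).fetchall()
    with target:  # commits on success, rolls back if the insert fails
        target.executemany("INSERT INTO raw_records (id, name) VALUES (?, ?)", rows)
    return rows[-1][0] if rows else watermark


def promote_to_trusted(target: sqlite3.Connection) -> None:
    """Clean raw rows (trim whitespace, drop empty names) and upsert them into the trusted zone."""
    with target:
        target.execute(
            "INSERT OR REPLACE INTO trusted_records (id, name) "
            "SELECT id, TRIM(name) FROM raw_records WHERE TRIM(name) <> ''"
        )


# In-memory databases stand in for the source system and the target warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO records VALUES (?, ?)", [(1, " Ann "), (2, ""), (3, "Bob")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE raw_records (id INTEGER, name TEXT)")
target.execute("CREATE TABLE trusted_records (id INTEGER PRIMARY KEY, name TEXT)")

watermark = load_deltas(source, target, watermark=0)
promote_to_trusted(target)
print(watermark, target.execute("SELECT id, name FROM trusted_records ORDER BY id").fetchall())
```

Keeping the raw zone as an untouched copy means a bad cleanup rule can be fixed and rerun without going back to the source system.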

Ravindra Punuru

RE: Suggestions for data pipelines between domains
(in response to Ray Diaz, CBIP, CDP, CSM, ICP-ATF)

Hi Ray

Did you look at Diyotta (https://www.diyotta.com/) by chance? Diyotta could be the right fit for consolidating data into your raw zone and then performing ELT on your cloud data warehouse.

-Ravindra

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Suggestions for data pipelines between domains
(in response to Ravindra Punuru)

Ravindra, thanks for the information on this solution. I have not seen this one yet. It does sound a bit complex when I see:

  • Production Controller
  • Pushdown Platform
  • Agents
  • 25% Non-production Instances

I would need to know more about this stack.

Thanks 

Ravindra Punuru

RE: Suggestions for data pipelines between domains
(in response to Ray Diaz, CBIP, CDP, CSM, ICP-ATF)

Hi Ray

I would love to arrange a call to show you the architecture and a demo of the use case, if that helps. Send me an email at [login to unmask email] so that I can follow up.


Regards
Ravindra


(Replying to the message from Ray Diaz, CBIP, CDP, CSM, ICP-ATF, sent Monday, March 9, 2020 2:37 PM.)


Ravindra Punuru

RE: Suggestions for data pipelines between domains
(in response to Ray Diaz, CBIP, CDP, CSM, ICP-ATF)

    

Attachments

  • image001.jpg (1.6k)
  • Diyotta Intro.pdf (1036.5k)
Edited By:
Ravindra Punuru[All DATAVERSITY Members] @ Mar 10, 2020 - 09:39 AM (America/Pacific)

William McKnight

RE: Suggestions for data pipelines between domains
(in response to Cathryn Crane)

There are many. They include Microsoft Azure Data Factory, Amazon Web Services Data Pipeline & Glue, Alteryx, Trifacta, Paxata, Datameer and Fivetran.

I would also caution about introducing NoSQL in the pipeline.

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Suggestions for data pipelines between domains
(in response to William McKnight)

Thanks William, we are also looking at Azure with its Data Factory.

We are reviewing pipelines that can provide connectors for the ServiceNow SaaS application. 

Ray Diaz, CBIP, CDP, CSM, ICP-ATF

RE: Suggestions for data pipelines between domains
(in response to Ray Diaz, CBIP, CDP, CSM, ICP-ATF)

I also like Alteryx for ELT/ETL in that use case, and I can be productive with it on the same day.

The majority of cloud apps and platforms require you to call their APIs to extract datasets, so you need their connectors unless you want to write lots of code yourself.

I prefer data replication toolsets when possible, since they handle the heavy lifting of custom connectors, scheduling frequency, initial loads, deltas, schema changes, error detection, and syncing.
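
To give a feel for the code a hand-written connector involves, here is a minimal sketch of just the extraction loop: cursor-based paging with a simple retry. It is not modeled on any specific vendor's API. The fetch_page callable, the cursor values, and the choice to retry only on ConnectionError are all assumptions for illustration, and a real connector would also need authentication, rate limiting, schema handling, and scheduling.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A page is (records, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[Dict], Optional[str]]


def extract_all(
    fetch_page: Callable[[Optional[str]], Page], max_retries: int = 3
) -> Iterator[Dict]:
    """Yield every record from a paged API, retrying each page on ConnectionError."""
    cursor: Optional[str] = None
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                records, next_cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
        yield from records
        if next_cursor is None:
            return
        cursor = next_cursor


# Hypothetical two-page API whose first call fails, to exercise the retry.
PAGES = {None: ([{"id": 1}, {"id": 2}], "p2"), "p2": ([{"id": 3}], None)}
calls = {"count": 0}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return PAGES[cursor]


records = list(extract_all(fake_fetch_page))
print(records, calls["count"])
```

Even this small loop leaves out most of what the replication tools take care of, which is the argument for using their connectors where they exist.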