DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf ·...
Transcript of DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf ·...
![Page 1: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/1.jpg)
DEEP DIVEModern Data Pipelines:
Improving Speed, Governance and Analysis
www.dmradio.biz
![Page 2: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/2.jpg)
Featured Speakers
![Page 3: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/3.jpg)
![Page 4: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/4.jpg)
Constraints Drive DesignWhen conditions change, objectives mustHighway design = commensurate with trafficNo more Moore, massive parallelism better?Maybe some applications really should dieNo time like the present to begin anews!
![Page 5: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/5.jpg)
Architecture Matters
![Page 6: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/6.jpg)
Engineering Revolution Enabled Skyscrapers
By Joe Mabel
![Page 7: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/7.jpg)
Adopted by the EU, but affects the USA
![Page 8: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/8.jpg)
Don’t Forget the Basics: Costs Matter!
• Modern solutions have cost structures too
• Project planning will always be a moving target
• Build in some financial buffers to help ensure long-term success
![Page 9: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/9.jpg)
Share your plans with key stakeholders!
Good communicationhelps to ensure success!
![Page 10: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/10.jpg)
Process Matters: Continuous Improvement Is Key
• The faster you see value, the more engaged your stakeholders will be
• Create a virtuous circle of improvement by spreading the wealth
• Evangelize success stories; pat your users on the back whenever appropriate
• Starting small is important, but have a long-term plan in mind; this can always change
• “Say yes” whenever possible, even if it’s a tentative “yes” for the near term
![Page 11: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/11.jpg)
In search of the perfect data s tack
A brief history of Data warehousing, ETL, BI, and
Data Governance
![Page 12: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/12.jpg)
OVERVIEW OF PRESENTATION
- History - looking at trends
- Dates are roughly stated
TAYLOR BROWN
COO & [email protected]
![Page 13: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/13.jpg)
2000’s
![Page 14: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/14.jpg)
http://www.bogotobogo.com/Hadoop/BigData_hadoop_OLTP_vs_OLAP.php
2000’s Data Stack
![Page 15: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/15.jpg)
https://community.softwaregrp.com/t5/Big -Data/The -OLAP-cube-is-history/ba -p/231673#.WynTwhJKjMU
2000’s Warehouse - OLAP Cubes
Fast option for analytics vs OLTP Databases
Generally slow and expensive infrastructure
Cost for 1GB = $7.70
![Page 16: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/16.jpg)
2000’s Data Pipelines - ETL
Extract, Transformation & Load
Informatica or custom code
● Heavily cus tomized● T ype, column, table mapping● T rans form data prior to load● Aggregations performed in
pipeline
![Page 17: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/17.jpg)
2000’s Data Governance
● Hardened systems ● Centralized planning ● Good Data Governance
![Page 18: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/18.jpg)
https://www.element61.be/en/resource/sap -business-warehouse -business-objects -front -end-integration -what -available-today
2000’s BI Tools
Heavy Monolithic BI tools for Reporting
Cognos, Hyperion, Microstrategy
● W hat happened in pas t?● V ery Accurate● V ery inflexible. ● Hardened s ys tems● Months to change
![Page 19: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/19.jpg)
http://www.bogotobogo.com/Hadoop/BigData_hadoop_OLTP_vs_OLAP.php
2000s Total Stack = 5+ Tools
![Page 20: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/20.jpg)
http://www.bogotobogo.com/Hadoop/BigData_hadoop_OLTP_vs_OLAP.php
2000s Team Structure = 6+ Teams
Project Management
Engineering IT Analysts Business Users
Executive Team / Management
![Page 21: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/21.jpg)
![Page 22: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/22.jpg)
2006
![Page 23: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/23.jpg)
2006’s Challenges with OLAP
● Data Availability● Inflexibility● Speed● Compromise with end users● Data volumes
![Page 24: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/24.jpg)
2006’s Warehouse - Column Store MMP On -Prem DB
Column-Store designed for analytical queries
Massively Parallel Processing (MPP) - Queries (jobs) divided up between the nodes in the cluster, each one does a portion of the work
Teradata, HP Vertica, IBM Netezza, Oracle Exadata…
Leader Node
Follower Nodes
Query
Each node has a portion of the data (sharded tables!)
![Page 25: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/25.jpg)
2006’s Stack
![Page 26: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/26.jpg)
2006’s 5+ Tools & 6+ Teams
Project Management
Engineering IT Analysts Business Users
Executive Team / Management
![Page 27: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/27.jpg)
2008
![Page 28: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/28.jpg)
2008’s Self Service BI
Asking of data, why did this happen?
Tableau, Qlik
● Drill down ● E xplore ● S till in data s ilos ● Multiple vers ions of truth
![Page 29: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/29.jpg)
2008’s Data Governance
More data, more consumers. More complex data. Multiple version of the same truth. Decentralized BI tools.
Herding Cats!
![Page 30: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/30.jpg)
2011
![Page 31: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/31.jpg)
2011’s Challenges with on -prem MPP Column store warehouses
Variety of Data
Variety of Analytics
![Page 32: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/32.jpg)
2011’s Hadoop to the Rescue!
Built to:
Scale to Big Data
Handle all forms of data
Allow any type of analytics
![Page 33: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/33.jpg)
2011’s Hadoop Stack
![Page 34: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/34.jpg)
2011’s Total Stack = 6+ Tools
![Page 35: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/35.jpg)
2011’s Still complicated team structure = 6+ Teams
Project Management
Engineering IT Analysts Business Users
Executive Team / Management
![Page 36: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/36.jpg)
2013
![Page 37: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/37.jpg)
2013’s Issues with Hadoop
Easy to dump data into a hadoop data lake… hard to manage data and extract value.
● C omplicated low level s etup & maintenance
● R equires experienced development teams
Ultimately companies end up s ending data from Hadoop to S QL databas e for Analytics .
Dead end!
![Page 38: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/38.jpg)
2013’s - MPP Column Store in the Cloud - Redshift!
Fast, affordable EDW on AWS - awesome!
● MPP Scales ● Far less expensive than on -prem Column Store EDW● Fairly easy to resize clusters etc● 1 GB of data $0.05
![Page 39: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/39.jpg)
2013 Cloud -Native Self Serve BI
Goal: Allow both centralized control of data, but also self serve to entire company.
Looker
Make data so accessible, it starts to change the culture at the company to be more data driven.
● Us ing data to try to predict future● S ingle vers ion of the truth● Full data acces s ibility● S uper fas t, query directly agains t the DW H
![Page 40: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/40.jpg)
2015
![Page 41: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/41.jpg)
https://www.intermix.io/ - https://www.quora.com/What -problems -have-you-faced -while-working -with -Amazon-Redshift
2015’s Challenges with Redshift
“There’s a 99% chance that the default configuration will not work for you!”
~ Lars Kamp
![Page 42: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/42.jpg)
2015s Cloud -Native Column -store MPP Data Warehouses
1. Separation of compute & storage
2. Zero infrastructure management
3. Structured & Unstructured data
4. Instantly Scalable Compute
![Page 43: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/43.jpg)
Separation of Compute & Storage
Leader Node
Query
Data is stored in flat files in an Object Store (S3, Google Cloud Storage, etc)
“Infinite storage”
Data is copied onto the nodes in the cluster at compute time
![Page 44: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/44.jpg)
No more queue issues!
Run many warehouse clusters off of same data sets!
ETL Finance BI
![Page 45: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/45.jpg)
Elastic Compute
Re-size cluster in seconds!
![Page 46: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/46.jpg)
Data Sharing
Share data across companies!
Fivetran Acme Co Bob’s Plumbing
![Page 47: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/47.jpg)
How does this affect ETL?
![Page 48: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/48.jpg)
Recap of changes
Warehouses BI ETL
20 0 0 OL AP
20 0 6 On-prem C olumn S tore MMP
20 11 Hadoop
20 0 0 C loud C olumn S tore MMP
20 15 C loud Native C olumn S tore MMP
20 0 0 Monolithic R igid B I
20 0 8 S elf S erve B I
20 13 C entralized C loud Native S elf
S erve B I
20 0 0 C us tom E T L
? ?
![Page 49: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/49.jpg)
Challenges with ETL from 2000’s
ETL was optimized for slow on -premise OLAP data warehouses, with massive storage constraints.
Optimized for pulling from on -premise enterprise applications
![Page 50: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/50.jpg)
Extensive Setup
![Page 51: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/51.jpg)
Ongoing Maintenance
![Page 52: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/52.jpg)
Extensive Planning
![Page 53: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/53.jpg)
2015’s Shift in company structures
With move to cloud, IT teams are shrinking.
Analyst at the front of self serve BI and want:
● S imple Infras tructure ● fully managed s ervices● wholis tic control over s tack
![Page 54: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/54.jpg)
Other changes since 2000
Rise of cloud applications
Drop in data storage 1GB = $0.02
Agile workflows
![Page 55: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/55.jpg)
CLASSIC ETL
Extract -
Transform -
Load -
Visualize -
- Extract
- Load
- Transform
- Visualize
MODERN DATA STACK
![Page 56: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/56.jpg)
- Extract
- Load
- Transform
- Visualize
Agile Cloud -Native Self Serve Analytics - MODERN DATA STACK
FULL SCHEMA REPLICATION (DATA LAKE)
STANDARDIZED SCHEMAS
CENTRALIZED & COLLABORATIVE
SQL-BASED MODELING & TRANSFORMATIONS
![Page 57: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/57.jpg)
- Extract
- Load
- Transform
- Visualize
MODERN DATA STACK
Modularize replication (Extraction & Load) from Transformation ( Data gov )
![Page 58: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/58.jpg)
- Extract
- Load
- Transform
- Visualize
MODERN DATA STACK
Simplifies your management stack3+ Teams
Project Management
Engineering IT
Analysts
Business Users
Executive Team /
Management
![Page 59: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/59.jpg)
Recap of changes
Warehouses BI ETL
20 0 0 OL AP
20 0 6 On-prem C olumn S tore MMP
20 11 Hadoop
20 0 0 C loud C olumn S tore MMP
20 15 C loud Native C olumn S tore MMP
20 0 0 Monolithic R igid B I
20 0 8 S elf S erve B I
20 13 C entralized C loud Native S elf
S erve B I
20 0 0 C us tom E T L
20 15 E L T S eparate E L & T rans form
![Page 60: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/60.jpg)
Zero Configuration, Zero Maintenance, Data Pipelines
![Page 61: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/61.jpg)
Fivetran helps you achieve data acces s ibility with its zero configuration, zero maintenance data pipelines
![Page 62: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/62.jpg)
Data pipeline as a service
Your warehouse
APPLICATIONS DATABASES FILES EVENTS
![Page 63: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/63.jpg)
For an updated list of data sources visit fivetran.com/directory
InstagramIntercomiTunesJiraKlaviyoLinkedIn AdsMagentoMailChimpMandrillMarketoMavenlinkMixpanelNetSuite SuiteAnalyticsOptimizelyP ardotP interest AdsQuickB ooks OnlineR eC hargeR ecurly
Events
Google Analytics 3 6 0S egment
Files
Amazon C loudfrontAmazon K ines is F irehoseAmazon S3Azure B lob S torageC S V UploadDropboxE mail C S V IngesterFTPFTPSGoogle C loud S torageGoogle S heetsSFTP
Databases
Amazon AuroraAmazon R DSAzure S QL DatabaseDynamoDBGoogle C loud S QLHerokuMariaDBMongoDBMySQLOracle DBPostgreSQLSQL Server
Applications
Apple S earch AdsAsanaAdR ollB ing AdsB raintree P aymentsDesk.comDoubleC lickDynamics (3 6 5 , GP , AX)E loquaFacebook Ad Ins ightsFreshdeskFrontGithubGoogle AdwordsGoogle AnalyticsGoogle P layHelp S coutHubS potHybris
S ailthruSalesforceS alesforceIQS AP B us iness OneS endGridS hopifyS tripeS ugarC R MT witter AdsXeroY ahoo GeminiZendeskZendesk C hat (Zopim)Zuora
S nowplowW ebhooks
![Page 64: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/64.jpg)
Pull historical Data
Normalize
Create Schema/Tables & Load Data
Authenticate, and we do the rest...
Update
1
2
3
![Page 65: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/65.jpg)
Fivetran Data Normalization Behavior
We replicate Normalized Schemas(Databases, SFDC, Netsuite)
We normalize Denormalized Data(from APIs)
SOURCE WAREHOUSE
![Page 66: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/66.jpg)
Standard Schemas - ERDs
![Page 67: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/67.jpg)
Incremental Batch Updates
INSERT
UPDATE
DELETE
SOURCE WAREHOUSE
![Page 68: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/68.jpg)
Automatic Schema Migrations
ADD COLUMN
REMOVE COLUMN
CHANGE TYPE
SOURCE WAREHOUSE
ADD OBJECT
REMOVE OBJECT
![Page 69: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/69.jpg)
Complexity compounds. Automate, s tandardize and s implify as much of your
s tack as you can.
![Page 70: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/70.jpg)
Recap of changes
Warehouses BI ETL
20 0 0 OL AP
20 0 6 On-prem C olumn S tore MMP
20 11 Hadoop
20 0 0 C loud C olumn S tore MMP
20 15 C loud Native C olumn S tore MMP
20 0 0 Monolithic R igid B I
20 0 8 S elf S erve B I
20 13 C entralized C loud Native S elf
S erve B I
20 0 0 C us tom E T L
20 15 E L T S eparate E L & T rans form
![Page 72: DEEP DIVE - DATAVERSITYcontent.dataversity.net/rs/656-WMW-918/images/DeepDiveDataPipelines.pdf · Informatica or custom code Heavily customized Type, column, table mapping Transform](https://reader030.fdocuments.net/reader030/viewer/2022040914/5e8b3be5b6fd072b602eb194/html5/thumbnails/72.jpg)
Zero Configuration, Zero Maintenance, Data Pipelines