Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016
![Page 1: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/1.jpg)
Practical Guide to Architecting Data Lakes
Presented by Avinash Ramineni
![Page 2: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/2.jpg)
Agenda
• About Clairvoyant
• What is a Data Lake?
• Features of a Data Lake
• Tools
• Implementation Challenges
• Questions
![Page 3: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/3.jpg)
Clairvoyant
![Page 4: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/4.jpg)
Clairvoyant Services
![Page 5: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/5.jpg)
What is a Data Lake?
“A data lake is an enterprise-wide system for storing and analyzing disparate sources of data in their native formats.”
“A data lake is a central location in which to store all your data, regardless of its source or format.”
“Is a data lake a replacement for, or complementary to, the EDW?”
“Is a data lake just a storage layer?”
“Does just having a Hadoop environment make it a data lake?”
![Page 6: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/6.jpg)
Data Lake Attributes
• Data Democratization
• Data Discovery
• Data Lineage
• Self-Service Capabilities
• Metadata Management
![Page 7: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/7.jpg)
Data Lake
![Page 8: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/8.jpg)
Self Service Analytics
![Page 9: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/9.jpg)
Data Governance
• Data Acquisition: the what, when, and where of data
• Data Organization: structure, format
• Data Catalog: what data exists in the lake
• Capturing Metadata
  • Data Lineage
  • Data Quality
  • Data Profile
  • Provenance of data at file and record levels
  • Business names, descriptions
• Data Provisioning
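The catalog, lineage, and business-name bullets above can be pictured with a small in-memory sketch. Everything here is hypothetical and for illustration only: the `CatalogEntry` and `Catalog` names, their fields, and the dataset names are assumptions, not the API of any product mentioned in this talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """One dataset in a hypothetical data-lake catalog."""
    name: str            # technical name, e.g. a table or path
    business_name: str   # human-friendly name
    description: str
    source: str          # where the data was acquired from
    fmt: str             # storage format
    acquired_at: datetime
    parents: list = field(default_factory=list)  # upstream dataset names


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def lineage(self, name):
        """Return all upstream dataset names, nearest first."""
        seen, queue = [], list(self._entries[name].parents)
        while queue:
            parent = queue.pop(0)
            if parent in seen:
                continue
            seen.append(parent)
            if parent in self._entries:
                queue.extend(self._entries[parent].parents)
        return seen

    def search(self, text):
        """Find datasets whose business name or description mentions text."""
        text = text.lower()
        return sorted(
            e.name for e in self._entries.values()
            if text in e.business_name.lower() or text in e.description.lower()
        )


# Illustrative data only.
now = datetime.now(timezone.utc)
catalog = Catalog()
catalog.register(CatalogEntry(
    "raw.orders", "Raw orders", "Orders as landed from the source",
    "mysql", "json", now))
catalog.register(CatalogEntry(
    "clean.orders", "Cleansed orders", "Standardized orders",
    "lake", "parquet", now, parents=["raw.orders"]))
catalog.register(CatalogEntry(
    "report.revenue", "Revenue report", "Daily revenue by region",
    "lake", "parquet", now, parents=["clean.orders"]))

print(catalog.lineage("report.revenue"))  # ['clean.orders', 'raw.orders']
print(catalog.search("orders"))           # ['clean.orders', 'raw.orders']
```

A real deployment would persist this in a governance tool rather than a dictionary; the point is only that lineage and business descriptions are captured as metadata alongside each dataset.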
![Page 10: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/10.jpg)
![Page 11: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/11.jpg)
Data Lineage
![Page 12: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/12.jpg)
Data Lake Challenges
![Page 13: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/13.jpg)
Guidelines
• Expect structured, semi-structured, and unstructured data
  • Store metadata or a tag pointing to the location of the schema, including for unstructured data
• Store a copy of the raw input
  • Keep a raw, first-mile copy of the data so that we can recover the business, or nearly all of it
  • Replay the business if we need to
• Data Standardization: data cleansing as a workflow after ingest
• Use a format that supports your data
• Automate metadata management
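A minimal sketch of the "store a copy of the raw input" and "automate metadata management" guidelines, assuming a file-based raw zone. The `ingest_raw` function, the sidecar file naming, and the `schemas/orders.avsc` path are hypothetical choices for illustration, not something prescribed by the talk.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(source_file, raw_zone, schema_location):
    """Copy a file into the raw zone untouched and write a metadata sidecar."""
    source_file, raw_zone = Path(source_file), Path(raw_zone)
    raw_zone.mkdir(parents=True, exist_ok=True)
    target = raw_zone / source_file.name
    shutil.copy2(source_file, target)  # byte-for-byte copy, no cleansing here

    metadata = {
        "file": target.name,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "size_bytes": target.stat().st_size,
        "schema_location": schema_location,  # tag pointing at the schema
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = raw_zone / (target.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata


# Example run against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "orders.csv"
    src.write_text("id,amount\n1,9.99\n")
    meta = ingest_raw(src, Path(tmp) / "raw", "schemas/orders.avsc")
    sidecar_exists = (Path(tmp) / "raw" / "orders.csv.meta.json").exists()
    print(meta["file"], meta["schema_location"], sidecar_exists)
```

Cleansing and standardization would run as a later workflow that reads from the raw zone, so the untouched copy stays available for replay.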
![Page 14: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/14.jpg)
Data Lake Security
![Page 15: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/15.jpg)
Data Security
![Page 16: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/16.jpg)
Implementation Challenges
• Change Data Capture
  • MySQL: binlog readers
  • Oracle: Tungsten
• Updating the deltas onto the data lake
• Reusable data movement workflows
  • One workflow per table? (Generate dynamic workflows based on metadata)
  • Needs to be driven off metadata
• Schema changes on the source end
• Streaming data
• Partitioning strategies on the data lake
  • Configure them into metadata
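The delta-update, metadata-driven workflow, and partitioning bullets above can be sketched together. This is an illustration under stated assumptions: the `TableMeta` fields, the event shape (`op` plus `row`), and the step descriptions are hypothetical, and real binlog or Tungsten output looks different.

```python
from dataclasses import dataclass


@dataclass
class TableMeta:
    """Hypothetical per-table metadata that drives one generic workflow."""
    name: str
    key: str               # primary-key column used to match deltas
    partition_column: str  # column whose value picks the lake partition


def apply_deltas(snapshot, deltas, key):
    """Merge ordered change events into a snapshot keyed by `key`."""
    rows = {row[key]: row for row in snapshot}
    for event in deltas:
        if event["op"] == "delete":
            rows.pop(event["row"][key], None)
        else:  # insert and update are both treated as upserts
            rows[event["row"][key]] = event["row"]
    return sorted(rows.values(), key=lambda r: r[key])


def partition_path(meta, row):
    """Derive the partition directory from metadata, not from code."""
    return f"{meta.name}/{meta.partition_column}={row[meta.partition_column]}"


def build_workflow(meta):
    """Generate the step list for one table from its metadata."""
    return [
        f"capture changes for {meta.name}",
        f"merge deltas on {meta.key}",
        f"write partitions by {meta.partition_column}",
    ]


# Illustrative data only.
orders = TableMeta("orders", "id", "order_date")
snapshot = [
    {"id": 1, "order_date": "2016-01-01", "status": "new"},
    {"id": 2, "order_date": "2016-01-01", "status": "new"},
]
deltas = [
    {"op": "update", "row": {"id": 1, "order_date": "2016-01-01", "status": "shipped"}},
    {"op": "delete", "row": {"id": 2}},
    {"op": "insert", "row": {"id": 3, "order_date": "2016-01-02", "status": "new"}},
]

merged = apply_deltas(snapshot, deltas, orders.key)
print([row["id"] for row in merged])      # [1, 3]
print(partition_path(orders, merged[1]))  # orders/order_date=2016-01-02
print(build_workflow(orders))
```

Adding a table then means adding one metadata record rather than writing a new workflow, which is the reuse the slide is asking for.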
![Page 17: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/17.jpg)
Tools / Products
• Smart Catalogs
  • Waterline Data Inventory
  • Collibra Catalog
• Data Lake Management
  • Zaloni Bedrock
  • Informatica Intelligent Data Lake
• Data Governance and Metadata Management
  • Cloudera Navigator
  • Apache Atlas
  • Collibra Data Governance
  • Oracle BigData Catalog
![Page 18: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/18.jpg)
Data Lake Trends
• Data Lakes on Cloud
• IoT Data Lakes
• Logical Data Lakes
  • Unified view of data that exists across data stores
• Data Discovery Portals
![Page 19: Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data Conference 2016](https://reader036.fdocuments.net/reader036/viewer/2022062904/588928c41a28ab77528b68f3/html5/thumbnails/19.jpg)
Questions
• Principal @ Clairvoyant
• Email: [email protected]
• LinkedIn: https://www.linkedin.com/in/avinashramineni