Implementing the Business Catalog in the Modern Enterprise: Bridging Traditional EDW and Hadoop with Apache Atlas

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Speakers

Andrew Ahn, Governance Director, Product Management

Agenda

• Atlas Overview
• Near term roadmap
• Business Catalog
• Questions

Apache Atlas Overview

Vision - Enterprise Data Governance Across Platforms

[Diagram: structured and unstructured data across traditional RDBMS, MPP appliances, the data lake, and streaming, with individual projects and their metadata]

GOAL: Provide a common approach to data governance across all systems and data within the enterprise

• Transparent: Governance standards and protocols must be clearly defined and available to all
• Reproducible: Recreate the relevant data landscape at a point in time
• Auditable: All relevant events and assets must be traceable with appropriate historical lineage
• Consistent: Compliance practices must be consistent
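The Reproducible and Auditable principles imply that metadata changes are recorded as history rather than overwritten. The sketch below illustrates this under the assumption of an append-only log of full attribute snapshots; `AuditEvent`, `AuditLog`, and the asset names are hypothetical and are not part of Atlas.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class AuditEvent:
    timestamp: int              # when the change happened (e.g. epoch seconds)
    entity_id: str              # hypothetical identifier of a governed asset
    attributes: Dict[str, Any]  # full attribute snapshot after the change
    deleted: bool = False       # True if the asset was removed at this point


class AuditLog:
    """Append-only change log (hypothetical sketch, not an Atlas API)."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        # Auditable: events are only ever appended, never edited in place.
        if self._events and event.timestamp < self._events[-1].timestamp:
            raise ValueError("events must be recorded in timestamp order")
        self._events.append(event)

    def as_of(self, timestamp: int) -> Dict[str, Dict[str, Any]]:
        # Reproducible: replay every event up to `timestamp` to rebuild the
        # metadata landscape as it looked at that moment.
        state: Dict[str, Dict[str, Any]] = {}
        for event in self._events:
            if event.timestamp > timestamp:
                break
            if event.deleted:
                state.pop(event.entity_id, None)
            else:
                state[event.entity_id] = dict(event.attributes)
        return state

    def history(self, entity_id: str) -> List[AuditEvent]:
        # Traceable: every recorded change to one asset, oldest first.
        return [e for e in self._events if e.entity_id == entity_id]


if __name__ == "__main__":
    log = AuditLog()
    log.record(AuditEvent(100, "sales.orders", {"owner": "etl"}))
    log.record(AuditEvent(200, "sales.orders", {"owner": "finance"}))
    log.record(AuditEvent(300, "sales.orders", {}, deleted=True))
    print(log.as_of(150))  # {'sales.orders': {'owner': 'etl'}}
```

Storing full snapshots keeps replay simple at the cost of storage. A production store would more likely keep deltas plus periodic checkpoints.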

Ready for Trusted Governance

[Diagram: YARN Data Operating System with Storage, Governance, Security, and Operations layers supporting Batch, Interactive, Streaming, Search, and Machine Learning workloads]

• Data Management: along the entire data lifecycle, with integrated provenance and lineage capability
• Modeling with Metadata: enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities
• Interoperable Solutions: across the Hadoop ecosystem, through a common metadata store

DGI* Community becomes Apache Atlas

• Dec 2014: DGI group kickoff
• Feb 2015: Prototype built
• May 2015: Apache Atlas incubation
• July 2015: HDP 2.3 foundation GA release

First kickoff to GA in 7 months

[Diagram label: Global Financial Company]

* DGI: Data Governance Initiative

Faster & Safer: co-development driven by customer use cases

Apache Atlas: Metadata Services

• Cross-component dataset lineage; a centralized location for all metadata inside HDP

• A single interface point for metadata exchange with platforms outside of HDP

• Business-taxonomy-based classification: conceptual, logical, and technical

[Diagram: Apache Atlas integrated with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, and NiFi]
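Cross-component lineage amounts to walking a graph in which process runs, such as a Sqoop import or a Hive query, connect input datasets to output datasets. Atlas models lineage with Process entities that carry inputs and outputs. The sketch below mimics that idea in memory. `Process`, `lineage`, and the dataset names are hypothetical and do not follow Atlas's naming scheme.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Process:
    name: str           # e.g. a Sqoop import, a Hive query or a Storm topology
    inputs: List[str]   # qualified names of the datasets it reads
    outputs: List[str]  # qualified names of the datasets it writes


def _walk(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    # Breadth-first traversal; `seen` also stops infinite loops on cycles.
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for neighbour in edges.get(queue.popleft(), set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)  # a dataset is not its own ancestor or descendant
    return seen


def lineage(processes: List[Process], dataset: str) -> Dict[str, Set[str]]:
    """Return all upstream sources and downstream consumers of `dataset`."""
    parents: Dict[str, Set[str]] = {}
    children: Dict[str, Set[str]] = {}
    for process in processes:
        for out in process.outputs:
            for src in process.inputs:
                parents.setdefault(out, set()).add(src)
                children.setdefault(src, set()).add(out)
    return {"upstream": _walk(dataset, parents),
            "downstream": _walk(dataset, children)}


if __name__ == "__main__":
    processes = [
        Process("sqoop_import", ["teradata.sales"], ["hive.sales_raw"]),
        Process("hive_ctas", ["hive.sales_raw", "hive.regions"], ["hive.sales_by_region"]),
    ]
    print(sorted(lineage(processes, "hive.sales_by_region")["upstream"]))
    # ['hive.regions', 'hive.sales_raw', 'teradata.sales']
```

Because lineage is derived from process records, a dataset imported from an EDW by Sqoop and later reshaped by Hive appears as one connected chain, even though two different components reported it.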

Big Data Management Through Metadata

Management Scalability

Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake, and this is compounded by very long time windows (years). Can traditional EDW tools manage 100 million entities effectively, with room to grow?

Metadata Tools

Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels.

Tags for Management, Discovery and Security

Proper metadata is the foundation for business taxonomy, stewardship, attribute based security and self-service.

Apache Atlas High Level Architecture

[Diagram components: Type System, Repository, and Search DSL; Graph DB and Search; REST API; Messaging Framework (Kafka); Bridges (Hive, Storm, Falcon, Others); Connectors (Sqoop)]
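The Type System lets one repository describe assets from Hive, Storm, Falcon, or an external EDW. Types inherit attributes from super-types, and entities are validated against those types. The sketch below models that idea in memory. `TypeDef` and `TypeRegistry` are hypothetical. `Referenceable`, `DataSet`, and `qualifiedName` echo names from Atlas's built-in model, but the attribute split and the `teradata_table` type are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypeDef:
    name: str
    super_types: List[str] = field(default_factory=list)
    required: List[str] = field(default_factory=list)  # mandatory attributes
    optional: List[str] = field(default_factory=list)  # optional attributes


class TypeRegistry:
    def __init__(self) -> None:
        self._types: Dict[str, TypeDef] = {}

    def register(self, typedef: TypeDef) -> None:
        if typedef.name in self._types:
            raise ValueError(f"type {typedef.name!r} already registered")
        # Super-types must exist first, which also rules out inheritance cycles.
        for parent in typedef.super_types:
            if parent not in self._types:
                raise KeyError(f"unknown super-type {parent!r} for {typedef.name!r}")
        self._types[typedef.name] = typedef

    def attributes(self, type_name: str) -> Dict[str, bool]:
        """Map each attribute (inherited or declared) to whether it is required."""
        typedef = self._types[type_name]
        attrs: Dict[str, bool] = {}
        for parent in typedef.super_types:
            attrs.update(self.attributes(parent))
        attrs.update({a: False for a in typedef.optional})
        attrs.update({a: True for a in typedef.required})
        return attrs

    def validate(self, type_name: str, entity: Dict[str, object]) -> List[str]:
        """Return problems found in `entity`; an empty list means it conforms."""
        attrs = self.attributes(type_name)
        problems = [f"missing required attribute {a!r}"
                    for a, required in attrs.items() if required and a not in entity]
        problems += [f"unknown attribute {a!r}" for a in entity if a not in attrs]
        return problems
```

Requiring super-types to be registered before their subtypes keeps the inheritance graph acyclic, so the recursive attribute lookup always terminates.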

Technical and Logical Metadata Exchange

[Diagram components: Knowledge Store; Atlas REST API; structured and unstructured metadata; Files (XML / JSON); 3rd Party Vendors; Custom Reporter; Non-Hadoop sources]
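The Atlas REST API is the single exchange point for metadata from non-Hadoop platforms and third-party tools. The sketch below builds, but does not send, a JSON request that registers one external table. It assumes the `POST /api/atlas/v2/entity` endpoint and the `{"entity": {"typeName": ..., "attributes": ...}}` body of the v2 REST API, which arrived in Atlas releases later than this 2016 deck. It also assumes the default port 21000 and basic authentication. The host, the credentials, and the `teradata_table` type are hypothetical, and the type would already need to be registered in Atlas.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_create_entity_request(base_url: str, type_name: str,
                                attributes: Dict[str, Any],
                                user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST that registers one entity in Atlas."""
    body = {"entity": {"typeName": type_name, "attributes": attributes}}
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/v2/entity",  # assumed v2 endpoint
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {credentials}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_entity_request(
        "http://atlas.example.com:21000", "teradata_table",
        {"qualifiedName": "edw.sales@teradata", "name": "sales", "owner": "finance"},
        "admin", "admin")
    print(req.get_method(), req.full_url)
    # Sending it against a live server would be: urllib.request.urlopen(req)
```

Check the endpoint and body shape against the Atlas version in use. Older releases exposed a different (v1) REST API.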

Near Term Roadmap: Summer 2016

Expanded Native Connector: Dataset Lineage

[Diagram components: RDBMS; Sqoop with Teradata Connector; Apache Kafka; Custom Activity Reporter; Metadata Repository]
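A custom activity reporter turns a job that runs outside the native hooks, such as an RDBMS-to-Hive copy, into a lineage record. The minimal sketch below assumes the record is a Process-style entity whose `inputs` and `outputs` refer to existing datasets by `qualifiedName` through `uniqueAttributes`, as in the Atlas v2 entity JSON. The following are all assumptions not shown in the deck:

* the `etl_process` type
* the job and dataset names
* the transport (REST API or a messaging channel such as Kafka)

```python
import json
from typing import Any, Dict, List


def dataset_ref(type_name: str, qualified_name: str) -> Dict[str, Any]:
    # Refer to an existing dataset by its unique qualifiedName rather than a GUID.
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def report_activity(job_name: str, run_id: str,
                    inputs: List[Dict[str, Any]], outputs: List[Dict[str, Any]],
                    process_type: str = "etl_process") -> str:
    """Turn one completed job run into a JSON lineage record (a Process entity)."""
    if not inputs or not outputs:
        raise ValueError("a lineage record needs at least one input and one output")
    record = {"entity": {
        "typeName": process_type,  # hypothetical custom Process subtype
        "attributes": {
            "qualifiedName": f"{job_name}@{run_id}",  # one entity per run
            "name": job_name,
            "inputs": inputs,
            "outputs": outputs,
        },
    }}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(report_activity(
        "nightly_sales_load", "2016-06-01",
        inputs=[dataset_ref("teradata_table", "edw.sales@teradata")],
        outputs=[dataset_ref("hive_table", "default.sales@cl1")]))
```

Referencing datasets by qualified name lets the reporter stay stateless: it does not need to look up GUIDs before emitting a record.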

Dynamic Access Policy Driven by metadata
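Driving access policy from metadata means a policy is written once against a tag, and access changes as soon as an asset is tagged. In HDP this is the role of Apache Ranger's tag-based policies, fed by Atlas classifications. The sketch below only illustrates the evaluation idea and does not use Ranger's API. `TagPolicy`, `can_read`, the group names, and the deny-unless-allowed rule are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class TagPolicy:
    tag: str                        # classification the policy applies to, e.g. "PII"
    allowed_groups: FrozenSet[str]  # groups permitted to read assets carrying the tag


def can_read(user_groups: Set[str], asset: str,
             asset_tags: Dict[str, Set[str]],
             policies: Iterable[TagPolicy]) -> bool:
    """Deny if any policy matching one of the asset's tags excludes all of the user's groups."""
    tags = asset_tags.get(asset, set())
    for policy in policies:
        if policy.tag in tags and not (user_groups & policy.allowed_groups):
            return False
    # Simplification: a real deployment would still apply resource-based policies here.
    return True


if __name__ == "__main__":
    policies = [TagPolicy("PII", frozenset({"stewards"}))]
    tags = {"hive.customers": {"PII"}, "hive.clicks": set()}
    print(can_read({"analysts"}, "hive.customers", tags, policies))  # False
    print(can_read({"analysts"}, "hive.clicks", tags, policies))     # True
```

Because `can_read` consults tags at request time, tagging `hive.clicks` as `PII` would immediately restrict it without editing any policy.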

Business Taxonomy UX Prototype

User Interviews

We conduct open-ended user interviews so that we can learn more about who our users are and what their needs are. This helps us validate whether we're solving the right problem.

Usability Testing

We test our prototype in InVision, a click-through prototyping tool that allows users to interact with static mockups.

Synthesis + Analysis

After conducting interviews and usability testing, we spend some time analyzing our findings and pulling out themes and insights.

Usability Findings
• Understood the hierarchy and how to search for data
• Would generally search by file name or specific keyword
• Would use tags for the purpose of searching
• Would want to preview a subset of the data before analyzing the whole data set
• Interested in the size of the data set
• Concerned with how current and up to date the information is
• Would like the ability to contact a steward for more information regarding the data set
• Would use an advanced Boolean search if it were available
• Viewing the popularity and access frequency would provide confidence
• Would like to provide and view fellow users' input

Persona Findings
• Data Scientists typically have backgrounds in Mathematics, Computer Science and Statistics
• Responsible for analyzing and transforming data into more useful structures
• Responsible for correcting missing values, typos and parsing issues
• Typically fluent with SQL, Python and Hadoop tools
• Require time upfront to understand and discover new data sets
• Spend a significant amount of time reaching out to others with questions about data sets
• Interact with Subject Matter Experts and Solution Architects
• Noted that compliance is a big interest for enterprises and government
• Felt Hadoop doesn't support security and compliance very well
• Find it difficult to see who is doing what in Hadoop

Principal Roles
• Data Steward – Curator, responsible for catalog veracity
• Data Scientist – Analyst, primary consumer of the Business Catalog
• Administrator – Role management only
• Data Engineer – Data ingress and egress, semantic data quality
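One way to make these roles concrete is a role-to-permission map checked on every catalog action. The permission names and the mapping below are an illustrative interpretation of the role descriptions, not a documented Atlas authorization model.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet, Iterable


class Permission(Enum):
    MANAGE_ROLES = auto()      # assign users to roles
    CURATE_CATALOG = auto()    # edit taxonomy terms, definitions and tags
    MANAGE_DATA_FLOW = auto()  # register data ingress/egress and quality rules
    SEARCH = auto()            # find and browse catalog entries
    COMMENT = auto()           # add user comments for collaboration


# Hypothetical mapping derived from the role descriptions.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "Data Steward": frozenset({Permission.CURATE_CATALOG, Permission.SEARCH, Permission.COMMENT}),
    "Data Scientist": frozenset({Permission.SEARCH, Permission.COMMENT}),
    "Administrator": frozenset({Permission.MANAGE_ROLES}),
    "Data Engineer": frozenset({Permission.MANAGE_DATA_FLOW, Permission.SEARCH, Permission.COMMENT}),
}


def is_allowed(roles: Iterable[str], permission: Permission) -> bool:
    """A user may act if any of their roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

Keeping "Administrator" limited to role management mirrors the "role management only" description: administering the catalog does not by itself grant the right to curate or read it.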

UX prototype: Taxonomy Navigation

Breadcrumbs for taxonomy context path

Contents at taxonomy context

Taxonomy Creation

In-place taxonomy management
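The navigation and creation screens rely on each taxonomy term knowing three things:

* its parent, for the breadcrumb path
* its child terms, for the contents view
* its classified assets, also for the contents view

The minimal in-memory sketch below works under that assumption. `Term` and its methods are hypothetical and do not reflect how Atlas stores taxonomy terms.

```python
from typing import Dict, List, Optional


class Term:
    """One node of a hierarchical business taxonomy (hypothetical sketch)."""

    def __init__(self, name: str, parent: Optional["Term"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "Term"] = {}
        self.assets: List[str] = []  # assets classified directly under this term

    def add_child(self, name: str) -> "Term":
        # In-place taxonomy management: create a sub-term at the current context.
        if name in self.children:
            raise ValueError(f"{name!r} already exists under {self.name!r}")
        child = Term(name, parent=self)
        self.children[name] = child
        return child

    def classify(self, asset: str) -> None:
        # Attach an asset to this term once; repeated calls are ignored.
        if asset not in self.assets:
            self.assets.append(asset)

    def breadcrumbs(self) -> List[str]:
        # Walk up to the root to build the taxonomy context path.
        path: List[str] = []
        node: Optional[Term] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]

    def contents(self) -> Dict[str, List[str]]:
        # What the navigation view shows at this context: sub-terms and assets.
        return {"terms": sorted(self.children), "assets": sorted(self.assets)}


if __name__ == "__main__":
    root = Term("Enterprise")
    sales = root.add_child("Finance").add_child("Sales")
    sales.classify("hive.sales_by_region")
    print(" > ".join(sales.breadcrumbs()))  # Enterprise > Finance > Sales
```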

Taxonomy Classification of Assets

Create new object on the fly

Object Details

Annotation for policies and rules

Object Lineage

Dataset Lineage across components

Assign Tags to assets

User Comments

User comments for collaboration

Classify and Tag Assets

Keyword, DSL, and Faceted search

Define authoritative tags for the whole taxonomy
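Keyword and faceted search can be sketched in three steps:

1. a text match;
2. facet filters combined with AND;
3. tag counts over the current hits, for the facet panel.

This is an assumption about the behavior, not the Atlas search implementation, and the DSL search mode is not covered. The asset dictionaries and their `name`, `description`, `type`, and `tags` keys are hypothetical.

```python
from collections import Counter
from typing import AbstractSet, Dict, List, Optional


def search(assets: List[Dict], keyword: str = "",
           asset_type: Optional[str] = None,
           required_tags: AbstractSet[str] = frozenset()) -> List[Dict]:
    """Case-insensitive keyword match on name/description, then AND the facet filters."""
    needle = keyword.lower()
    hits = []
    for asset in assets:
        text = f"{asset['name']} {asset.get('description', '')}".lower()
        if needle and needle not in text:
            continue
        if asset_type is not None and asset["type"] != asset_type:
            continue
        if not required_tags <= asset["tags"]:  # every selected tag must be present
            continue
        hits.append(asset)
    return hits


def tag_facets(hits: List[Dict]) -> Counter:
    """Count tags across the current hits, as shown beside a faceted result list."""
    return Counter(tag for asset in hits for tag in asset["tags"])
```

Computing facet counts from the current hits, rather than from the whole catalog, means each selection narrows the counts the user sees next.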

Business Taxonomy UX Prototype

• Hierarchical Taxonomy Creation
• Agile modeling: Model Conceptual, Logical, Physical assets
• Authorization: Steward / Analytic Roles
• Tag management: Definition and assignment
• DQ tab for profiling and sampling
• User Comments

What other information would you like to see included?
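The deck does not say how the DQ tab samples or profiles data. Reservoir sampling (Algorithm R) is one standard choice for the data previews users asked for: it draws a uniform sample of k rows in a single pass, without knowing the data set's size in advance. The profile function alongside it reports simple null and distinct counts. Both functions are illustrative sketches, not part of Atlas.

```python
import random
from typing import Any, Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform sample of up to k rows in one pass (Algorithm R), for data previews."""
    if rng is None:
        rng = random.Random()
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: row i is kept with prob. k/(i+1)
            if j < k:
                sample[j] = row
    return sample


def profile_column(values: Iterable[Any]) -> Dict[str, Any]:
    """Basic data-quality profile: row count, nulls, distinct non-null values."""
    total = nulls = 0
    distinct = set()
    for value in values:
        total += 1
        if value is None:
            nulls += 1
        else:
            distinct.add(value)
    return {"count": total, "nulls": nulls, "distinct": len(distinct),
            "null_fraction": nulls / total if total else 0.0}
```

Exact distinct counts need memory proportional to the number of distinct values. On very large columns, a profiler would more likely use an approximate sketch, at the cost of precision.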

Availability:
• Tech Preview VMs: May 2016
• GA Release: Summer 2016

Questions?

Reference

Online Resources

VM (download the public preview VM): https://s3.amazonaws.com/demo-drops.hortonworks.com/HDP-Atlas-Ranger-TP.ova

Tutorial: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview

Blog: http://hwxjojo.wpengine.com/blog/the-next-generation-of-hadoop-based-security-data-governance/ (this link was returning an error at the time of writing)

Learn More: http://hortonworks.com/solutions/atlas-ranger-integration/