The Role of the Data Catalog in Delivering Value from Enterprise Data
&
Introducing: Waterline Smart Data Catalog 4.0
Connecting the Right People to the Right Data
Matt Aslett, Research Director
Todd Goldman, Chief Marketing Officer
Mohan Sadashiva, VP, Product Management
Andrew Ahn, Sr. Director, Product Management
Agenda
Topic | Presenter
The Role of the Data Catalog in Delivering Value from Enterprise Data | Matt Aslett, Research Director, Data Platforms & Analytics, 451 Research
Waterline Overview | Todd Goldman, CMO, Waterline Data
Waterline Smart Data Catalog 4.0 Demo | Andrew Ahn, Sr. Director, Product Management, Waterline Data
Q & A | All
Copyright (C) 2017 451 Research LLC
The role of the data catalog in delivering value from enterprise data
Matt Aslett, Research Director
451 Research is a leading IT research & advisory company
Founded in 2000
300+ employees, including over 120 analysts
2,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
50,000+ IT professionals, business users and consumers in our research community
Over 52 million data points published each quarter and 4,500+ reports published each year
3,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
Digital Transformation: What do we mean?
When IT innovation is aligned with and driven by a well-planned business strategy to:
§ transform how organizations serve customers, employees, and partners
§ support continuous improvement in business operations
§ disrupt existing businesses and markets
§ invent new businesses and business models
Key Takeaways for Digital Transformation
What Exactly Needs to be Transformed Digitally?
Process Transformation
§ More than a technical shift, but a cultural one
§ Focus on collaboration: employees but also customers, partners, suppliers
§ Agile methods for software development
Information Transformation
§ Gathering data and lots of it in various means and methods
§ Multiple communication points on multiple devices
§ Leveraging data with advanced analytics
Platform Transformation
§ IT moving from cost center to software enabler
§ Organizations need systems of engagement: tools and systems for omnichannel interaction
§ Integration with legacy systems of record
The Critical Role of Data – and the Data Catalog
• Those that own the data will win, everyone else will have to pay for access to it.
• But if those that own data cannot find it, then ownership of it won’t be enough to succeed.
• If you don’t know where the data is located, you can’t affect business outcomes.
Data to Insight - Traditional Approach
[Diagram: Application -> ETL -> Data Warehouse -> Analytics]
Icons by http://dryicons.com
Source: https://www.flickr.com/photos/wbaiv/16510090506/
Schema-On-Write
• High-performance analytics for pre-defined queries
• Inflexible to change
• Inflexible to new data
• Pre-prepared (by IT)
Data to Insight – Emerging Approach
Source: https://www.flickr.com/photos/notbrucelee/5696238930/
Schema-On-Read
• High-performance data processing
• Flexible to change
• Flexible to new data
• Multi-purpose
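The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw events, the field names and both helper functions are hypothetical and do not come from the presentation. The first function fixes the shape of the data when it is loaded (schema-on-write); the second keeps the raw records and applies whatever fields a question needs at read time (schema-on-read).

```python
import json

# Hypothetical raw events as they might be landed from source applications.
RAW_EVENTS = [
    '{"user": "ann", "amount": "12.50", "channel": "web"}',
    '{"user": "bob", "amount": "7.25"}',
    '{"user": "cy", "amount": "3.00", "coupon": "SPRING"}',
]


def load_schema_on_write(lines):
    """Shape every record at load time; fields outside the schema are dropped."""
    table = []
    for line in lines:
        record = json.loads(line)
        table.append({"user": record["user"], "amount": float(record["amount"])})
    return table


def read_with_schema(lines, fields):
    """Leave raw lines untouched; project the fields a query needs at read time."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Schema-on-write: fast, predictable answers to the pre-defined question.
warehouse = load_schema_on_write(RAW_EVENTS)
total = sum(row["amount"] for row in warehouse)

# Schema-on-read: a new question about coupons needs no reload of the data.
coupons = [r["coupon"] for r in read_with_schema(RAW_EVENTS, ["coupon"]) if r["coupon"]]
```

In this sketch the coupon field never reaches the warehouse table, so answering the coupon question there would mean changing the schema and reloading, which is the inflexibility the schema-on-write slide describes.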
Hadoop-based Data Lakes
• Apache Hadoop serving as a unified repository
• Into which raw data is landed from multiple sources
• And made available to multiple users for multiple purposes.
• Beware the data swamp
Photo: Myrabella / Wikimedia Commons, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11263585
https://www.flickr.com/photos/lofink/4501610335/
Data Governance, Data Preparation and the Data Lake
• If you don’t know where the data is located, you can’t affect business outcomes.
• Data needs to be managed to make it suitable for multiple analytics use cases.
• Data governance
  • Data catalog
  • Data security
  • Data lineage
  • Data inventory
  • Data quality
  • Data pipelines
• Data preparation
  • Data discovery
  • Data cleansing
  • Data harmonization
  • Data enrichment
  • Data matching
  • Collaboration
Data Governance and the GDPR
General Data Protection Regulation (GDPR) is causing many companies to reconsider their data governance capabilities.
• Succeeds the EU Data Protection Directive on May 25, 2018
• Applies to the processing of personal data from a business activity in the EU (whether the processing occurs inside or outside the EU)
• Requires organizations to report to one data protection authority in each member country.
• Gives EU residents more control of their personal data.
  • The ability to prohibit data processing beyond its specified purpose for collection
  • The right to be forgotten
  • The ability to withdraw consent to the collection and use of personal data
• Violating the GDPR can cost up to €20m ($21.75m) in fines, or up to 4% of the total annual worldwide turnover of the preceding financial year.
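As a rough worked example of the fine ceiling mentioned above: under Article 83 of the regulation, the upper tier is the higher of the flat €20m amount and 4% of worldwide annual turnover. The function name and the sample turnover figures below are hypothetical, and the sketch computes only the ceiling, not what a regulator would actually impose.

```python
def gdpr_max_fine_eur(annual_worldwide_turnover_eur: float) -> float:
    """Ceiling of the upper fine tier: the greater of EUR 20m or 4% of turnover."""
    flat_cap = 20_000_000.0
    turnover_cap = 0.04 * annual_worldwide_turnover_eur
    return max(flat_cap, turnover_cap)


# Hypothetical companies, for illustration only.
small = gdpr_max_fine_eur(100_000_000)    # 4% is EUR 4m, so the EUR 20m figure applies
large = gdpr_max_fine_eur(2_000_000_000)  # 4% is EUR 80m, which exceeds EUR 20m
```

The point of the arithmetic is that for large enterprises the turnover-based figure, not the flat amount, sets the exposure.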
Data Catalog: Part of a Virtuous Circle
PREPARATION: Transform, cleanse, tame, match, tag. Collaborate.
INVENTORY: Catalog of all data within the data estate.
DISCOVERY: Self-service identification of data sources.
GOVERNANCE: Management of data and metadata through its lifecycle.
AUTOMATION: Machine learning driven recommendations.
Data Governance, Data Preparation and the Data Lake
[Diagram; labels as extracted]
DATA LAKE
DATA GOVERNANCE: Data lineage, Data inventory, Data catalog, Data security, Data quality, Data pipelines
SELF-SERVICE DATA PREPARATION: Data cleansing, Data harmonization, Data discovery, Collaboration, Data matching, Data enrichment
SELF-SERVICE ANALYTICS
ADVANCED ANALYTICS
DATA-AS-A-SERVICE
Participants and sources: APPLICATIONS, PARTNERS, SUPPLIERS, IT, DATA STEWARDS, DATA SCIENTISTS, SENIOR EXECUTIVES, BUSINESS ANALYSTS, DATA ANALYSTS
Benefits of Data Catalog-driven Data Lake Management
• Data lineage enables analysts and data stewards to understand where the data came from and what transformations may already have been made to it, and is also vital to compliance projects.
• Collaboration among users, along with machine-learning driven automation and recommendations, expedites the time to value, enabling analysts to access trusted data sets and pre-queried results.
• Reduces the overhead for users to discover, integrate, cleanse and enrich data, which makes it possible for users to expand the scope of their analysis – accessing more data sets and greater volumes of data.
• Reduces the burden on IT to prepare data for end users, and in doing so reduces the time taken for users to discover, integrate, cleanse and enrich data to make it suitable for analysis.
• Data catalog and data governance are fundamental enablers of self-service data preparation, and a functioning data lake.
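To make the lineage point concrete, here is a minimal sketch of a catalog that records, for each data set, what it was derived from and how. The `CatalogEntry` class, its field names, the `lineage` helper and the sample data sets are all hypothetical; no particular product's data model is implied.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set in a toy catalog; all field names here are hypothetical."""
    name: str
    derived_from: list = field(default_factory=list)  # names of parent data sets
    transformation: str = ""                          # how this data set was produced


def lineage(catalog, name):
    """Walk from `name` back to its sources, returning (data set, step) pairs."""
    trail, stack, seen = [], [name], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        entry = catalog[current]
        trail.append((entry.name, entry.transformation or "source"))
        stack.extend(entry.derived_from)
    return trail


catalog = {
    e.name: e
    for e in [
        CatalogEntry("crm_export"),
        CatalogEntry("web_orders"),
        CatalogEntry("customers_clean", ["crm_export"], "deduplicate"),
        CatalogEntry("revenue_by_customer", ["customers_clean", "web_orders"], "join + sum"),
    ]
}

trail = lineage(catalog, "revenue_by_customer")
for data_set, step in trail:
    print(f"{data_set}: {step}")
```

An analyst looking at revenue_by_customer can see from the trail that it depends on a deduplicated CRM export, which is the kind of provenance question a compliance project also needs answered.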
Thank you
[email protected]
@maslett
www.451research.com
Introducing:
Waterline Smart Data Catalog 4.0
Todd Goldman, CMO
Andrew Ahn, Sr Director of Product Management
"If you don’t know where the data is located, you can’t affect business outcomes." - Matt Aslett
Risk Management & Compliance
Self Service Discovery & Provisioning for Analytics
Cost Savings & Efficiency
Where do I find the data I need to complete my business analysis?
How do I organize all the data the business analysts need?
Where is sensitive data? Where did it come from? Who should have access?
Where is redundant data located and how much can I eliminate?
Waterline Smart Data Catalog: Answers the Key Questions
Where do I find the data?
• Automatically discovers, organizes, tags and curates data & makes it available to search and find
Where did it come from?
• Consolidates and tracks data lineage
What is in the data?
• Profiles data, presents statistics, presents crowdsourced ratings
Who can use the data?
• Identifies sensitive data and enables tag based access control
Connect the Right People to the Right Data
Risk Management & Compliance: Demonstrate auditable data lineage. Get compliant data quickly into use. React quickly to regulatory requests.
Self Service Discovery & Provisioning: Inventory data and enable users to search, find and use data across all data sets
Cost Savings & Efficiency: Identify redundant databases, marts, tables & schemas to eliminate or move
Waterline Addresses the Impact of Dark Data:
Data Lifecycle Management
Discover | Catalog
• Analyze
• Secure
• Rationalize
European Credit Rating Agency
• Initiative:
  • Optimize the quality and cost of customer credit score services
• Problem:
  • Company purchases data and provides credit scores to customers.
  • Inaccurate data costs money and jeopardizes customer loyalty
  • New data sources take months to release
• Why Waterline:
  • Identifies redundant data to limit the amount and cost of data sets
  • Identifies the most relevant data to provide customers, which improves customer retention & increases revenue
  • Reduces time from data acquisition to release to customers from months to hours
European Bank
• Initiative:
  • Deploy access control for thousands of datasets ingested into Data Lake
  • Accelerate responsiveness to evolving governmental regulation
  • Reduce redundancy in Teradata system
• Problem:
  • Regulatory & Business reporting: takes too long, not flexible, needs forward looking insights
  • Must comply with GDPR and create a process to “forget” people
• Why Waterline:
  • Engineered from a Hadoop point of view
  • Integrated directly into Apache Ranger
  • Ability to involve user community to utilize data assets (Tagging)
Demo
A Unique Combination of Automation and Crowdsourcing
Data Professionals: Discover | Organize | Curate
Discover
• Automatically and incrementally “fingerprint” data at scale by analyzing actual data
• Automatically tag (match) data fingerprints to glossary terms
• Match the unmatched terms through crowdsourcing
Organize, Curate
• Accept or reject tags
• Search and use data through GUI and integration to 3rd party applications
• Crowdsource ratings and annotations to collaborate and share “tribal data knowledge”
• Automates data access control via tag based security
Business Professionals: Search | Rate | Collaborate
System learns and fine tunes matching algorithm
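The general idea of profiling actual values and matching them to glossary terms can be sketched as follows. This is a toy illustration, not Waterline's fingerprinting or matching algorithm: the glossary, the regular expressions, the 0.75 threshold and the function names are all assumptions made for the example. Columns that match no term are set aside for people to tag, which stands in for the crowdsourcing step.

```python
import re

# Hypothetical glossary: business term -> pattern that values of that term follow.
GLOSSARY = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.I),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "postal_code_us": re.compile(r"^\d{5}(-\d{4})?$"),
}


def fingerprint(values):
    """Profile actual values: share of non-empty values matching each glossary pattern."""
    filled = [v.strip() for v in values if v and v.strip()]
    if not filled:
        return {}
    return {
        term: sum(bool(pattern.match(v)) for v in filled) / len(filled)
        for term, pattern in GLOSSARY.items()
    }


def suggest_tag(values, threshold=0.75):
    """Return (term, score) for the best match, or (None, score) if below threshold."""
    scores = fingerprint(values)
    if not scores:
        return None, 0.0
    term, score = max(scores.items(), key=lambda item: item[1])
    return (term, score) if score >= threshold else (None, score)


columns = {
    "contact": ["ann@example.com", "bob@example.org", "n/a", "cy@example.net", "di@example.com"],
    "signup": ["2017-01-03", "2017-02-11", "2017-03-09"],
    "notes": ["call back", "vip", ""],
}

suggested = {name: suggest_tag(vals)[0] for name, vals in columns.items()}
# Columns without a confident match are queued for people to tag by hand.
needs_review = [name for name, term in suggested.items() if term is None]
```

Because the match is based on the values rather than the column name, a column called contact can still be tagged as email. In a fuller version, accepted and rejected suggestions would feed back into the threshold or patterns, which is the "system learns" loop the slide refers to.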
Tag-based security integration with data security systems
[Diagram: Smart Data Catalog (Discover, Catalog, Search) passes Tags over a REST API to Data Security systems]
• Reduce time data spends in quarantine from weeks to hours
• Automatically assigns rights to data for use by role
• Tags automatically discovered by Waterline can be passed directly to data security systems to be used for access control
• Increases useful life of data
Data Security
• Sentry
• Ranger
• Etc through API
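The principle behind tag-based access control can be shown with a small sketch: access rules are written against tags, so any data set that picks up a tag is covered without a rule of its own. The policy table, role names, data sets and the `can_read` function are hypothetical; this does not use or imitate the Apache Ranger or Sentry APIs.

```python
# Hypothetical policy: which roles may read data carrying a given tag.
TAG_POLICY = {
    "pii": {"data_steward", "compliance"},
    "financial": {"data_steward", "finance_analyst"},
}


def can_read(role, dataset_tags, policy=TAG_POLICY):
    """Allow access only if the role is permitted for every restricted tag on the data."""
    restricted = [tag for tag in dataset_tags if tag in policy]
    return all(role in policy[tag] for tag in restricted)


# Tags as a catalog might have assigned them; names are illustrative.
datasets = {
    "customer_master": {"pii", "financial"},
    "web_clicks": set(),
    "invoices": {"financial"},
}

readable_by_finance = sorted(
    name for name, tags in datasets.items() if can_read("finance_analyst", tags)
)
```

Note the design choice in this sketch: data with no restricted tags is readable by everyone. A deployment that quarantines newly landed data would more likely deny access until tagging has run, and release it once the tags are in place.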
Open, Extensible Architecture
[Architecture diagram; labels as extracted]
Smart Data Catalog: Discover, Catalog, Search
Execution Environments
Data Sources: Teradata, Oracle, MySQL, Other Relational, Spark, HDFS/Hive, Amazon S3, Microsoft Azure
Relational Plugin Arch
Analytics Environments: BI/Analytics, Wrangling, Other Apps
Search REST API
Metadata REST API
Business Glossaries, ETL, Data Security
www.waterlinedata.com
Smart Data Catalog
Data Professionals: Discover | Organize | Curate
Business Professionals: Search | Rate | Collaborate
• Automate Discovery & Search for Analytics
• Mitigate Data Compliance Penalties
• Reduce costs due to data redundancy
Q&A