Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data Prep, Data Blending,...

OCTOBER 18, 2016 SAN FRANCISCO BAY AREA, CA #DenodoDataFest RAPID, AGILE DATA STRATEGIES For Accelerating Analytics, Cloud, and Big Data Initiatives.

Page 1: Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data Prep, Data Blending, and Other Technologies

Comparing and Contrasting Data Virtualization with Data Prep, Data Blending, Data Catalog and Other Technologies

Paul Moxon

Head of Product Management, Denodo

Agenda

1. Business Intelligence ‘Swim Lanes’

2. Data Prep – What is it and how does it work?

3. All you want to know about Data Blending

4. Data Catalogs – What, When, and How

5. Mapping to the Swim Lanes

6. Where Does Data Virtualization Fit?

7. Q&A

Business Intelligence ‘Swim Lanes’

• Task focused

• Productivity

• Self-service

• Quick and easy access to data

• Automation (or simplification) of data gathering

• Tactical

• Team/Departmental

• Drives business operations

• Shared data

• Process oriented

• Strategic

• Executive and KPI dashboards

• Drives strategic decisions

• Managed, governed data

• Consistent data

Data Prep

What is it and how does it work?

Data preparation is the process of gathering, combining, structuring and organizing data so it can be analyzed as part of a business intelligence or analytics process.

Leading Data Prep Vendors

• Trifacta

• Paxata

• Alteryx

• Datameer

• Talend Data Preparation Desktop

• Informatica Rev

• SAS Data Loader

• IBM Watson

How Does It Work?

Interactive Data Prep process:

1. First, data is ingested from the data sources (or just a sample of the data)

2. The user can define transformations to prepare the data

a. De-duplication, cleansing, combining data, pivoting, splitting rows/columns, etc.

3. Run the transformation and export the data

a. Local file (typically CSV) or into Hadoop (Hive table or CSV file)

b. Alternatively export to BI Tool (e.g. Tableau Data Extract file)

Operationalize:

1. Schedule data prep transformations to generate new data files (à la ETL)

2. Publish results to collaboration environment
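
The interactive steps above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how any of the listed vendors implement data prep; the sample data, the column names, and the helper functions (`ingest`, `cleanse`, `deduplicate`, `export_csv`) are all hypothetical.

```python
import csv
import io

# Hypothetical raw extract; in practice this would be a file or a sample from a source.
RAW_CSV = """customer,city_state,amount
Acme Corp ,"Austin, TX",100
acme corp,"Austin, TX",100
Globex,"Portland, OR",250
Initech,,75
"""

def ingest(text, sample_size=None):
    """Step 1: read rows from a delimited source, optionally only a sample."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows if sample_size is None else rows[:sample_size]

def cleanse(row):
    """Step 2: normalize the customer name and split one column into two."""
    city, _, state = (row["city_state"] or "").partition(",")
    return {
        "customer": row["customer"].strip().title(),
        "city": city.strip(),
        "state": state.strip(),
        "amount": float(row["amount"]),
    }

def deduplicate(rows):
    """Step 2: drop rows that are identical after cleansing, keeping the first."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def export_csv(rows):
    """Step 3: write the prepared rows out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["customer", "city", "state", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

prepared = deduplicate([cleanse(r) for r in ingest(RAW_CSV)])
result = export_csv(prepared)
print(result)
```

Scheduling a script like this to regenerate the output file would correspond to the ‘Operationalize’ step, which is where the comparison with ETL comes from.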

Data Prep Tools

Pros and Cons of Data Prep

Pros:

• Ease of use

• Iterative data transformation

• Very good with delimited files

• Sampling makes tools responsive

• Data profiling helps detect ‘suspect’ data

Cons:

• Ad-hoc rather than operational

• Reuse is limited to collaborative data sets

• Performance

• Consistency and governance – data chaos?

Data Prep and Systematic Data Integration

Data Prep is great for ad-hoc discovery and analytics

• “I need to combine this with that and run it through my analytics application…”

Not so good for consistent, repeatable integration

• (Think: BI swim lanes)

But…

• Data Prep provides valuable knowledge that can be used in systematic data integration

Data Prep and Virtual Sandboxes

Data Blending

All you want to know about…

Data blending is about working with multiple sources of data by preparing them and joining them together for a specific use case at a specific time. It’s different from data integration, because data blending is about solving a specific use case, whereas data integration typically gives you a single source of truth…

Leading Data Blending ‘Vendors’

• Tableau

• MicroStrategy

• SAP BusinessObjects

• IBM Cognos

• QlikView

• etc.

How Does it Work?

Defining the data blending ‘model’:

1. Connect to data sources

a. Databases, Data Warehouse (via ODBC or JDBC), Files (Excel, CSV, etc.), Hadoop, NoSQL, etc.

2. Select data you want to use – a sample is usually loaded

3. Build model using graphical tool to create Joins, Unions, etc.

4. Run the model for the full data set

5. Build your report or dashboard

Operationalize:

1. Model can be saved and exposed as a ‘data source’ (usually in a ‘server’)

2. Accessed by other users
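
As a rough illustration of such a blending ‘model’, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The three tables stand in for separate data sources, and a saved view stands in for the published ‘data source’. The table, column, and view names are hypothetical, and BI tools build the equivalent through a graphical designer rather than hand-written SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Steps 1 and 2: three hypothetical 'sources', e.g. a CRM table and two yearly
# sales extracts, loaded here with a handful of sample rows.
# Step 3 and 'Operationalize': the model is a union of the sales sources joined
# to the customer source, saved as a named view that other users can query.
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER, name TEXT, region TEXT);
    CREATE TABLE sales_2015 (customer_id INTEGER, amount REAL);
    CREATE TABLE sales_2016 (customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO sales_2015 VALUES (1, 100.0), (2, 50.0);
    INSERT INTO sales_2016 VALUES (1, 200.0), (2, 75.0), (2, 25.0);

    CREATE VIEW blended_sales AS
    SELECT c.name, c.region, s.year, s.amount
    FROM crm_customers AS c
    JOIN (
        SELECT customer_id, amount, 2015 AS year FROM sales_2015
        UNION ALL
        SELECT customer_id, amount, 2016 AS year FROM sales_2016
    ) AS s ON s.customer_id = c.customer_id;
""")

# Steps 4 and 5: run the model over the full data and aggregate it for a report.
report = conn.execute(
    "SELECT name, year, SUM(amount) FROM blended_sales "
    "GROUP BY name, year ORDER BY name, year"
).fetchall()
for row in report:
    print(row)
```

Because the view lives inside one tool's environment, the sketch also hints at the limitation noted later in the deck: the model is specific to wherever it was defined.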

Data Blending

Pros and Cons of Data Blending

Pros:

• Built into BI/visualization tools

• Graphical query designer

• Provides semantic layer on top of data sources

• Quick time from ‘data to analysis’, i.e. removes the wait for IT to provision a data mart or similar

Cons:

• Ad-hoc rather than operational

• Specific to each BI/visualization tool

• Performance

• Consistency and governance

Francois Ajenstat, Chief Product Officer, Tableau

There are two flows; the ad-hoc and the operational…where we are coming from is…I just want to integrate these two sources. It's not formalized, per se, it's not a project. I just want to connect this and this and I want to analyze it. How do we go from data to analysis as quickly as possible? And when you want to formalize it, operationalize it, make it repeatable, then [you use other tools].

Data Catalogs

What, when, and how?

Data Catalogs provide capabilities that enable any user – from analysts to data scientists to developers – to discover, understand, and consume data sources. Data Catalogs typically include a crowdsourcing model of metadata and annotations, and allow all users to contribute their knowledge to build a community and culture of data.

Leading Data Catalog Vendors

• Alation/Teradata

• Cambridge Semantics Anzo Platform

• Informatica Enterprise Information Catalog

• Microsoft Azure Data Catalog

• Waterline Data

How Does it Work?

Building catalog:

1. Connect to data sources and consumers

a. Extract and analyze ‘technical’ metadata

b. Sample data and build data profile

2. Use NLP and ML for ‘auto-titling’ – based on defined business glossary

3. Use expert sourcing to validate catalog entries

4. Use crowd sourcing to build veracity profile

Accessing catalog:

1. Search tools for ‘natural language’ searches

2. APIs for tool integration
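
A toy version of the catalog-building and catalog-access steps might look like the sketch below, again using `sqlite3` as a stand-in data source. It covers only the technical metadata, the sample-based profile, manual annotations, and a naive keyword search; the NLP/ML ‘auto-titling’ step is not attempted. The function names (`build_catalog`, `annotate`, `search`) and the sample tables are hypothetical and do not reflect any vendor's API.

```python
import sqlite3

def build_catalog(conn, sample_size=100):
    """Collect technical metadata and a small data profile for every table."""
    catalog = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Step 1a: technical metadata (column names and declared types).
        columns = [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # Step 1b: sample the data and build a simple profile per column.
        sample = conn.execute(
            f'SELECT * FROM "{table}" LIMIT ?', (sample_size,)).fetchall()
        profile = {}
        for i, (name, decl_type) in enumerate(columns):
            values = [row[i] for row in sample]
            profile[name] = {
                "type": decl_type,
                "nulls": sum(v is None for v in values),
                "distinct": len({v for v in values if v is not None}),
            }
        catalog[table] = {"columns": profile, "tags": []}
    return catalog

def annotate(catalog, table, tag):
    """Steps 3 and 4: a stand-in for expert or crowd-sourced annotations."""
    catalog[table]["tags"].append(tag)

def search(catalog, term):
    """Accessing the catalog: keyword search over table names, columns and tags."""
    term = term.lower()
    return [t for t, entry in catalog.items()
            if term in t.lower()
            or any(term in c.lower() for c in entry["columns"])
            or any(term in tag.lower() for tag in entry["tags"])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, email TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'a@example.com'), (2, NULL);
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 12.0);
""")
catalog = build_catalog(conn)
annotate(catalog, "orders", "revenue")
print(search(catalog, "customer"))
print(catalog["customers"]["columns"]["email"])
```

Note that the sketch only describes where data lives and what it looks like; it does not deliver the data itself to a consumer.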

Data Catalog

Pros and Cons of Data Catalogs

Pros:

• Great for analyzing data sources and inferring meaning from technical metadata

• Gather ‘tribal knowledge’ about data within organization

• Allow curation of metadata

• Provide single tool to find – and understand – data

Cons:

• Do not address ‘data provisioning’ – you need another tool for this

• File-based data?

Summary

Back to the swim lanes…

Business Intelligence ‘Swim Lanes’

Data Blending

Data Catalog

Data Prep

Data Virtualization

Q&A

Thank you!

© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
