BIG DATA TECHNOLOGY: AND PLATFORMS -...


Transcript of BIG DATA TECHNOLOGY: AND PLATFORMS -...

Page 1: BIG DATA TECHNOLOGY: AND PLATFORMS - cdn.bdigital.orgcdn.bdigital.org/PDF/BigDataCongress2017/1.BIGDATATECH_INFRA… · What is a Data Lake? A data lake is a method of storing data

BIG DATA TECHNOLOGY: INFRASTRUCTURE AND PLATFORMS


Sra. Gemma Batlle

Business Development Manager, Eurecat

www.eurecat.org

@eurecat_events


Sr. Marc Planagumà & Sr. Jose Luis Sánchez

BigData Platform Managers

ServiZurich

www.zurich.es


INTERNAL USE ONLY

Build Enterprise Data Lake without Drowning

26/10/2017

Jose Luis Sanchez and Marc Planagumà

Big Data Congress Barcelona 2017

ServiZurich – Big Data Delivery Center - EDAA


What is a Data Lake?

A data lake is a method of storing data within a system or repository in its natural format (structured or unstructured).

The idea of a data lake is to have a single store of all data in the enterprise, ranging from raw data to transformed data.

…and what are the main goals of an Enterprise Data Lake?


Break the Silos


Bring together Structured and Unstructured Data


Enable Big Data Analytics


What does a Data Lake in an Enterprise mean?

The Main Management Challenges


BI and Big Data Coexistence

[Diagram: Business Intelligence (Data Warehouse, Data Analytics) alongside Big Data (Data Lake, Big Data Analytics)]


Open up space for Open Source


Disruption on Infrastructure Strategies


Fast Acceleration in Technology Life Cycle


How to build a Data Lake in an Enterprise?

The Main Technology Challenges


Corporate Tools Stack Set-up


Big Data Platform into Production


Development Chain

SLAs

High Availability

Disaster Recovery

Support

Automation


Enterprise Platform Enablement


Security

Processes

Integration

Reports

Monitoring


Data Science framework in Production


Solution Developers

Platform Engineers

Data Scientists


Enjoying the Lake to Swim on Data

Thanks!


Sr. Víctor Dertiano

Senior Manager

BI Geek

www.bi-geek.com


2getherbank

Architecture of a financial platform


BIGDATACONGRESS

1. What is 2getherbank

2. Architecture – Information Platform


What is 2getherbank


Architecture – Information Platform


Thank you very much


Sr. Albert Climent

Senior Data Scientist

Pervasive Technologies

www.pervasive-tech.com


Google Cloud Platform for Big Data

Albert Climent Bigas

Senior Data Scientist – Pervasive Technologies

26 October 2017


TABLE OF CONTENTS

• The problem

• Big Data lifecycle

• Big Data infrastructure

• Solutions

• Google Cloud Platform


THE PROBLEM

The 3 V's of Big Data

VOLUME

• Terabytes
• Transactions
• Tables, files
• Records

VELOCITY

• Batch
• Near time
• Real time
• Streams

VARIETY

• Unstructured
• Semi-structured
• Structured
• Combined


BIG DATA LIFECYCLE

Data Acquisition

Data Preparation

Data Representation

Data Analysis

Data Interpretation


BIG DATA INFRASTRUCTURE

On premise | Cloud

Dedicated IT team for maintenance | No dedicated IT team required

Limited access to devices | Access from any location (internet)

Limited software updates and improvements | Continuous updates and improvements

High upfront and equipment-renewal costs | Costs limited to usage

Risk of data loss must be managed | Low risk of data loss


SOLUTIONS: On premise


SOLUTIONS: Cloud


GOOGLE CLOUD PLATFORM


[Diagram: Google Cloud Platform services, each labelled with the V it addresses: Variety, Velocity, or Volume]


Albert Climent Bigas – [email protected]

THANK YOU!


Sr. Oscar Romero

Professor in the Department of Service and Information System Engineering at UPC and member of the Database Technologies and Information Management research group

Universitat Politècnica de Catalunya

www.essi.upc.edu/dtim

@romero_m_oscar


DTIM

Evolution of Data Ecosystems: From the Data Warehouse to the Data Lake

Oscar Romero

DTIM Research Group (http://www.essi.upc.edu/dtim/)

Universitat Politècnica de Catalunya - BarcelonaTech


What is Big Data?

26-10-2017 OSCAR ROMERO - EVOLUCIÓN DE LOS ECOSISTEMAS DE DATOS 62

Volume, Variability, Variety, Velocity, Value, Veracity


Today, the Focus is on Variety


That Big Data is synonymous with large volumes of data is a myth

“Rather, it is the ability to integrate more sources of data than ever before — new data, old data, big data, small data, structured data, unstructured data, social media data, behavioral data, and legacy data”

The Variety Challenge

MIT Sloan Management Review (2016): http://sloanreview.mit.edu/article/variety-not-volume-is-driving-big-data-initiatives/


The Long Tail of Big Data

The ultimate goal is…

• Integrate new data sources on demand:
  ◦ Legacy systems
  ◦ External data (typically semi-structured or unstructured data)
  ◦ Social media and behavioural data sources
• Provide the required flexibility for conducting on-demand data analysis techniques:
  ◦ Data preparation

«Data is the new oil!» - Clive Humby, 2006

«No! Data is the new soil» - David McCandless, 2010


Model-First (Load-Later)


Sources:

• Twitter API (JSON), USER FEEDBACK: User, Tweet, Date, Location
• In-house DB (PostgreSQL), PRODUCT INFO: Product, Product features
• Web Logs (Logs), USER WEB BEHAVIOUR: User, Product, Landing time, Visits ts

Transformations:

• Sentiment Analysis (e.g., Text Mining)
• Log Analysis (e.g., Process Mining)
• Product homogenization (e.g., duplicate detection)

Target schema:

• User: Avg rating, List of preferences
• Product: Popularity, Top feature, Bottom feature
• Feature: Avg(sentiment)
• Relationships: Is part of; Assesses (Avg(sentiment)); Interested In (Keen: Avg(landing time)/#visits)
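The model-first flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log records, the `InterestedIn` class, and the `load_interested_in` function are hypothetical, and "#visits" is taken to mean the number of log records per (user, product) pair, which the slide does not define.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical web-log records: (user, product, landing_time_seconds).
WEB_LOGS = [
    ("ana", "phone", 30.0),
    ("ana", "phone", 90.0),
    ("ana", "laptop", 20.0),
    ("bob", "phone", 10.0),
]


@dataclass(frozen=True)
class InterestedIn:
    """One row of the fixed target schema for the 'Interested In' relationship."""
    user: str
    product: str
    keen: float  # Avg(landing time) / #visits, as written on the slide


def load_interested_in(logs):
    """Model first: transform at load time and keep only the modelled aggregate."""
    grouped = {}
    for user, product, landing_time in logs:
        grouped.setdefault((user, product), []).append(landing_time)
    # Assumption: #visits is the number of log records for the pair.
    return [
        InterestedIn(user, product, mean(times) / len(times))
        for (user, product), times in sorted(grouped.items())
    ]


warehouse = load_interested_in(WEB_LOGS)
for row in warehouse:
    print(row)
```

Once loaded, the raw log records are no longer available to the analyst; only the attributes chosen in the target schema survive the load.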


Drawbacks


Permanent transformations

Fixed Target Schema

High Entry Barriers


Load-First Model-Later


Sources loaded into the Data Lake:

• Twitter API (JSON): USER FEEDBACK
• In-house DB (PostgreSQL): PRODUCT INFO
• Web Logs (Logs): USER WEB BEHAVIOUR

Data Views:

• Analyst 1: User (Avg rating, List of preferences); Product (Popularity, Top feature, Bottom feature); Feature (Avg(sentiment)); relationships Is part of and Assesses (Avg(sentiment))
• Analyst 2: User (Avg rating, List of preferences); Product (Popularity, Top feature, Bottom feature); relationship Interested In (Keen: Avg(landing time)/#visits)
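A minimal Python sketch of the load-first, model-later idea follows. Everything named here is hypothetical: the `LAKE` dictionary stands in for the data lake, `toy_sentiment` is a word-list stand-in for real text mining, and "#visits" is again assumed to be the number of log records per (user, product) pair.

```python
import json
from statistics import mean

# The lake keeps each source in its raw form (here: JSON lines per source).
LAKE = {
    "twitter": [
        '{"user": "ana", "tweet": "love the camera"}',
        '{"user": "bob", "tweet": "battery is poor"}',
    ],
    "web_logs": [
        '{"user": "ana", "product": "phone", "landing_time": 30.0}',
        '{"user": "ana", "product": "phone", "landing_time": 90.0}',
    ],
}

# Toy lexicon, a stand-in for a real sentiment-analysis step.
POSITIVE = {"love", "great"}
NEGATIVE = {"poor", "bad"}


def read(source):
    """Parse raw records only when a view asks for them."""
    return [json.loads(line) for line in LAKE[source]]


def toy_sentiment(text):
    """Count positive words minus negative words."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def analyst1_view():
    """Analyst 1's view: average sentiment per user, modelled at read time."""
    scores = {}
    for rec in read("twitter"):
        scores.setdefault(rec["user"], []).append(toy_sentiment(rec["tweet"]))
    return {user: mean(vals) for user, vals in scores.items()}


def analyst2_view():
    """Analyst 2's view: Keen = Avg(landing time) / #visits per (user, product)."""
    times = {}
    for rec in read("web_logs"):
        key = (rec["user"], rec["product"])
        times.setdefault(key, []).append(rec["landing_time"])
    return {key: mean(vals) / len(vals) for key, vals in times.items()}


print(analyst1_view())
print(analyst2_view())
```

Each analyst applies a different model to the same untouched raw data, which is the flexibility the slide contrasts with a fixed target schema.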


Drawbacks


Data Swamp

(Automated) Complex Transformations


From Data Swamps to Semantic Data Lakes


Sources loaded into the Data Lake:

• Twitter API (JSON), USER FEEDBACK: User, Tweet, Date, Location
• In-house DB (PostgreSQL), PRODUCT INFO: Product, Product features
• Web Logs (Logs), USER WEB BEHAVIOUR: User, Product, Landing time, Visits ts

Catalog: File 1, File 2, File 3

Data Views:

• Analyst 1: User (Avg rating, List of preferences); Product (Popularity, Top feature, Bottom feature); Feature (Avg(sentiment)); relationships Is part of and Assesses (Avg(sentiment))
• Analyst 2: User (Avg rating, List of preferences); Product (Popularity, Top feature, Bottom feature); relationship Interested In (Keen: Avg(landing time)/#visits)
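One way to picture the catalog on this slide is as a small metadata index over the files in the lake. The sketch below is an assumption-laden illustration: the `CatalogEntry` class and `files_with` helper are hypothetical, and the pairing of File 1, File 2, and File 3 with particular sources is a guess, since the slide does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata describing one raw file stored in the lake."""
    file: str
    source: str
    fmt: str
    attributes: tuple


# Assumed pairing of files to sources; attribute names are taken from the slide.
CATALOG = [
    CatalogEntry("File 1", "Twitter API", "JSON",
                 ("User", "Tweet", "Date", "Location")),
    CatalogEntry("File 2", "In-house DB", "PostgreSQL",
                 ("Product", "Product features")),
    CatalogEntry("File 3", "Web Logs", "Logs",
                 ("User", "Product", "Landing time", "Visits ts")),
]


def files_with(attribute):
    """Return the files whose described content includes the given attribute."""
    return [entry.file for entry in CATALOG if attribute in entry.attributes]


print(files_with("User"))
print(files_with("Product"))
```

With such an index, an analyst building a view can find which files mention a concept without opening every file, which is what keeps the lake from turning into a swamp.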


From IT-Centered to User-Centered

[Diagram: the sources, Data Lake, Catalog (File 1, File 2, File 3), and analyst Data Views of the previous slide, now with AUTOMATIC DATA GOVERNANCE]


Thanks! Any questions?

OROMERO@ESSI.UPC.EDU

HOMEPAGE: HTTP://WWW.ESSI.UPC.EDU/DTIM/PEOPLE/OROMERO

TWITTER: @ROMERO_M_OSCAR
