Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

www.spagobi.org
Copyright © 2013 Engineering Group, SpagoBI Competency Center. All rights reserved.
SpagoBI and Talend jointly support Big Data scenarios
Monica Franceschini - SpagoBI Architect
SpagoBI Competency Center - Engineering Group

description

This presentation supported the speech entitled "SpagoBI and Talend jointly support Big Data scenarios" delivered by Monica Franceschini, SpagoBI Architect, during the OW2 track at Solutions Linux 2013 (Paris, 28th-29th May 2013).

Transcript of Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Page 1: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

SpagoBI and Talend jointly support Big Data scenarios

Monica Franceschini - SpagoBI Architect

SpagoBI Competency Center - Engineering Group

Page 2: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Big-data
• Agenda
– Intro & definitions
– Layers
– Talend & SpagoBI
– SpagoBI big-data roadmap

Page 3: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Big Data - 3Vs

"Big data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney. Gartner, 21 June 2012.

VOLUME The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue.

VARIETY IT leaders have always had an issue translating large volumes of transactional information into decisions — now there are more types of information to analyze — mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more.

VELOCITY This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.

Gartner Press Release, “Gartner Says Solving ‘Big Data’ Challenge Involves More Than Just Managing Volumes of Data”, June 27, 2011

Page 4: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Big Data - 3Vs & more

VARIABILITY

Variance in meaning and in lexicon

VERACITY

1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

VALUE

The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.

Page 5: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Big data - Layers
• Infrastructure
– On-site
– IaaS
• Data management:
– capture
– cleaning
– loading
– store
• View and Analyse
– Text analysis
– Text mining
– exploration, navigation, presentation
• Application
– Cloud
– SaaS

ETL

Business Intelligence

Services

Page 6: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Big data & Business Intelligence
• Tasks:
– Manage big-data (ETL) → Talend
– Read, interpret and show big-data (BI) → SpagoBI
– Big-data and real-time (BI) → SpagoBI

Page 7: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Talend - Big Data Management

Big Data Production

Big Data Management

Big Data Consumption

Storage, Processing, Filtering

Mining

Analytics

Search

Enrichment

RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media, Web Analytics, Log Files, RFID, Call Data Records, Sensors, Machine-Generated

Big Data Integration

Big Data Quality

Turn Big Data into actionable information

Parsing Checking

Page 8: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Talend Goal: democratize Big Data

…an open source ecosystem

Talend Open Studio for Big Data

“Big Data for the Masses”

Improves efficiency of big data job design with graphic interface

Abstracts and generates code

Runs transforms inside Hadoop

Native support for HDFS, Sqoop, HBase, Mahout, Pig, Hive & MapReduce code generation

Apache License 2.0

Embedded in Hortonworks Data Platform

Certified with Cloudera, MapR and Greenplum

HCatalog

Page 9: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

ETL: Analytical databases & appliances

Connectors from/to:

– Greenplum
– Netezza
– Sybase
– Teradata
– VectorWise
– Vertica
– HDFS
– HBase
– Hive
– Cassandra
– MongoDB

Page 10: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

SpagoBI - load

Certified appliances:
– Teradata
– VectorWise

Connectors from:
– Cassandra
– HBase
– Hive
– Impala
– Hadoop

RT with:
– Storm
– WSO2

More:
– Scheduled data-set
– In-memory data set

Page 11: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

SpagoBI - meaning

Support for open standards:

– RDF (Resource Description Framework) http://www.w3.org/RDF/
– OWL (Web Ontology Language) http://www.w3.org/OWL/
– R
– Mahout
– Text mining

Connectors from:
– Neo4J
– Freebase
– OrientDB

Page 12: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

SpagoBI - show

Explorative front-end

– Network analysis
– Exploration
– In-memory
– Data visualization

Page 13: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

SpagoBI - roadmap

• Capture / Store
– Talend, connector to/from:
  • Greenplum
  • Netezza
  • Sybase
  • Teradata
  • VectorWise
  • Vertica
  • HDFS
  • HBase
  • Hive
  • Cassandra
  • MongoDB
  • …
• LOAD
– Certified appliances:
  • Teradata
  • VectorWise
– Connectors from:
  • Cassandra
  • HBase
  • Hive
  • Impala
  • Hadoop
  • MongoDB
– RT with:
  • Storm
  • WSO2
– More:
  • Scheduled data-set
  • In-memory data set

Page 14: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

SpagoBI - roadmap
• Meaning
– Connectors from:
  • Neo4J
  • Freebase
  • OrientDB
– Support for open standards:
  • RDF
  • OWL
– Mining
  • R
  • MashR
  • Text mining
• Show
– Explorative front-end
– Network analysis
– Data visualization
• Services
– Big data as a service
  • Multitenant
  • Cloud
  • BI as a service (ad-hoc+self-service)

Data scientist

Page 15: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

Bundle Talend - SpagoBI

The bundle will provide:
– a distribution of both tools interacting with each other
– a use case that can be run to explore their functionalities

SpagoBI and Talend announce their bundle!

Page 16: Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios

@twittmonique
