Altic's big analytics stack, Charly Clairmont, Altic.
Twitter #ow2con @egwada www.ow2.org
Our historical tools
• ETL : Talend
• Reporting : JasperReports, Birt
• OLAP : Mondrian, Palo
• BI platform : SpagoBI
Smartly combining innovation & customers' needs
● Identify when applied research is an opportunity for us, our solutions and our customers.
● Understand our customers' business processes & assess the impact of Open IT on their activities
● Offer a project approach that is both technical and operational
➔ Altic projects
➔ Allow our customers to optimize their business processes
➔ Take the customer's line of business into account
➔ Offer long-lasting solutions
➔ Follow the customer's present needs and not the vendors' agenda
Our first Big Data project at Altic
● eFraudBox project (2010 – 2013)
● Goal : predict fraud on the Internet
● Context :
– Customer : GIE carte bancaire
– European Research and Development project
– Many industrial and academic partners
● Data :
– Type : Banking transactions
– Volume : One GB per day
« In data mining, processing is done line by line » … [ so this is not a data volume issue ]
● Open Source
● MPP compute platform
● Distributed file system
● MapReduce processing
● Cost efficient
● Fault tolerant
● Infinite scale
● Enterprise Information System ready
● Continuous Improvement
● Growing community
Let's have a look at Hadoop !
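To make the "MapReduce processing" bullet concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps in plain Python. It does not use Hadoop at all; the sample records and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and only illustrate the programming model.

```python
from collections import defaultdict

# Hypothetical sample records: (card_id, amount) pairs standing in for banking transactions.
TRANSACTIONS = [
    ("card-A", 20.0),
    ("card-B", 5.5),
    ("card-A", 12.5),
    ("card-C", 99.0),
    ("card-B", 4.5),
]


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    card_id, amount = record
    yield card_id, amount


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Chain the three steps; a real cluster would spread them over many machines."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


totals = run_job(TRANSACTIONS)
print(totals)
```

On a real cluster the map and reduce functions run in parallel on separate nodes and the shuffle moves data over the network; the sketch only keeps the data flow.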
« Even transactions are possible on Hadoop - it's inevitable that ALL kinds of workloads will move there
in the future »
Doug CUTTING, Hadoop Creator
October 2013
How do we query Hadoop ?
● SQL like
● Easy development

● Pig Latin
● Easy syntax
● Supports unstructured data

● Java
● Very optimised
● Very customisable
How do we query Hadoop ?
● We already know SQL !
● Why not ?
● Need to code everything
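The trade-off between the two options can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Hive, and the transactions table, its columns and the sample rows are hypothetical. The first version states the aggregation in one SQL statement; the second has to code the grouping by hand.

```python
import sqlite3

# In-memory SQLite is only a stand-in here; on the real platform the query would go to Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (card_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("card-A", 20.0), ("card-B", 5.5), ("card-A", 12.5), ("card-C", 99.0), ("card-B", 4.5)],
)

# Declarative version: one statement describes both the grouping and the aggregation.
sql_totals = dict(
    conn.execute("SELECT card_id, SUM(amount) FROM transactions GROUP BY card_id")
)

# Hand-coded version: the grouping logic has to be written explicitly.
manual_totals = {}
for card_id, amount in conn.execute("SELECT card_id, amount FROM transactions"):
    manual_totals[card_id] = manual_totals.get(card_id, 0.0) + amount

print(sql_totals)
print(manual_totals)
```

Both versions produce the same totals; the SQL one is shorter and reuses a skill the team already has, while the hand-coded one leaves every detail open to tuning.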
Ok, we have our storage and computation engine, but how can we
manage data ?
By using our Swiss Army Knife !
Now our Hadoop / Hive platform is filled with Big Data,
but it's a little too slow for end users to query...
http://ih2.redbubble.net/image.13088996.5766/sticker,375x360.png
Process data with Hive and store the results in fast databases
Aggregate data
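The pattern on this slide (heavy batch processing first, then a small aggregate table served from a fast database) can be sketched as follows. This is a hedged illustration, not Altic's pipeline: a plain Python function plays the role of the Hive job, in-memory SQLite plays the role of the fast database, and the daily_sales table, its columns and the sample rows are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw rows (day, merchant, amount); on the real platform these would sit in Hive.
RAW_ROWS = [
    ("2013-10-01", "shop-1", 20.0),
    ("2013-10-01", "shop-1", 30.0),
    ("2013-10-01", "shop-2", 5.0),
    ("2013-10-02", "shop-1", 12.5),
]


def aggregate(rows):
    """Batch step: collapse raw rows into one (count, total) per day and merchant."""
    summary = defaultdict(lambda: [0, 0.0])
    for day, merchant, amount in rows:
        entry = summary[(day, merchant)]
        entry[0] += 1
        entry[1] += amount
    return summary


def load_summary(conn, summary):
    """Load step: write the small aggregate table into the serving database."""
    conn.execute(
        "CREATE TABLE daily_sales (day TEXT, merchant TEXT, tx_count INTEGER, "
        "total REAL, PRIMARY KEY (day, merchant))"
    )
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
        [(day, merchant, count, total) for (day, merchant), (count, total) in summary.items()],
    )


serving_db = sqlite3.connect(":memory:")  # stand-in for the fast database
load_summary(serving_db, aggregate(RAW_ROWS))

# End-user query: touches only the pre-aggregated table, never the raw rows.
row = serving_db.execute(
    "SELECT tx_count, total FROM daily_sales WHERE day = ? AND merchant = ?",
    ("2013-10-01", "shop-1"),
).fetchone()
print(row)
```

The end-user query only reads a keyed row from the summary table, which is why it can stay fast even when the raw data keeps growing.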
Ok, now we have our fast queryable datasets, but how can we visualize them ?
To manage users and visualizations
To quickly get an overview of your data
To go deeper into your visualizations
BigData and Datamining v2
● Spark : new in-memory data processing framework
● Very appropriate for machine learning
● MLBase : Machine learning library
● Spark-clustering : Implementation of SOM algorithm
● Proof of concept : Analysis of mobile telecommunications
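For readers unfamiliar with the SOM (self-organizing map) algorithm named above, here is a minimal single-machine sketch in plain Python. It is not the Spark-clustering implementation and does not use Spark; the one-dimensional chain of nodes, the decay schedule, the parameter defaults and the sample data are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return the index of the node whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], sample))


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a one-dimensional SOM (a chain of nodes) on a list of numeric vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink as training progresses.
        decay = math.exp(-3.0 * epoch / epochs)
        lr, radius = lr0 * decay, radius0 * decay
        for sample in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, sample)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: nodes close to the BMU on the chain move more.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (sample[d] - w[d])
    return weights


def quantization_error(weights, data):
    """Mean distance between each sample and its best matching unit."""
    total = sum(math.dist(weights[best_matching_unit(weights, s)], s) for s in data)
    return total / len(data)


# Hypothetical data: two small groups of two-dimensional points.
DATA = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [9.8, 10.0], [10.0, 9.9], [10.1, 10.2]]

untrained = train_som(DATA, epochs=0)  # random initial weights only
trained = train_som(DATA)
print(quantization_error(untrained, DATA), quantization_error(trained, DATA))
```

Each training step finds the best matching unit for a sample and pulls that node, and to a lesser degree its neighbours on the chain, towards the sample. A distributed version would have to split this work across workers, which the sketch does not attempt.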
BI & Big Data for Altic
● In the end, we still do BI as usual
● Tools evolve :
– New storage and processing
– We do not change our tools, fortunately THEY progress for us and we contribute
● Fundamentals do not really change, only technologies do
– Hadoop
– Spark
We improve our Big Data stack and its approach...
And support our customers' big analytics projects
Our Big Data Stack
Our Big Data Approach
Questions ?
Thanks !
Charly CLAIRMONT, CTO at ALTIC
http://altic.org