Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Enterprise-Level Data Analysis with Oracle R Enterprise
Dr. Nadine Schöne, Sales Consultant, Oracle Direct, Sales Consulting
Dr. Michael Haupt, Tech Lead, FastR Project, Virtual Machine Research Group, Oracle Labs
Negib Marhoul, Leading Senior Sales Consultant, Oracle Direct, Sales Consulting
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1. Data analysis in the enterprise
2. R and Oracle R Enterprise (ORE)
3. Demo
4. Oracle Labs and FastR
5. Further information
Background: statistical and data mining methods
• Time-consuming analysis processes
• Multiple iterations
• Workflows of recurring work steps
• Resource-intensive data analyses
Cycle: identify data → collect data → prepare data → analyze data
Key topics for enterprise data analytics
1. Scalability
2. Performance
3. Development & production
R and Oracle R Enterprise (ORE)
R is …
1. A programming language
2. A statistical workbench
3. A data science ecosystem
R is the lingua franca of data science.
R logo © R Foundation, from http://www.r-project.org
Aspects of traditional R/database interaction
“Collaborative Execution” model
1. User R engine (desktop): R engine, Oracle R Enterprise packages, other R packages; accesses the user tables in Oracle DB via SQL and receives the results
2. Database compute engine: Oracle DB
3. R engine(s) managed by Oracle DB: R engine, Oracle R Enterprise packages, other R packages; results are returned
• Transparency layer ⇒ uses the computing power of the database
• No flat-file export ⇒ saves time and uses the computing power of the server
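The benefit of the transparency layer comes from where the computation runs. The following is a minimal sketch of that idea, not ORE code: Python's built-in sqlite3 module stands in for Oracle DB, and the table `readings` with columns `household` and `kwh` is a hypothetical schema. Pulling the data to the client transfers every row; pushing the aggregate down as SQL transfers a single row.

```python
import sqlite3


def make_table(n_rows):
    """Create an in-memory table standing in for a database table (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (household INTEGER, kwh REAL)")
    con.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        ((i % 100, float(i % 7)) for i in range(n_rows)),
    )
    return con


def mean_by_pulling(con):
    """Client-side: transfer every row, then aggregate locally."""
    rows = con.execute("SELECT kwh FROM readings").fetchall()
    return sum(r[0] for r in rows) / len(rows), len(rows)


def mean_by_pushdown(con):
    """Push-down: the database aggregates; a single row is transferred."""
    rows = con.execute("SELECT AVG(kwh) FROM readings").fetchall()
    return rows[0][0], len(rows)


con = make_table(10_000)
print(mean_by_pulling(con))   # same mean, 10000 rows transferred
print(mean_by_pushdown(con))  # same mean, 1 row transferred
```

Both functions compute the same mean; only the amount of data moved to the client differs. In ORE the translation from R operations to SQL happens transparently, whereas here the SQL is written by hand.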
“R is a powerful and interesting tool for data analysis! ORE brings R into a scalable DB engine (solving problems of data management, analysis and scalability). We actually can obtain information and added value from not so actively used data.”
– Stefano Alberto Russo, Researcher at CERN Openlab
• Oracle R Distribution
• ROracle
• Oracle R Enterprise
• Oracle R Advanced Analytics for Hadoop
Free for the R community
Oracle R Enterprise at a glance
• Development: R workspace console; unchanged user experience
• Production: Oracle statistics engine with function push-down (data transformation & statistics); scalable to large data volumes
• Application: OBIEE, web services; embedding in operational systems
Sensor data analysis I
• 200,000 households
• 3 years
• 1 measurement per hour
• 5.256 billion measurements (26,280 per customer)
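The figures on this slide follow from a short calculation. The sketch below assumes 365-day years (leap days ignored) and exactly one reading per household per hour.

```python
HOUSEHOLDS = 200_000
YEARS = 3
HOURS_PER_YEAR = 365 * 24  # assumption: leap days ignored

readings_per_household = YEARS * HOURS_PER_YEAR       # 3 * 8,760
total_readings = HOUSEHOLDS * readings_per_household  # 200,000 * 26,280

print(f"{readings_per_household:,} readings per household")  # 26,280 readings per household
print(f"{total_readings:,} readings in total")               # 5,256,000,000 readings in total
```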
Sensor data analysis II
• 10 s per model
• 200,000 households ➔ 200,000 models
• One model after another: 23 days + 4 hours; with Oracle R Enterprise: 4.3 hours
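The sequential runtime and the implied speedup can be checked the same way. The calculation assumes the 10 s per model hold for every household; the slide does not state how many models ORE fits in parallel, so the speedup is derived only from the two runtimes.

```python
SECONDS_PER_MODEL = 10
MODELS = 200_000
ORE_HOURS = 4.3

sequential_s = SECONDS_PER_MODEL * MODELS       # 2,000,000 s when fitted one after another
days, rest_s = divmod(sequential_s, 24 * 3600)  # whole days and remaining seconds
rest_h = rest_s / 3600
speedup = sequential_s / (ORE_HOURS * 3600)     # ratio of the two runtimes

print(f"sequential: {days} days + {rest_h:.1f} hours")  # sequential: 23 days + 3.6 hours
print(f"speedup: about {speedup:.0f}x")                 # speedup: about 129x
```

The slide rounds the remainder up to 4 hours. A ratio of roughly 129x only makes sense if many models are fitted concurrently.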
Integrating Data Miner with Oracle R Enterprise
SQL Query node
– Allows R scripts to be integrated
Oracle Advanced Analytics
Wide range of in-database data mining and statistical functions
• Data Understanding & Visualization
– Summary & descriptive statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Statistical tests (t-test, Pearson’s correlation, ANOVA)
– Selected Base SAS equivalents
• Data Selection, Preparation and Transformations
– Joins, tables, views, data selection, data filters, SQL time windows, multiple schemas
– Sampling techniques
– Re-coding, missing values
– Aggregations
– Spatial data
– R-to-SQL transparency and push-down
• Classification Models
– Logistic regression (GLM)
– Naive Bayes
– Decision trees
– Support Vector Machines (SVM)
– Neural networks (NNs)
• Regression Models
– Multiple regression (GLM)
– Support Vector Machines
• Clustering
– Hierarchical k-means
– Orthogonal Partitioning
– Expectation Maximization
• Anomaly Detection
– Special case Support Vector Machine (1-class SVM)
• Associations / Market Basket Analysis
– Apriori algorithm
• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition
• Text Mining
– Most OAA algorithms support unstructured data (e.g., customer comments, email, abstracts)
• Transactional Data
– Most OAA algorithms support transactional data (e.g., purchase transactions, repeated measures over time)
• R packages: ability to run open source
– A broad range of R CRAN packages can be run as part of the database process via R-to-SQL transparency and/or Embedded R mode
* included in every Oracle Database
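Embedded R mode also fits workloads such as fitting one model per household, as in the sensor example: the data is partitioned by a key and the same function runs on each partition (ORE provides embedded R functions such as ore.groupApply for this). The sketch below shows only the partition-and-apply pattern in Python; `fit_line` and `group_apply` are hypothetical helpers, and a thread pool is used for simplicity, although CPU-bound Python code would need a process pool (or, in ORE's case, several database-managed R engines) to run truly in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_line(points):
    """Ordinary least squares fit of y = a + b*x for one partition."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def group_apply(rows, fit, max_workers=4):
    """Partition (key, x, y) rows by key, then apply `fit` to each partition."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((x, y))
    keys = sorted(groups)
    # Threads keep the sketch simple; they only run the partitions concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        models = list(pool.map(fit, (groups[k] for k in keys)))
    return dict(zip(keys, models))


# Usage: two hypothetical households with (hour, kWh) readings.
rows = [(1, h, 2 + 3 * h) for h in range(5)] + [(2, h, 1 - h) for h in range(5)]
print(group_apply(rows, fit_line))  # {1: (2.0, 3.0), 2: (1.0, -1.0)}
```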
Software components in the VM image
• R 3.1.1
• Oracle R Enterprise (ORE) 1.4.1
• Oracle DB 12.1.0.2.0
• Oracle SQL Developer 4.0.3
• RStudio 0.98.1079
• Languages: R, SQL
Safe Harbor Statement
The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.
The Mission of Oracle Labs is straightforward: Identify, explore, and transfer new technologies that have the potential to substantially improve Oracle's business.
– Edward Screven, Chief Corporate Architect, Oracle
Thoughts on R
• R is excellent for statistical tasks. So why should one have to resort to C and Fortran?
• As a language, R is inherently parallel. So why should one have to implement parallelism separately?
Figure: Library 1 (R + C), Library 2 (R + Fortran)
FastR
• Open-source R implementation
– GPL 2
– https://bitbucket.org/allr/fastr
– Research prototype
– Linux, Mac
• Properties
– Implemented in “100% Java”
– Built with Truffle (interpreter) and Graal (dynamic compiler)
Figure: Library 1 (R), Library 2 (R)
Truffle and Graal
Figure: from AST interpreter to compiled code
• AST interpreter with uninitialized nodes → node rewriting for profiling feedback → AST interpreter with rewritten nodes → compilation using partial evaluation → compiled code
• Node transitions, specializing for types: uninitialized → type-specialized → generic
• Deoptimization back to the AST interpreter → node rewriting to update profiling feedback → recompilation using partial evaluation
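The node transitions in the figure can be illustrated with a self-specializing AST node. The sketch below is a simplification in Python, not Truffle's Java API: the node records its specialization in a `state` field instead of replacing itself in the tree, and there is no Graal compilation step. It shows the type transitions uninitialized → int → generic, including the one-way fallback when a speculation fails, which is the point where Truffle would deoptimize compiled code.

```python
class ReadVar:
    """Leaf node: reads a variable from the frame (a plain dict here)."""

    def __init__(self, name):
        self.name = name

    def execute(self, frame):
        return frame[self.name]


class AddNode:
    """Addition node that specializes itself on the operand types it observes."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.state = "uninitialized"

    def execute(self, frame):
        l = self.left.execute(frame)
        r = self.right.execute(frame)
        if self.state == "uninitialized":
            # First execution: specialize on the types actually seen.
            both_int = type(l) is int and type(r) is int
            self.state = "int" if both_int else "generic"
        if self.state == "int":
            if type(l) is int and type(r) is int:
                return l + r  # fast path: the int-only speculation holds
            # Speculation failed: move to the generic state for good.
            self.state = "generic"
        return self._generic_add(l, r)

    @staticmethod
    def _generic_add(l, r):
        # Generic path: accepts any operands that support +.
        return l + r


node = AddNode(ReadVar("a"), ReadVar("b"))
print(node.execute({"a": 1, "b": 2}), node.state)    # 3 int
print(node.execute({"a": 1.5, "b": 2}), node.state)  # 3.5 generic
print(node.execute({"a": 1, "b": 2}), node.state)    # 3 generic
```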
Benchmark results: Shootout
• Benchmark properties
– From the “Computer Language Benchmarks Game” (formerly the “Shootout”)
– Not typical R applications
• Results (note: logarithmic axis)
– Most run about 10x faster than GNU R
– Positive outlier: about 520x
Chart: speedup over GNU R per benchmark, logarithmic axis from 1 to 1000; geometric mean: 10x improvement over GNU R
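Speedups are ratios, which is why a summary like this uses the geometric rather than the arithmetic mean: a single outlier such as the 520x case inflates the arithmetic mean far more. A minimal sketch follows; the speedup values are made up for illustration and are not the benchmark results (Python 3.8+ also provides statistics.geometric_mean).

```python
import math


def geometric_mean(speedups):
    """Geometric mean of positive ratios, computed in log space."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("expected a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical speedups, for illustration only.
example = [2, 50]
print(sum(example) / len(example))        # arithmetic mean: 26.0
print(round(geometric_mean(example), 6))  # geometric mean: 10.0
```

A useful property: a 2x speedup and a 2x slowdown (0.5) average to exactly 1 under the geometric mean, whereas the arithmetic mean would report 1.25.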
PGX: Overview
PGX is a data analysis framework that supports powerful graph analyses of the data, for example:
• Recommendation
• Influencer identification
• Community detection
• Pattern matching
PGX runs fast, parallel analyses on large graphs, both on a single machine and in a distributed environment.
PGX is tightly integrated with Oracle DB (RDF and PG options), which manages graph data consistently on persistent storage.
Figure: graph program (DSL) → compiler → PGX (single machine … distributed)
Our DSL compiler technology makes it easy to switch between the two environments.
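Influencer identification is often based on a centrality measure such as PageRank. The following is a generic power-iteration sketch in plain Python on a toy graph; it is not PGX's API or DSL, and the damping factor of 0.85 and the fixed iteration count are conventional assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on {node: [successor, ...]}.

    Every successor must also be a key of `graph`. Rank held by nodes without
    out-edges (dangling nodes) is spread evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            successors = graph[v]
            for w in successors:
                new_rank[w] += damping * rank[v] / len(successors)
        rank = new_rank
    return rank


# Toy follower graph (hypothetical): edges point to the account being followed.
follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # c
```

Each iteration touches every edge once, which is the kind of data-parallel loop a graph engine can distribute across cores or machines.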
More information
ORE discussion forum: https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r
Oracle Advanced Analytics: http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html
ORE blog: https://blogs.oracle.com/R/
FastR: https://bitbucket.org/allR/fastR
Graal/Truffle: https://wiki.openjdk.java.net/display/Graal/Main
Oracle Labs on OTN: http://www.oracle.com/technetwork/oracle-labs/index.html
Contact
Dr. Nadine Schöne | Sales Consultant
Email: [email protected]
Tel: +49 331 200 7190
Dr. Michael Haupt | Tech Lead, FastR Project
Email: [email protected]
Tel: +49 331 200 7277
ORACLE Deutschland B.V. & Co. KG
Schiffbauergasse 14
14467 Potsdam