Bigdata : Big picture

ZEKERIYA BEŞIROĞLU BILGINC IT ACADEMY ORACLE CLOUD DAY 19-11-2015 TROUG-TURKISH ORACLE USER GROUP BIG DATA : BIG PICTURE

Transcript of Bigdata : Big picture

Page 1: Bigdata : Big picture

ZEKERIYA BEŞIROĞLU
BILGINC IT ACADEMY
ORACLE CLOUD DAY
19-11-2015
TROUG-TURKISH ORACLE USER GROUP

BIG DATA : BIG PICTURE

Page 2: Bigdata : Big picture

ZEKERIYA BEŞIROĞLU

▸ 18+ years in IT

▸ 15+ years in Oracle DB & DWH

▸ 3+ years in big data

▸ Leader of TROUG

▸ Instructor & Consultant

▸ http://zekeriyabesiroglu.com

▸ @zbesiroglu

TROUG @ZBESIROGLU BILGINC IT ACADEMY BIG DATA BIG PICTURE

Page 3: Bigdata : Big picture

TROUG NEWS 2015 WWW.TROUG.ORG

Page 4: Bigdata : Big picture

BILGINC IT ACADEMY WWW.BILGINC.COM

Page 5: Bigdata : Big picture

BIG DATA

▸ Social networks
▸ Banking and financial services
▸ E-commerce services
▸ Web-centric services
▸ Internet search indexes
▸ Scientific and document searches
▸ Medical records
▸ Web logs

Page 6: Bigdata : Big picture

BIG DATA

▸ VOLUME
▸ VELOCITY
▸ VARIETY

Page 7: Bigdata : Big picture

COMPANIES MUST ANALYZE THEIR CUSTOMERS' DNA.

Zekeriya Beşiroğlu

TROUG

Page 8: Bigdata : Big picture

TROUG

WHAT IS THE GOAL IN BIG DATA? HOW SHOULD IT BE DONE?

▸ How can I add value to the business by using big data technologies? Can I reduce some costs?

▸ How will I integrate big data with the traditional database? By combining structured, semi-structured, and unstructured data.

▸ Reaching results with analytics tools: Oracle Advanced Analytics, BI, and DW technologies.

Page 9: Bigdata : Big picture

TROUG

DATA

▸ We are doing schema on write.

▸ Let's do schema on read.
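
The contrast between schema on write and schema on read can be illustrated with a small sketch. It assumes raw JSON log lines are stored untouched and that a schema (here just a list of field names) is applied only when the data is read. The sample records, the field names, and the `read_with_schema` helper are hypothetical and are not taken from the slides.

```python
import json

# Raw events are stored exactly as they arrive; nothing is validated at write time.
RAW_LINES = [
    '{"user": "ali", "action": "login", "ts": 1000}',
    '{"user": "ayse", "action": "purchase", "amount": 42.5, "ts": 2000}',
    'not valid json',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time by projecting each parseable record onto `fields`."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed rows are skipped when reading, not rejected when writing
        if not isinstance(record, dict):
            continue
        # Fields missing from a record become None instead of causing a failure.
        yield {name: record.get(name) for name in fields}


rows = list(read_with_schema(RAW_LINES, ["user", "amount"]))
for row in rows:
    print(row)
```

A different reader could apply a different field list to the same stored lines, which is the flexibility that schema on read is meant to offer.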

Page 10: Bigdata : Big picture

TROUG

BIG DATA PROJECT PHASES

▸ Data Acquisition and Storage
▸ Data Access and Processing
▸ Data Unification and Analysis

Page 11: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

HADOOP DISTRIBUTED FILE SYSTEM-HDFS

▸ Petabyte-scale distributed file system
▸ Linearly scalable on commodity hardware
▸ Schema on read
▸ Cheaper
▸ Low security
▸ Write once, read many

Page 12: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

HADOOP DISTRIBUTED FILE SYSTEM-HDFS

▸ Basic file system operations

▸ I can load a JSON log file into HDFS (hadoop fs -put).
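
As a rough illustration of the upload step, the sketch below wraps the `hadoop fs -put` command in Python. The file name `events.json` and the target directory `/user/demo/logs/` are made-up placeholders, and actually running the upload assumes the `hadoop` command-line client is installed and configured.

```python
import subprocess


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for `hadoop fs -put <local_path> <hdfs_dir>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def hdfs_put(local_path, hdfs_dir):
    """Run the upload; requires the `hadoop` CLI on PATH and a reachable cluster."""
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)


# Hypothetical paths, shown only to make the command shape concrete.
cmd = hdfs_put_command("events.json", "/user/demo/logs/")
print(" ".join(cmd))
```

Only the command is built here; calling `hdfs_put` would perform the copy on a machine with Hadoop available.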

Page 13: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

WHAT IS FLUME?

▸ Avro Source
▸ Memory Channel
▸ HDFS Sink
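
The three components listed above form a pipeline: a source receives events, a channel buffers them, and a sink writes them out. The sketch below is a toy model of that flow in plain Python. It is not Flume's API (Flume agents are normally wired together through a configuration file), and the class and function names are invented for illustration. A Python list stands in for HDFS.

```python
import queue


class MemoryChannel:
    """Bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        self._events.put_nowait(event)  # raises queue.Full when the buffer is full

    def take(self, max_events):
        """Remove and return up to `max_events` buffered events."""
        batch = []
        while len(batch) < max_events and not self._events.empty():
            batch.append(self._events.get_nowait())
        return batch


def source(channel, incoming):
    """Stand-in for a source: push each received event into the channel."""
    for event in incoming:
        channel.put(event)


def sink(channel, storage, batch_size=2):
    """Stand-in for a sink: drain the channel in batches into `storage`."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        storage.append(batch)


channel = MemoryChannel(capacity=10)
stored_batches = []  # plays the role of files written to HDFS
source(channel, ["e1", "e2", "e3"])
sink(channel, stored_batches)
print(stored_batches)
```

The channel decouples the two ends, so a slow sink does not block the source until the buffer capacity is reached.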

Page 14: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

ORACLE NOSQL DATABASE

▸ Key-value database
▸ Access via Java API
▸ Stores unstructured or semi-structured data as byte arrays
▸ Highly reliable
▸ Scalable throughput and predictable latency
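
To make the "values as byte arrays" idea concrete, here is a toy in-memory key-value store in Python. It is only a sketch of the data model and does not use or imitate the Oracle NoSQL Database Java API. The class name, key layout, and sample record are assumptions made for illustration.

```python
import json


class ToyKeyValueStore:
    """In-memory stand-in for a key-value store whose values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be byte arrays")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is absent


store = ToyKeyValueStore()
profile = {"name": "Zeynep", "city": "Istanbul"}  # hypothetical record

# The application, not the store, decides how to serialize the value.
store.put("user/42/profile", json.dumps(profile).encode("utf-8"))
loaded = json.loads(store.get("user/42/profile").decode("utf-8"))
print(loaded)
```

Because the store never inspects the bytes, the application is free to change the record layout without a schema migration.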

Page 15: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

RDBMS & NOSQL

Page 16: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

HDFS & NOSQL

Page 17: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

APPLICATION DATABASE TECHNOLOGY

▸ High volume with low value?
▸ Dynamic application schema?

▸ If the answer is yes, use NoSQL.

Page 18: Bigdata : Big picture

DATA ACQUISITION AND STORAGE

NOSQL EXAMPLE

Page 19: Bigdata : Big picture

DATA ACCESS AND PROCESSING

MAP REDUCE

▸ Write applications that process vast amounts of data in parallel on large clusters of commodity hardware, in a reliable and fault-tolerant manner.

▸ Storing data in HDFS is low cost, fault tolerant, and scalable.

▸ Integrates with HDFS to provide parallel data processing

▸ Batch-oriented

Page 20: Bigdata : Big picture

DATA ACCESS AND PROCESSING

MAP REDUCE EXAMPLE

map(String input_key, String input_value)
    foreach word w in input_value:
        emit(w, 1)

reduce(String output_key, Iterator<int> intermediate_vals)
    set count = 0
    foreach v in intermediate_vals:
        count += v
    emit(output_key, count)

Input records:

(1000, 'Galatasaray sampiyon olur')
(2000, 'beşiktas sampiyon olur')
(2200, 'Galatasaray Türkiyedir')
(3000, 'fenerbahce sampiyon olur')

Page 21: Bigdata : Big picture

DATA ACCESS AND PROCESSING

MAP REDUCE EXAMPLE

Mapper output

('Galatasaray', 1), ('sampiyon', 1), ('olur', 1)
('beşiktas', 1), ('sampiyon', 1), ('olur', 1)
('Galatasaray', 1), ('Türkiyedir', 1)
('fenerbahce', 1), ('sampiyon', 1), ('olur', 1)

Intermediate data sent to the Reducer

('Galatasaray', [1,1])
('sampiyon', [1,1,1])
('olur', [1,1,1])
('beşiktas', [1])
('fenerbahce', [1])
('Türkiyedir', [1])

Final output of the Reducer

('sampiyon', 3)
('olur', 3)
('Galatasaray', 2)
('fenerbahce', 1)
('beşiktas', 1)
('Türkiyedir', 1)
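
The word count traced on these two slides can be reproduced in a single Python process. This is a minimal sketch that imitates the map, shuffle, and reduce phases in memory; it does not use Hadoop, and the `shuffle` helper is an assumption standing in for the grouping the framework performs between the two phases.

```python
from collections import defaultdict

# Input records from the slide: (key, line of text).
RECORDS = [
    (1000, "Galatasaray sampiyon olur"),
    (2000, "beşiktas sampiyon olur"),
    (2200, "Galatasaray Türkiyedir"),
    (3000, "fenerbahce sampiyon olur"),
]


def map_fn(input_key, input_value):
    """Emit (word, 1) for every word in the line; the input key is unused."""
    for word in input_value.split():
        yield word, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(output_key, intermediate_vals):
    """Sum the counts collected for one word."""
    return output_key, sum(intermediate_vals)


mapped = [pair for key, value in RECORDS for pair in map_fn(key, value)]
grouped = shuffle(mapped)
counts = dict(reduce_fn(word, ones) for word, ones in grouped.items())

# Print words by descending count.
for word, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(word, count)
```

On a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, but the data flow is the same as in this sketch.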

Page 22: Bigdata : Big picture

DATA ACCESS AND PROCESSING

HIVE

▸ Query data in HDFS with HiveQL (a SQL-like language)

▸ Hive transforms HiveQL queries into standard MapReduce jobs

▸ Schema on read via InputFormat and SerDe
▸ Not ideal for ad hoc queries (slow)
▸ Immature optimizer

Page 23: Bigdata : Big picture

DATA ACCESS AND PROCESSING

HIVE

▸ Log processing
▸ Text mining
▸ Document indexing
▸ Business analytics
▸ Predictive modeling
▸ Not ideal for ad hoc queries

Page 24: Bigdata : Big picture

DATA ACCESS AND PROCESSING

PIG

▸ Open source data flow system
▸ Simple language for queries and data manipulation, which is compiled into MapReduce jobs that run on Hadoop
▸ Provides common operations such as join, group, and sort
▸ Works on files in HDFS
▸ Ad hoc queries across large data sets
▸ Log analysis

Page 25: Bigdata : Big picture

DATA ACCESS AND PROCESSING

CLOUDERA IMPALA

▸ Database-like SQL layer on top of Hadoop
▸ Distributed, massively parallel processing database engine
▸ SQL is the primary development language
▸ Open source; Impala processes data in the Hadoop cluster WITHOUT using MapReduce
▸ Interactive analysis on data stored in HDFS and HBase

Page 26: Bigdata : Big picture

DATA ACCESS AND PROCESSING

ORACLE XQUERY FOR HADOOP

▸ A transformation engine for semi-structured data stored in Apache Hadoop

▸ Runs transformations written in the XQuery language by translating them into a series of MapReduce jobs

▸ Loads data efficiently into Oracle Database by using Oracle Loader for Hadoop

▸ Provides read and write support for Oracle NoSQL Database

Page 27: Bigdata : Big picture

DATA ACCESS AND PROCESSING

ORACLE XQUERY FOR HADOOP

Page 28: Bigdata : Big picture

DATA ACCESS AND PROCESSING

APACHE SPARK

▸ Open source parallel data processing
▸ Fast development
▸ Online streaming
▸ Interactive analytics
▸ Machine learning
▸ Speed

Page 29: Bigdata : Big picture

DATA ACCESS AND PROCESSING

APACHE SPARK EXAMPLE

Page 30: Bigdata : Big picture

DATA UNIFICATION AND ANALYSIS

APACHE SQOOP

▸ Batch loading
▸ Transfers bulk data between structured data stores and Apache Hadoop
▸ Data import and export between external data stores and Hadoop
▸ Parallelizes data transfer for fast performance
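
One common way to parallelize a bulk transfer is to split a table's key range into contiguous chunks and hand each chunk to a separate worker. The sketch below shows only that splitting step in Python. It is a simplified illustration, not Sqoop's actual splitter, and the function name, the integer key assumption, and the example range are all hypothetical.

```python
def split_ranges(min_id, max_id, num_workers):
    """Divide the inclusive key range [min_id, max_id] into contiguous chunks.

    Chunk sizes differ by at most one row, and empty chunks are dropped when
    there are more workers than keys.
    """
    total = max_id - min_id + 1
    base, extra = divmod(total, num_workers)
    ranges = []
    start = min_id
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Four workers sharing keys 1..10.
splits = split_ranges(1, 10, 4)
print(splits)
```

Each worker would then read only the rows whose key falls inside its own range, so the chunks can be transferred concurrently.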

Page 31: Bigdata : Big picture

DATA UNIFICATION AND ANALYSIS

ORACLE LOADER FOR HADOOP

▸ Batch loading
▸ High-performance loader for fast movement of data from Hadoop into a table in Oracle Database
▸ Loading using online and offline modes
▸ Offloads expensive data processing from the database server to Hadoop

Page 32: Bigdata : Big picture

DATA UNIFICATION AND ANALYSIS

COPY TO BDA

▸ Batch loading

Page 33: Bigdata : Big picture

DATA UNIFICATION AND ANALYSIS

ORACLE SQL CONNECTOR FOR HADOOP

▸ Generate external table in database pointing to HDFS data

▸ Load into database or query data in place on HDFS

▸ Fine-grained control over type mapping

▸ Parallel load with automatic load balancing

Page 34: Bigdata : Big picture

DATA UNIFICATION AND ANALYSIS

ORACLE TECHNOLOGIES

Page 35: Bigdata : Big picture

DATA UNIFICATION AND ANALYSIS

ORACLE ADVANCED ANALYTICS

▸ OAA = Oracle Data Mining + Oracle R Enterprise
▸ Performance
▸ Predictive analytics
▸ Easy

Page 36: Bigdata : Big picture

ORACLE BDA BENEFITS

▸ Ships with a leading Hadoop distribution (Cloudera)

▸ HDFS, HBase, Hive, Flume, Kafka, Spark …

▸ Cloudera Manager

▸ Ships with great connectivity to Oracle DB

▸ Big Data SQL

▸ Big Data Connectors & ODI

Page 37: Bigdata : Big picture

THANK YOU
ZEKERIYA BEŞIROĞLU
BILGINC IT ACADEMY
