Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Transcript of Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Page 1: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Copyright © 2014 Oracle and/or its affiliates. All rights reserved.

Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors
Michael Rainey
Principal Consultant - Oracle ACE
Rittman Mead
February 11, 2015

Please Stand By. This session will begin promptly at the time indicated on the agenda. Thank You.

Page 2: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Oracle Confidential – Internal/Restricted/Highly Restricted

Page 3: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)

E : [email protected] : www.rittmanmead.com

Introduction
• Michael Rainey – Principal Consultant – Rittman Mead
  ‣ Oracle Data Integration expert and Oracle ACE
  ‣ GoldenGate and Oracle Data Integrator
• Rittman Mead
  ‣ Provide consulting, training, and managed services worldwide
  ‣ Focus on business intelligence, data integration, and advanced analytics
  ‣ Rittman Mead India recently named Oracle Analytics Partner of the Year

@mRainey

Page 4: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Agenda
• Why Oracle Data Integration for Big Data?
• Review the technologies
  ‣ Oracle GoldenGate 12c and GoldenGate 12c Adapters
  ‣ Hadoop, HDFS, and Hive
  ‣ Sqoop and Flume
• Big Data Lite VM introduction
• Demonstrations
  ‣ Initial load from MySQL to Hadoop
  ‣ Real-time replication using GoldenGate 12c direct to Hadoop
  ‣ Real-time replication using GoldenGate 12c and Flume

Page 5: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle Data Integration 12c

• High performance data integration
• Real-time data replication
• End-to-end integrated with simplified deployment
• Unified tooling for both structured data sources and Hadoop / NoSQL
• Flexible deployment on-premise or in the cloud for heterogeneous systems
• Data governance and full metadata management

Real-time data integration, data management for Cloud and Big Data

[Diagram labels: Big Data, Cloud, Apps, Database. Products: Oracle Data Integrator, Oracle GoldenGate, Oracle Enterprise Data Quality, Oracle Data Services Integrator, Oracle Enterprise Metadata Management]

Page 6: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle Data Integration with Big Data
• Big Data Adapters
  ‣ Natively connect to Hadoop
  ‣ Produce native code to execute on big data source or target
• Utilize Oracle Data Integration capabilities
  ‣ “Design once, run anywhere”
  ‣ High performance replication, heterogeneous sources/targets, data quality checks, etc.
  ‣ Easy to extend and customize

Page 7: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle GoldenGate 12c

Page 8: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Hadoop, HDFS and Hive
• Hadoop is a framework for storing large amounts of data and processing it efficiently
  ‣ Storage: Hadoop Distributed File System (HDFS)
  ‣ Processing: MapReduce
• HDFS – stores data as large files across multiple systems, using redundancy for reliability
  ‣ Master/slave architecture – namenodes and datanodes
  ‣ Data replicated to multiple datanodes, with namenode tracking location
• Hive – data warehouse infrastructure for analysis of large datasets in HDFS
  ‣ HiveQL – SQL-like language for querying data
  ‣ Transparently converts queries to MapReduce
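The namenode/datanode split described on this slide can be pictured with a toy model. The sketch below is only an illustration, not HDFS code: the function name, the datanode names, and the round-robin placement are assumptions made for the example (real HDFS placement takes rack topology into account). It shows a "namenode" keeping a map from each block of a file to the datanodes that hold a replica.

```python
def place_blocks(file_size_mb, block_size_mb, datanodes, replication=3):
    """Toy namenode: return {block_index: [datanodes holding a replica]}."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of datanodes")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    block_map = {}
    for block in range(num_blocks):
        # Round-robin placement, for illustration only.
        block_map[block] = [
            datanodes[(block + replica) % len(datanodes)]
            for replica in range(replication)
        ]
    return block_map


# Hypothetical cluster of four datanodes and a 300 MB file in 128 MB blocks.
demo_map = place_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"])
for block_index, holders in demo_map.items():
    print(block_index, holders)
```

Losing any single datanode in this model still leaves two replicas of every block, which is the reliability property the slide refers to.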

Page 9: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Loading Relational Data with Sqoop
• Sqoop – short for “SQL to Hadoop”
  ‣ Import whole tables, or whole schemas, from relational databases into HDFS
  ‣ Export data from HDFS back out to these databases – with the export and import being performed through MapReduce jobs
  ‣ Import using a SQL SELECT statement, rather than grabbing whole tables
  ‣ Incremental loads, specifying a key column to determine what to exclude
  ‣ Load directly into Hive tables, creating HDFS files in the background and the Hive metadata automatically
• Sqoop User Guide: http://archive.cloudera.com/cdh4/cdh/4/sqoop/SqoopUserGuide.html
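As a rough illustration of the import options listed above, the sketch below assembles a `sqoop import` argument list in Python. The JDBC URL, user name, table names, and key column are hypothetical values chosen for the example; the flags themselves (`--connect`, `--table`, `--hive-import`, `--hive-table`, `--incremental`, `--check-column`, `--last-value`, `--num-mappers`) are standard Sqoop import options. The code only builds and prints the command; it does not run Sqoop.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, hive_table=None,
                      check_column=None, last_value=None, mappers=1):
    """Build the argument list for a Sqoop table import."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--table", table,
            "--num-mappers", str(mappers)]
    if hive_table is not None:
        # Load straight into a Hive table instead of plain HDFS files.
        args += ["--hive-import", "--hive-table", hive_table]
    if check_column is not None:
        # Incremental load: only rows whose key exceeds last_value.
        args += ["--incremental", "append",
                 "--check-column", check_column,
                 "--last-value", str(last_value)]
    return args


# Hypothetical connection details, for illustration only.
url = "jdbc:mysql://localhost:3306/moviedemo"
full_load = sqoop_import_args(url, "demo_user", "MOVIE", hive_table="movie")
delta_load = sqoop_import_args(url, "demo_user", "MOVIE",
                               check_column="MOVIE_ID", last_value=1000)
print(" ".join(shlex.quote(a) for a in full_load))
print(" ".join(shlex.quote(a) for a in delta_load))
```

The two calls are kept separate on purpose: whether an incremental append can be combined with a direct Hive import depends on the Sqoop version, so the User Guide for the installed release is the place to check.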

Page 10: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Moving Data with Flume
• Reliable, distributed approach to collecting, aggregating, & moving large amounts of log data
• Installed as Java agents that run on source and target
  ‣ Source – listens for and consumes incoming events (transactions)
  ‣ Channel – where events are queued and staged
  ‣ Sink – processes that write transactions to disk
• Transactions can be distributed to multiple targets or fed from many sources to one target
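The source, channel, and sink roles can be mimicked in a few lines of Python. This is a conceptual sketch only and does not use Flume itself: the class and function names are invented for the example, and Python lists stand in for the targets a real sink would write to. It shows one source fanning events out to two channels, each drained by its own sink.

```python
import queue


class MemoryChannel:
    """Stages events between a source and a sink (toy in-memory queue)."""

    def __init__(self, capacity=100):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        self._events.put_nowait(event)

    def take(self):
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def run_source(incoming_events, channels):
    """Source: consume incoming events and stage a copy in every channel."""
    for event in incoming_events:
        for channel in channels:
            channel.put(event)


def run_sink(channel, target):
    """Sink: drain the channel into a target; returns the events delivered."""
    delivered = 0
    event = channel.take()
    while event is not None:
        target.append(event)
        delivered += 1
        event = channel.take()
    return delivered


# One source distributing the same transactions to two targets.
channel_a, channel_b = MemoryChannel(), MemoryChannel()
target_a, target_b = [], []
run_source(["txn-1", "txn-2", "txn-3"], [channel_a, channel_b])
print(run_sink(channel_a, target_a), run_sink(channel_b, target_b))
```

The channel is what decouples the two ends: the source can keep accepting events while a sink is slow, up to the channel's capacity.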

Page 11: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle GoldenGate 12c Application Adapters
• GoldenGate Hive Adapter
  ‣ Integrating OGG Adapter with Hive (Doc ID 1586188.1)
• GoldenGate Flume Adapter
  ‣ Integrating OGG Adapter with Flume (Doc ID 1926867.1)

Page 12: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Big Data Lite VM and Hands-On Labs
• Big Data Lite 4.0 VM
  ‣ Fully integrated Oracle Big Data environment
  ‣ Similar to Big Data Appliance setup
  ‣ Technologies include ODI, GoldenGate, Big Data Connectors, Hadoop, Flume, Sqoop, Hive, etc.
• Hands-on labs provide step-by-step instructions
  ‣ Demo environment built in to Big Data Lite VM
• Big Data Lite VM: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
• Hands-on Labs: http://www.oracle.com/webfolder/technetwork/tutorials/obe/fmw/odi/odi_12c/DI_BDL_Guide/BigDataIntegration_Demo.html

Page 13: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Getting Started with the Big Data Lite VM
• Scenario: Oracle MoviePlex is an online movie streaming company with web logs and MySQL database sources
• Goal: Move data into Hadoop, perform integration, then distribute to Oracle Exadata data warehouse for further processing

Page 14: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Tame Big Data with Oracle Data Integration: Part 1
• Real-time data replication from MySQL relational database to Hadoop
  ‣ Initial load using Sqoop and Oracle Data Integrator 12c
  ‣ Change Data Capture using GoldenGate 12c and GoldenGate Adapters (Hive and Flume)

Page 15: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Initial Load using Sqoop and ODI 12c
• Mapping between source MySQL table “MOVIE” and target Hive table “movie”
  ‣ Data server connections set up for each technology
  ‣ Tables reverse engineered as Datastores in ODI Models
• Mapping uses “IKM SQL to Hive-HBase-File (SQOOP)” Knowledge Module to load the Hive table via Sqoop
  ‣ Creates Sqoop option file and launches Sqoop import
  ‣ Optionally loads into Hive staging table
  ‣ Loads rows into target table using HiveQL insert overwrite
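To make the Knowledge Module steps above more concrete, here is a sketch of the two artifacts they describe: a Sqoop options file and a HiveQL `INSERT OVERWRITE` statement. This is not the code ODI generates. The connection URL, user name, and the staging table name `movie_stage` are assumptions for the example. The options-file layout (the tool name first, then one option or value per line) follows the format documented in the Sqoop User Guide.

```python
def sqoop_options_file_lines(jdbc_url, username, table, staging_table):
    """Lines of a Sqoop options file: one option or value per line."""
    return [
        "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--hive-import",
        "--hive-table", staging_table,
    ]


def insert_overwrite_statement(target_table, staging_table):
    """HiveQL that replaces the target table's rows with the staged rows."""
    return (f"INSERT OVERWRITE TABLE {target_table} "
            f"SELECT * FROM {staging_table}")


# Hypothetical names: MOVIE and movie come from the slide, the rest do not.
lines = sqoop_options_file_lines(
    "jdbc:mysql://localhost:3306/moviedemo", "demo_user", "MOVIE", "movie_stage")
print("\n".join(lines))
print(insert_overwrite_statement("movie", "movie_stage"))
```

A file with these lines would be passed to Sqoop as `sqoop --options-file <path>`. Loading through a staging table keeps the target table readable until the final overwrite runs.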

Page 16: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Demonstration: Initial Load via Sqoop

Page 17: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Real-time Load using Oracle GoldenGate 12c to Hive
• Adapter developed using Oracle GoldenGate’s Java API and Hadoop HDFS Java API
  ‣ Writes trail data to target Hive tables
  ‣ Example “SampleHandler.java” found on My Oracle Support
• Properties file must be in same directory as parameter files
  ‣ Contains information about necessary JAR files, target Hive tables, logging parameters, etc.
• Another example:
  ‣ http://www.rittmanmead.com/2014/09/using-oracle-goldengate-for-trickle-feeding-rdbms-transactions-into-hive-and-hdfs/
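The sketch below illustrates the general idea of such a handler: each captured change becomes one delimited text line appended to a file under the Hive table's directory. It is written in Python with a local temporary directory standing in for HDFS, and it is not the SampleHandler.java code from My Oracle Support. The operation dictionary layout, the file name, and the choice to handle inserts only are assumptions made for the example. The Ctrl-A field delimiter and the `\N` null marker are Hive's defaults for text tables.

```python
import tempfile
from pathlib import Path

HIVE_FIELD_DELIMITER = "\x01"  # Hive's default field delimiter (Ctrl-A)
HIVE_NULL = "\\N"              # Hive's default NULL marker in text files


def to_hive_line(values):
    """Render one row as a delimited line that a Hive text table can read."""
    return HIVE_FIELD_DELIMITER.join(
        HIVE_NULL if value is None else str(value) for value in values)


def apply_operation(operation, table_dir):
    """Append an insert to the table's data file; skip other operations."""
    if operation["type"] != "INSERT":
        return False
    data_file = Path(table_dir) / "ogg_inserts.txt"  # hypothetical file name
    with data_file.open("a", encoding="utf-8") as handle:
        handle.write(to_hive_line(operation["values"]) + "\n")
    return True


# A temporary local directory stands in for the Hive table's HDFS directory.
with tempfile.TemporaryDirectory() as table_dir:
    apply_operation({"type": "INSERT", "values": [1, "Alien", None]}, table_dir)
    apply_operation({"type": "DELETE", "values": [1]}, table_dir)
    print(repr((Path(table_dir) / "ogg_inserts.txt").read_text(encoding="utf-8")))
```

Updates and deletes are skipped here because plain HDFS files are append-oriented; a real handler has to decide how to represent them, for example by recording the operation type alongside each row.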

Page 18: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Demonstration: Load MySQL to HDFS using GoldenGate 12c

Page 19: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Another Real-time Load Option – using GoldenGate and Flume
• Adapter developed using Oracle GoldenGate’s Java API and Apache Flume Client API
  ‣ Transactions delivered to RpcClient API
  ‣ Example “SampleHandlerFlume.java” found on My Oracle Support
• Flume configuration file (flume.conf)
  ‣ Flume agent listens for RPC calls on the source, stages transactions in memory, and delivers data to HDFS
  ‣ Flume agent must be started
• Properties file must be in same directory as parameter files
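A flume.conf matching the description above (an RPC source, an in-memory channel, and an HDFS sink) might look like the text produced by this sketch. The agent name `ogg`, the component names, the port, and the HDFS path are assumptions for the example, not values from the presentation. The property keys are standard Flume settings, and the Avro source type is used because that is what Flume's default RPC client connects to.

```python
def flume_conf(agent="ogg", port=41414,
               hdfs_path="hdfs://localhost:8020/user/flume/movie"):
    """Return flume.conf text for an avro source, memory channel, HDFS sink."""
    lines = [
        f"{agent}.sources = rpc_in",
        f"{agent}.channels = mem",
        f"{agent}.sinks = hdfs_out",
        # Source: listens for RPC calls from the client.
        f"{agent}.sources.rpc_in.type = avro",
        f"{agent}.sources.rpc_in.bind = 0.0.0.0",
        f"{agent}.sources.rpc_in.port = {port}",
        f"{agent}.sources.rpc_in.channels = mem",
        # Channel: stages events in memory.
        f"{agent}.channels.mem.type = memory",
        f"{agent}.channels.mem.capacity = 1000",
        # Sink: delivers events to HDFS as plain data files.
        f"{agent}.sinks.hdfs_out.type = hdfs",
        f"{agent}.sinks.hdfs_out.hdfs.path = {hdfs_path}",
        f"{agent}.sinks.hdfs_out.hdfs.fileType = DataStream",
        f"{agent}.sinks.hdfs_out.channel = mem",
    ]
    return "\n".join(lines) + "\n"


print(flume_conf())
```

With the output saved as flume.conf, an agent of this name would typically be started with `flume-ng agent --conf conf --conf-file flume.conf --name ogg` before GoldenGate begins delivering transactions.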

Page 20: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Demonstration: Load MySQL to HDFS using GoldenGate 12c and Flume

Page 21: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Summary
• Oracle Data Integration and Big Data are a great match
  ‣ GoldenGate 12c Adapters
  ‣ ODI 12c Big Data Connectors
  ‣ “Design once, run anywhere” fits for Big Data sources and targets
• Big Data Lite VM and Hands-On Labs
  ‣ Get started with Oracle Data Integration and Big Data
  ‣ Learn by example
• GoldenGate 12c can load Hadoop effectively
  ‣ Java API examples on MOS are great for getting started
• Further information:
  ‣ http://www.rittmanmead.com/blog
  ‣ https://blogs.oracle.com/dataintegration

Page 22: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Keep Learning with Training from Oracle University

• Hands-on training delivered in-class or online by tenured instructors around the world

• New subscription-based learning services to give you any-time access to training

• Certification programs to validate your skills

education.oracle.com
