
Through O Shaped Glasses

Introducing Kafka to the Oracle DBA

Mike Donovan

CTO

Dbvisit Software



Mike Donovan

Chief Technology Officer, Dbvisit Software

• Multi-platform DBA (Oracle, MSSQL, …)

• Conference speaker: OOW, RMOUG, dbTech Showcase, Collaborate

• NZOUG member

• Technical Writer and Editor

• Kafka enthusiast

• They say that I am an Oracle ACE ☺

Professional not-knower of things


Why am I interested in Kafka?


BEFORE: Many Ad Hoc Pipelines


Stream Data Platform with Kafka

• Distributed

• Fault Tolerant

• Stream Processing

• Data Integration

• Message Store


Agenda

• What is Kafka?

• Looking at this new technology as an Oracle DBA

• Why should an Oracle professional care?

• How do I get started with Kafka?


What’s all the fuss about?


The New World of Data

• Data centralization

• Real-time delivery

• Integration

• Stream data processing

• New data endpoints/stores


What is Kafka?

A scalable, fault-tolerant, distributed system where messages are kept in topics that are partitioned and replicated across multiple nodes.

• Developed at LinkedIn ~2010

• Confluent and the open-source project

An open-source publish-subscribe messaging system implemented as a distributed commit log.


What is Kafka?

• Data is written to Kafka in the form of key-value pair messages (key or value may be null)

• Each message belongs to a topic

• Messages form a continuous flow (stream) of events

• Producers (writers) are decoupled from Consumers (readers)

• A delivery channel/platform, if you like, crossing systems (data integration)

• TOPICS (Kafka) ≈ TABLES (ORACLE)


Kafka - components

• Zookeeper

• Schema Registry

• Kafka

• REST Proxy

• Kafka Connect

What about KSQL and Kafka Streams?


Kafka – basic operations demo

1. Download the Confluent Platform

2. Run the CLI (or the individual scripts) - CLI = SQL*Plus? (or svrmgr)

3. Push data into a Kafka topic (bundled Producer, sketched below)

4. Read some data out of a Kafka topic (bundled Consumer)
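A minimal sketch of steps 2-4, assuming a local Confluent 4.x install on the default ports; the topic name demo-topic and the message contents are illustrative:

# Start the bundled services (ZooKeeper, Kafka, Schema Registry, ...) via the Confluent CLI
confluent start

# Create a topic to play with
kafka-topics --create --zookeeper localhost:2181 --topic demo-topic --replication-factor 1 --partitions 1

# Push a message in with the bundled console producer
echo 'hello from the Oracle DBA' | kafka-console-producer --broker-list localhost:9092 --topic demo-topic

# Read it back with the bundled console consumer
kafka-console-consumer --bootstrap-server localhost:9092 --topic demo-topic --from-beginning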


Kafka – why would you use it? Three propositions:

• Messaging system

• Data streaming platform

• Data storage

➢ Messaging

➢ Website Activity

➢ Tracking Metrics

➢ Log Aggregation

➢ Stream Processing

➢ Event Sourcing

➢ Commit Log


Apples and Oranges? Kafka and Oracle

Kafka:

• Messaging system - transmission channel, integration priority

• Data streaming (always on) platform - in-line transformations, push

• Data storage (topics)

Oracle:

• X - no direct messaging equivalent

• Data delivery end point (periodic/batch) - materialised views? logical replication? pull?

• Data store (source of truth - tables)


Oracle Database tables: the state (active record) model

Persistent store - retains the current known STATE.

Source: Oracle Database

  ID  Name   Salary
  1   Chris  100
  2   Jim    350
  3   Bob    500

  select * from employees where ID = 2;


Event Streaming: INSERT ALL ROWS mode

Source table (current state):

  ID  Name   Salary
  1   Chris  100
  2   Jim    350
  3   Bob    500

On the source:

  update emp set salary = 350 where id = 2;

Delivered to the target as an event, with metadata:

  insert into stage_emp values
    (2, 300, 350, 'Machine_2', 'QA', 'U', '2016-May-12 14:22:03');

Target staging table:

  ID  Name  Old Salary  New Salary  Machine ID  User  TRANS_TYPE  Commit Timestamp
  2   Jim   300         350         Machine_2   QA    U           2016-May-12 14:22:03


[Diagram: map of Europe - a train journey recorded as events]

  Train No  Start Location  End Location  Passengers  Engineer  Status  TRANS_TYPE  Commit Timestamp
  1         London          Cardiff       100         Smythe    Good    I           2016-Nov-12 14:22:00
  1         Cardiff         Cardiff       0           Smythe    Good    U           2016-Nov-12 17:24:03
  1         Cardiff         Edinburgh     312         Johnson   Good    U           2016-Nov-12 18:00:09
  1         Edinburgh       Edinburgh     0                     Rest    U           2016-Nov-13 04:02:33

What's missing with state alone?


Table/Stream “Duality”

[Diagram: a table as the accumulation of changes over time]


Table/Stream “Duality”

• Log Compaction (see the sketch below)

• KSQL

• Demo
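A log-compacted topic is what lets a topic behave like a table: Kafka keeps the latest value per key. A minimal sketch of creating one, assuming a local broker and an illustrative topic name emp-state:

kafka-topics --create --zookeeper localhost:2181 --topic emp-state --replication-factor 1 --partitions 1 --config cleanup.policy=compact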


The Online Redo Log Files

The redo log stores a continuous chain, in chronological order, of every change vector applied to the database. This will be the bare minimum of information required to reconstruct, or redo, all the work that has been done.

If a datafile (or the whole database) is damaged or destroyed, these change vectors can be applied to datafile backups to redo the work, bringing them forward in time until the moment that the damage occurred.

P89 OCA exam guide


The Redo Log!

CHANGE #3 TYP:0 CLS:1 AFN:4 DBA:0x0100061b OBJ:27521 SCN:0x0000.00ab0ab9 SEQ:2 OP:11.2 ENC:0 RBL:0
KTB Redo
op: 0x01  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid: 0x0003.00c.00001047  uba: 0x00c31f4a.043f.15
KDO Op code: IRP row dependencies Disabled
xtype: XA flags: 0x00000000  bdba: 0x0100061b  hdba: 0x0100061a
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 0(0x0) size/delt: 24
fb: --H-FL-- lb: 0x1  cc: 6
null: ------
col 0: [ 2] c1 07
col 1: [ 5] 50 65 72 72 79
col 2: [ 2] c1 16
col 3: [ 2] c1 02
col 4: [ 2] 49 54
col 5: [ 2] c1 07

The change vector above was generated by:

insert into HR.EMPLOYEES values (6,'Perry',21,1,'IT',6);


An old methodology: Event Sourcing

“Event Sourcing ensures that all changes to application state are stored as a sequence of events... The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.”

Martin Fowler: https://www.martinfowler.com/eaaDev/EventSourcing.html

• Don't save the current state of objects

• Instead write the events that lead to the current state

• An APPEND-ONLY log


An old methodology: Event Sourcing

Martin Kleppmann - Designing Data-Intensive Applications

An EVENT carries:

• Details

• Meta-data


Event Sourcing Benefits

Fowler suggests:

• Complete Rebuild - rehydrate secondary systems

• Temporal Queries

• Event Replay - forward and reverse


Event Streaming

• Capture all changes in the database and record these as events - every change becomes an insert; even a delete or an update becomes an insert

• Add additional information (metadata) about these changes, such as who, where, what, when

• Turning the database “inside out” (turn the redo log into a normal log)

• See the full lifecycle of the data, now possible in real time!


The Online Redo Log Files

I created a topic (in Confluent 4.0) called connect-dbmessage. It boils down to a file on disk here (where is this determined?):

/tmp/confluent.Sz1GdA5f/kafka/data/connect-dbmessage-0

We can run a strings command on it, and we can also dump it using the bundled Kafka tools, as sketched below.
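A minimal sketch of both inspections, using the data directory from the slide and the default first segment file name (the 00000000000000000000.log naming matches the listing shown a couple of slides later):

# Raw peek at message payloads in the on-disk log segment
strings /tmp/confluent.Sz1GdA5f/kafka/data/connect-dbmessage-0/00000000000000000000.log | head

# Proper dump with the bundled Kafka tool
kafka-run-class kafka.tools.DumpLogSegments --print-data-log --files /tmp/confluent.Sz1GdA5f/kafka/data/connect-dbmessage-0/00000000000000000000.log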


Oracle Change Data – delivered to Kafka

[Screenshot: a change record for an INSERT into SCOTT.TEST9, showing the row data plus metadata]


Kafka - a log writer/reader

[Diagram: Partition 0, Partition 1, Partition 2 - messages appended from old to new]

• Organized by topics

• Sub-categorization by partitions (log files on disk)

• Replicated between nodes for redundancy


Indexes, Offsets and Data files

[oracle@dbvrep01 REP-TX.META-0]$ ll
total 4172
-rw-r--r-- 1 oracle oinstall 10485760 Jun 15 18:51 00000000000000000000.index
-rw-r--r-- 1 oracle oinstall  4236052 Jun 15 18:56 00000000000000000000.log
-rw-r--r-- 1 oracle oinstall 10485756 Jun 15 18:51 00000000000000000000.timeindex

Dump log segments command:

kafka-run-class kafka.tools.DumpLogSegments --print-data-log --files /tmp/kafka-logs/REP-TX.META-0/00000000000000000000.log


Topic vs Table Creation

Create a topic:

bin/kafka-topics --create --zookeeper localhost:2181 --topic TOPIC_NAME --replication-factor 1 --partitions 1
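For the DBA reflex of querying the data dictionary, the same tool can list and describe what exists. A minimal sketch against the local ZooKeeper, reusing TOPIC_NAME from above:

# Roughly the equivalent of: select table_name from user_tables;
bin/kafka-topics --list --zookeeper localhost:2181

# Roughly the equivalent of describing a table
bin/kafka-topics --describe --zookeeper localhost:2181 --topic TOPIC_NAME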


Kafka Connect - export/import tool. Datapump anyone?

Connectors include (a configuration sketch follows the list):

• Cassandra

• Elasticsearch

• Google BigQuery

• HBase

• HDFS

• JDBC

• Kudu

• MongoDB

• Postgres

• S3

• SAP HANA

• Solr

• Vertica
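To show what configuring a connector looks like, here is a minimal sketch of registering the Confluent JDBC source connector through the Connect REST API. The connector name, connection URL, credentials, table and topic prefix are all illustrative, and a Connect worker is assumed on its default port 8083:

# Register a hypothetical JDBC source connector (all values illustrative)
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "jdbc-emp-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "connection.user": "scott",
    "connection.password": "tiger",
    "mode": "incrementing",
    "incrementing.column.name": "ID",
    "table.whitelist": "EMPLOYEES",
    "topic.prefix": "oracle-"
  }
}'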


SMTs and KStreams

Create a topic:

bin/kafka-topics --create --zookeeper localhost:2181 --topic TOPIC_NAME --replication-factor 1 --partitions 1
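The slide names SMTs (Single Message Transforms) without showing one. A minimal sketch of what an SMT looks like in a connector's properties, using the stock InsertField transform that ships with Apache Kafka Connect; the transform alias, field name and value are illustrative:

# Stamp every record value with a static "source" field as it passes through Connect (illustrative)
cat >> jdbc-emp-source.properties <<'EOF'
transforms=AddSource
transforms.AddSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.AddSource.static.field=source
transforms.AddSource.static.value=oracle-prod
EOF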


Get started with Kafka

• Kafka and Kafka Connect: www.confluent.io

• Download the Confluent Platform (bundled connectors)

• Check out the available community connectors

• Try running it in Docker (a minimal sketch follows below)
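One possible single-broker Docker setup, assuming the Confluent community images and their commonly documented environment variables; the container names, network and ports are illustrative:

docker network create kafka-net

# ZooKeeper first
docker run -d --name zookeeper --network kafka-net -e ZOOKEEPER_CLIENT_PORT=2181 confluentinc/cp-zookeeper

# Then a single Kafka broker, advertised on localhost for host-side clients
docker run -d --name kafka --network kafka-net -p 9092:9092 -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 confluentinc/cp-kafka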


About Dbvisit Software

• Real-time Oracle Database streaming software solutions

• In the Cloud | Hybrid | On-Premises

• New Zealand-based, US office, Asia sales office, EU office (Prague)

• Unique offering: disaster recovery solutions for Oracle Standard Edition

• Logical replication for moving data wherever and whenever you wish

• Flexible licensing, cost-effective pricing models available

• Exceptional growth, 1300+ customers

• Peerless customer support


Thank you

@[email protected]