Tungsten Replicator for Kafka, Elasticsearch, Cassandra

Topics

In today's session:
• Replicator Basics
• Filtering and Glue
• Kafka and Options
• Elasticsearch and Options
• Cassandra
• Future Direction

Asynchronous replication decouples transaction processing on master and slave DBMS nodes

[Diagram: MySQL/Oracle with DBMS-specific logging (i.e. Redo or Binary) -> DBMS logs -> Master Replicator: Extractor -> THL (Events + Metadata) -> download transactions via network -> Slave Replicator: Applier -> THL -> apply using JDBC -> MySQL/Oracle]

Extractor Options

Option 1: Local Install
Extractor reads directly from the logs, even when the DBMS service is down. This is the default.

Option 2: Remote
Extractor gets log data via MySQL Replication Slave protocols (which requires the DBMS service to be online) or the Redo Reader feature. This is how we handle RDS and Oracle extraction tasks.

Parallel apply maximizes DBMS I/O bandwidth when updating replicas

[Diagram: Master replicator -> THL (Events + metadata) -> Slave Replicator Pipeline with three stages (remote-to-thl, thl-to-q, q-to-dbms), each running Extract, Filter, Apply; a Parallel Queue with multiple Extract/Filter/Apply channels]
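
The parallel queue idea can be illustrated with a small sketch. This is not the replicator's code: it assumes events are sharded by schema name, that each event is a plain dict with hypothetical `seqno` and `schema` keys, and it uses a thread pool as a stand-in for the apply channels. The point it shows is that events for one shard stay in order while different shards are applied concurrently.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_shard(events, channels):
    """Assign each event to a channel by hashing its shard (assumed: schema name)."""
    queues = defaultdict(list)
    for event in events:
        channel = zlib.crc32(event["schema"].encode("utf-8")) % channels
        queues[channel].append(event)  # arrival order is kept within a channel
    return queues


def apply_channel(queue):
    """Stand-in for the apply step: returns the seqnos in the order applied."""
    return [event["seqno"] for event in queue]


def parallel_apply(events, channels=3):
    """Apply all channels concurrently and return one seqno list per channel."""
    queues = partition_by_shard(events, channels)
    with ThreadPoolExecutor(max_workers=channels) as pool:
        return list(pool.map(apply_channel, queues.values()))


events = [
    {"seqno": 1, "schema": "sales"},
    {"seqno": 2, "schema": "hr"},
    {"seqno": 3, "schema": "sales"},
    {"seqno": 4, "schema": "hr"},
]
applied = parallel_apply(events, channels=3)
print(applied)
```

Hashing on the shard name keeps every event for a schema on the same channel, which is what lets the channels run independently without reordering changes inside a schema.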

Why Kafka

• Kafka is a high performance message bus
• NOT a database
• Great for distributing messages and firing/triggering operations on content
• Log aggregation
• Activity/security tracking
• Metrics
• Auditing
• Data ingestion for Hadoop

Mass Data Collection with Kafka

[Diagram labels: Kafka (three instances), Tungsten Replicator]

Multiple Target Distribution

[Diagram labels: Tungsten Replicator; Kafka (three instances); Database (three instances); Image Process; Email; Metrics]

How Kafka Replication Works

[Diagram: MySQL/Oracle with DBMS-specific logging (i.e. Redo or Binary) -> DBMS logs -> Master Replicator: Extractor -> THL (Events + Metadata) -> download transactions via network -> Slave Replicator: Applier -> THL -> Kafka Applier (Native); Zookeeper]

What Tungsten Replicator Does to Apply into Kafka

• Takes an incoming row and converts it to a message
• Message consists of metadata:
  – Schema name, table name
  – Sequence number
  – Commit timestamp
  – Operation Type
• Embedded Message Content
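
As a rough illustration of the conversion described above, the following sketch wraps one row in the listed metadata. The function name and its parameters are hypothetical; the `_meta_*` field names and the `record` wrapper follow the Sample Message slide, and rendering every value as a string is an assumption based on that sample.

```python
import json


def row_to_message(schema, table, seqno, commit_time, optype, row):
    """Wrap one replicated row in the metadata fields listed on the slide."""
    return {
        "_meta_source_schema": schema,
        "_meta_source_table": table,
        "_meta_seqno": str(seqno),
        "_meta_committime": commit_time,
        "_meta_optype": optype,
        # Assumption: column values are rendered as strings, as in the sample.
        "record": {column: str(value) for column, value in row.items()},
    }


message = row_to_message(
    schema="sbtest",
    table="sbtest",
    seqno=10130,
    commit_time="2017-05-27 14:27:18.0",
    optype="INSERT",
    row={"id": 255759, "k": 100, "c": "Base Msg", "pad": "Some other submsg"},
)
print(json.dumps(message, indent=2))
```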

Message Structure

[Diagram: rows of a source table (Schema, Table) are written to Topic: Schema_Table; each row becomes one message with MsgID: Schema Table PKey]
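
The naming scheme on this slide can be sketched in a few lines. The helper names are hypothetical, and the separator used inside the message ID is an assumption, since the slide only lists its parts (schema, table, primary key).

```python
def topic_name(schema, table):
    """Topic is named Schema_Table, as on the slide."""
    return f"{schema}_{table}"


def message_id(schema, table, pkey, separator="."):
    """Message ID built from schema, table and primary key.

    The separator is an assumption; the slide does not specify one.
    """
    return separator.join([schema, table, str(pkey)])


print(topic_name("sbtest", "sbtest"))
print(message_id("sbtest", "sbtest", 255759))
```

Keying messages by primary key means that all changes to the same row share a key, which is what a consumer needs in order to group or compact them per row.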

Sample Message

{
  "_meta_committime" : "2017-05-27 14:27:18.0",
  "_meta_source_schema" : "sbtest",
  "_meta_seqno" : "10130",
  "_meta_source_table" : "sbtest",
  "_meta_optype" : "INSERT",
  "record" : {
    "c" : "Base Msg",
    "k" : "100",
    "id" : "255759",
    "pad" : "Some other submsg"
  }
}
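
On the consuming side, a message like the sample can be parsed and dispatched on its operation type. The sketch below is illustrative only: it keeps tables in an in-memory dict, assumes `id` is the primary key column, and assumes that operation types other than the `INSERT` shown in the sample (for example `UPDATE` and `DELETE`) exist with those names.

```python
import json


def apply_message(raw, tables):
    """Apply one JSON message to an in-memory copy of the source tables."""
    msg = json.loads(raw)
    key = (msg["_meta_source_schema"], msg["_meta_source_table"])
    rows = tables.setdefault(key, {})
    record = msg["record"]
    if msg["_meta_optype"] == "DELETE":  # assumed optype name
        rows.pop(record["id"], None)
    else:  # INSERT (from the sample) or an assumed UPDATE
        rows[record["id"]] = record
    return tables


raw = json.dumps({
    "_meta_committime": "2017-05-27 14:27:18.0",
    "_meta_source_schema": "sbtest",
    "_meta_seqno": "10130",
    "_meta_source_table": "sbtest",
    "_meta_optype": "INSERT",
    "record": {"c": "Base Msg", "k": "100", "id": "255759", "pad": "Some other submsg"},
})
tables = apply_message(raw, {})
print(tables[("sbtest", "sbtest")]["255759"]["c"])
```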

Customizable Elements

• Whether acknowledgements are required from Kafka
• How much distribution/replication is required before sending the message
• Format of the message key
• Whether to embed schema and table name
• Whether the commit timestamp should be embedded
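
The first two options resemble the standard Kafka producer `acks` setting, which accepts `0` (no acknowledgement), `1` (leader only) and `all` (all in-sync replicas). How the applier maps its own options onto that setting is not stated in the slides, so the function and its two parameters below are purely hypothetical; only the `acks` values themselves come from Kafka.

```python
def producer_settings(require_acks, wait_for_all_replicas):
    """Translate two hypothetical applier options into a Kafka `acks` value."""
    if not require_acks:
        acks = "0"    # fire and forget
    elif wait_for_all_replicas:
        acks = "all"  # wait for every in-sync replica
    else:
        acks = "1"    # wait for the partition leader only
    return {"acks": acks}


print(producer_settings(require_acks=True, wait_for_all_replicas=True))
```

Stronger acknowledgement settings trade throughput for durability, which is presumably why the applier exposes them as options.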

Demo

Elasticsearch

• Immediately replicate data into Elasticsearch for searching

• Contains the core text and content of the records

• Provides the original information to track back to the record

• Content is structured by schema (index type) and table name (index)

• Document ID is based on the primary key and other information, and is configurable
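
The mapping in the last two bullets can be sketched as a function that builds the REST request for one row. The function name is hypothetical, and using only the primary key as the document ID is an assumption (the slide says other information can be included). The `/{index}/{type}/{id}` path form belongs to Elasticsearch releases that still support mapping types.

```python
import json
from urllib.parse import quote


def document_request(schema, table, pkey, row):
    """Build (method, path, body) for indexing one row as a document.

    Index = table name, type = schema name, document ID = primary key.
    """
    path = "/{}/{}/{}".format(
        quote(table, safe=""),
        quote(schema, safe=""),
        quote(str(pkey), safe=""),
    )
    return "PUT", path, json.dumps(row)


method, path, body = document_request(
    schema="mg", table="msg", pkey="99999",
    row={"id": "99999", "msg": "Hello ElasticSearch"},
)
print(method, path)
print(body)
```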

How Elasticsearch Replication Works

[Diagram: Redo Logging -> DBMS logs -> Redo Reader generated PLOG -> Master Replicator: Extractor -> THL (Events + Metadata) -> download transactions via network -> Slave Replicator: Applier -> THL -> Elasticsearch Applier (REST API)]

Sample Entry

{
  "_id" : "99999",
  "_type" : "mg",
  "found" : true,
  "_version" : 2,
  "_index" : "msg",
  "_source" : {
    "msg" : "Hello ElasticSearch",
    "id" : "99999"
  }
}

Replicating into Cassandra

Demo

Cassandra

• Great for fast online and CRM style deployments

• Highly fault tolerant and scalable

• Has some data and formatting changes
  – Currently needs our DDL translation tool (soon built-in)

• Quasi table/document style

How Cassandra Replication Works

[Diagram: source base table -> Master Replicator -> Slave Replicator -> CSV files -> Ruby Connector -> staging table in Cassandra -> merge (JS) -> base table in Cassandra]
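
The staging-then-merge flow can be illustrated with an in-memory sketch. It is not the applier's implementation: the CSV column names (`opcode`, `seqno`, `row_id`), the `I`/`D` opcodes, and the representation of an update as a delete followed by an insert are assumptions made for this example, and a Python dict stands in for the Cassandra base table.

```python
import csv
import io


def merge_staging(base, staging_csv):
    """Merge staged CSV change rows into a base table keyed by `id`."""
    reader = csv.DictReader(io.StringIO(staging_csv))
    # Replay changes in commit order, then in row order within a transaction.
    rows = sorted(reader, key=lambda r: (int(r["seqno"]), int(r["row_id"])))
    for r in rows:
        if r["opcode"] == "D":
            base.pop(r["id"], None)
        elif r["opcode"] == "I":
            base[r["id"]] = {"id": r["id"], "msg": r["msg"]}
    return base


staging = "\n".join([
    "opcode,seqno,row_id,id,msg",
    "D,10,1,1,",         # an update arrives as delete + insert (assumed)
    "I,10,2,1,updated",
    "I,11,1,2,hello",
])
base = merge_staging({"1": {"id": "1", "msg": "old"}}, staging)
print(base)
```

Loading changes in bulk into a staging area and merging them in one pass avoids issuing one statement per row against the target.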

Demo

Future Direction for these appliers and related technology

• Full transaction support for Kafka
• Support for Amazon Elasticsearch
• Kafka Extraction
  – Parsing contents of Kafka message queues
  – Database updates
  – Large scale distribution of database changes
  – Filtering and re-submission

General Tungsten Replicator Functionality

• Expanding the standard filter technology
  – Data translation (dates, numbers, hex)
  – Basic lookup/combination to aid ETL style deployments
  – Data munging/obfuscation (PII, credit cards) for analytics

• More appliers
  – InfluxDB
  – SQL Server
  – PostgreSQL
  – Hadoop JDBC
  – MemSQL
  – Amazon (Aurora, Elasticsearch)
  – CouchDB/Base

• THL Compression/Encryption
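
To make the obfuscation bullet concrete, here is a minimal sketch of what a credit card masking filter could do to a row before it is applied. It is an illustration of the idea only, not a Tungsten filter: the function names, the choice of columns, and the simple digit pattern are all assumptions.

```python
import re

# Assumption: a card number is 13 to 16 digits, optionally separated by
# spaces or hyphens. Real validation (e.g. a Luhn check) is left out.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card(match):
    """Replace all but the last four digits with asterisks."""
    digits = re.sub(r"\D", "", match.group(0))
    return "*" * (len(digits) - 4) + digits[-4:]


def obfuscate_record(record, columns):
    """Return a copy of the record with card numbers masked in the given columns."""
    masked = dict(record)
    for column in columns:
        if column in masked:
            masked[column] = CARD_PATTERN.sub(mask_card, str(masked[column]))
    return masked


row = {"id": "1", "card": "4111 1111 1111 1111"}
print(obfuscate_record(row, ["card"]))
```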

Next Steps

• If you are interested in knowing more about Tungsten Replicator and would like to try it out for yourself, please contact our sales team, who will be able to take you through the details and set up a POC – [email protected]

• Read the documentation at http://docs.continuent.com/tungsten-replicator-5.2/index.html

• Subscribe to our Tungsten University YouTube channel! http://tinyurl.com/TungstenUni

For more information, contact us:

MC Brown
VP [email protected]