Real-Time Integration Between MongoDB and SQL Databases
![Page 1: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/1.jpg)
1
Distributed, fault-tolerant, transactional
Real-Time Integration: MongoDB and SQL Databases
Eugene Dvorkin, Architect, WebMD
![Page 2: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/2.jpg)
2
WebMD: A lot of data; a lot of traffic
~900 million page views a month
~100 million unique visitors a month
![Page 3: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/3.jpg)
3
How We Use MongoDB
User Activity
![Page 4: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/4.jpg)
4
Why Move Data to RDBMS?
Preserve existing investment in BI and data warehouse
To use an analytical database such as Vertica
To use SQL
![Page 5: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/5.jpg)
5
Why Move Data In Real-time?
Batch process is slow
No ad-hoc queries
No real-time reports
![Page 6: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/6.jpg)
6
Challenge in moving data
Transform documents to a relational structure
Insert into the RDBMS at a high rate
![Page 7: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/7.jpg)
7
Challenge in moving data
Scale easily as data volume and velocity increase
![Page 8: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/8.jpg)
8
Our Solution to move data in Real-time: Storm
Storm: an open source distributed real-time computation system.
Developed by Nathan Marz at BackType, which was acquired by Twitter
![Page 9: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/9.jpg)
9
Hadoop Storm
Our Solution to move data in Real-time: Storm
![Page 10: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/10.jpg)
10
Why STORM?
JVM-based framework
Guaranteed data processing
Supports development in multiple languages
Scalable and transactional
![Page 11: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/11.jpg)
11
Overview of Storm cluster
Master node (Nimbus)
Cluster coordination (ZooKeeper)
Worker nodes (Supervisors) run worker processes
![Page 12: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/12.jpg)
12
Storm Abstractions
Tuples, Streams, Spouts, Bolts and Topologies
![Page 14: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/14.jpg)
14
Stream
Unbounded sequence of tuples
Example: Stream of messages from message queue
![Page 15: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/15.jpg)
15
Spout
Read from a stream of data: queues, web logs, API calls, MongoDB oplog
Emit documents as tuples
Source of Streams
![Page 16: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/16.jpg)
16
Bolts
Process tuples and create new streams
![Page 17: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/17.jpg)
17
Bolts
Apply functions / transforms
Calculate and aggregate data (word count!)
Access DB, API, etc.
Filter data
Map/Reduce
Process tuples and create new streams
![Page 18: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/18.jpg)
18
Topology
![Page 19: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/19.jpg)
19
Topology
Storm is transforming and moving data
![Page 20: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/20.jpg)
20
MongoDB
How To Read All Incoming Data from MongoDB?
![Page 21: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/21.jpg)
21
MongoDB
How To Read All Incoming Data from MongoDB?
Use MongoDB OpLog
![Page 22: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/22.jpg)
22
What is OpLog?
Replication mechanism in MongoDB
It is a capped collection
![Page 23: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/23.jpg)
23
Spout: reading from OpLog
Located in the local database, in the oplog.rs collection
![Page 24: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/24.jpg)
24
Spout: reading from OpLog
Operations: Insert, Update, Delete
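Each oplog entry carries a one-letter op code. The following is a minimal sketch of mapping those codes to the three operations listed above. The enum name OplogOp and its fromCode method are illustrative and not taken from the talk; the sketch assumes that other codes MongoDB writes (for example "c" for commands and "n" for no-ops) can simply be skipped by the pipeline.

```java
import java.util.Optional;

/** Hypothetical helper: maps the oplog "op" code to a data-changing operation. */
enum OplogOp {
    INSERT("i"), UPDATE("u"), DELETE("d");

    private final String code;

    OplogOp(String code) {
        this.code = code;
    }

    /** Returns the matching operation, or empty for codes this sketch does not replicate. */
    static Optional<OplogOp> fromCode(String code) {
        for (OplogOp op : values()) {
            if (op.code.equals(code)) {
                return Optional.of(op);
            }
        }
        return Optional.empty();
    }
}
```

A spout could call OplogOp.fromCode on the op field of each entry and emit a tuple only when the result is present.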
![Page 25: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/25.jpg)
25
Spout: reading from OpLog
Namespace (ns): the collection name, which maps to the table name
![Page 26: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/26.jpg)
26
Spout: reading from OpLog
Data object:
![Page 27: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/27.jpg)
27
Sharded cluster
![Page 28: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/28.jpg)
28
Automatic discovery of sharded cluster
![Page 29: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/29.jpg)
29
Example: Shard vs Replica set discovery
![Page 30: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/30.jpg)
30
Example: Shard discovery
![Page 31: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/31.jpg)
31
Spout: Reading data from OpLog
How to Read data continuously from OpLog?
![Page 32: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/32.jpg)
32
Spout: Reading data from OpLog
How to Read data continuously from OpLog?
Use Tailable Cursor
![Page 33: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/33.jpg)
33
Example: Tailable cursor, like tail -f
![Page 34: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/34.jpg)
34
Manage timestamps
Use the ts field (the timestamp in each oplog entry) to track processed records
If the system restarts, resume from the recorded ts
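The bookkeeping described above can be sketched as a small checkpoint store. This is an assumption-laden illustration rather than the talk's implementation: the class name OplogCheckpoint and the choice of a local file are hypothetical, and the ts value is modeled as the two integers of a BSON timestamp (seconds since the epoch plus an ordinal within that second).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical checkpoint store: remembers the ts of the last processed oplog entry. */
final class OplogCheckpoint {
    private final Path file;

    OplogCheckpoint(Path file) {
        this.file = file;
    }

    /** Records the ts once an entry has been fully processed. */
    void save(int seconds, int increment) {
        try {
            Files.write(file, (seconds + ":" + increment).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {seconds, increment} to resume from, or null if nothing was recorded yet. */
    int[] load() {
        if (!Files.exists(file)) {
            return null;
        }
        try {
            String line = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            String[] parts = line.split(":");
            return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On startup the reader would call load() and, if a value comes back, query the oplog only for entries with a ts greater than it. Saving after an entry is processed rather than before means a crash can replay an entry but will not skip one.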
![Page 35: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/35.jpg)
35
Spout: reading from OpLog
![Page 36: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/36.jpg)
36
SPOUT – Code Example
![Page 37: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/37.jpg)
37
TOPOLOGY
![Page 38: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/38.jpg)
38
Working With Embedded Arrays
Array represents One-to-Many relationship in RDBMS
![Page 39: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/39.jpg)
39
Example: Working with embedded arrays
![Page 40: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/40.jpg)
40
Example: Working with embedded arrays
{ _id: 1, ns: "person_awards", o: { award: 'National Medal of Science', year: 1975, by: 'National Science Foundation' } }
{ _id: 1, ns: "person_awards", o: { award: 'Turing Award', year: 1977, by: 'ACM' } }
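A minimal sketch of how records like the two above could be produced from a parent document. It uses plain java.util maps and lists in place of the MongoDB driver types, and the class name ArrayExtractor, the method extract, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns one embedded array into one child record per element. */
final class ArrayExtractor {
    private ArrayExtractor() {}

    /**
     * Removes the array stored under field from doc and returns one record per
     * element, each tagged with the parent's _id and the given child namespace.
     */
    static List<Map<String, Object>> extract(Map<String, Object> doc, String field, String childNs) {
        List<Map<String, Object>> children = new ArrayList<>();
        Object value = doc.get(field);
        if (!(value instanceof List)) {
            return children; // not an array: the document is left untouched
        }
        for (Object element : (List<?>) value) {
            Map<String, Object> child = new LinkedHashMap<>();
            child.put("_id", doc.get("_id")); // key back to the parent row
            child.put("ns", childNs);
            child.put("o", element);
            children.add(child);
        }
        doc.remove(field); // the parent row no longer carries the array
        return children;
    }
}
```

Copying the parent's _id into every child record is what lets the relational side express the one-to-many relationship as a join between the parent table and the child table.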
![Page 41: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/41.jpg)
41
Example: Working with embedded arrays
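public void execute(Tuple tuple) {
    // ...
    if (field instanceof BasicDBList) {
        BasicDBObject arrayElement = processArray(field);
        // ...
        // emit on the "documents" stream, anchored to the input tuple;
        // Values wraps the element because emit expects a list of values
        outputCollector.emit("documents", tuple, new Values(arrayElement));
    }
}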
![Page 42: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/42.jpg)
42
Parse documents with Bolt
![Page 43: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/43.jpg)
43
Parse documents with Bolt
Oplog entry in:
{"ns": "people", "op": "i", o: { _id: 1, name: { first: 'John', last: 'Backus' }, birth: 'Dec 03, 1924' }}
Flattened tuple out:
["ns": "people", "op": "i", ["id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]
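The flattening step turns nested sub-documents into underscore-joined column names. The sketch below shows one way to do that with plain java.util maps standing in for the driver's document type. The class name DocumentFlattener is hypothetical, and the sketch keeps keys such as _id unchanged rather than renaming them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical flattener: nested sub-documents become underscore-joined column names. */
final class DocumentFlattener {
    private DocumentFlattener() {}

    /** Returns a flat column-name to value map, preserving field order. */
    static Map<String, Object> flatten(Map<String, Object> document) {
        Map<String, Object> columns = new LinkedHashMap<>();
        flattenInto(columns, "", document);
        return columns;
    }

    private static void flattenInto(Map<String, Object> columns, String prefix, Map<?, ?> document) {
        for (Map.Entry<?, ?> entry : document.entrySet()) {
            String name = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                flattenInto(columns, name, (Map<?, ?>) value); // recurse into the sub-document
            } else {
                columns.put(name, value); // leaf value becomes a column
            }
        }
    }
}
```

Embedded arrays are not handled here; the sketch assumes they were already split out into their own records before flattening.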
![Page 44: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/44.jpg)
44
Parse documents with Bolt
@Override
public void execute(Tuple tuple) {
    // ...
    // the oplog entry emitted by the upstream component
    final BasicDBObject oplogObject =
        (BasicDBObject) tuple.getValueByField("document");
    // "o" holds the document that was written
    final BasicDBObject document =
        (BasicDBObject) oplogObject.get("o");
    // ...
    outputValues.add(flattenDocument(document));
    // anchor the new tuple to the input tuple
    outputCollector.emit(tuple, outputValues);
}
![Page 45: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/45.jpg)
45
Write to SQL with SQLWriter Bolt
![Page 46: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/46.jpg)
46
Write to SQL with SQLWriter Bolt
["ns": "people", "op": "i", ["id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]
insert into people (_id, name_first, name_last, birth) values (1, 'John', 'Backus', 'Dec 03, 1924');
insert into people_awards (_id, awards_award, awards_year, awards_by) values (1, 'Turing Award', 1977, 'ACM');
insert into people_awards (_id, awards_award, awards_year, awards_by) values (1, 'National Medal of Science', 1975, 'National Science Foundation');
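Statements like these can be generated from a flattened column map. The sketch below is one possible approach, not the talk's code: it builds a parameterized INSERT (with ? placeholders) instead of concatenating values into the SQL text, and it rejects table or column names that are not plain identifiers. The class name InsertStatementBuilder and its method are hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Hypothetical builder for a parameterized INSERT; values are bound separately. */
final class InsertStatementBuilder {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private InsertStatementBuilder() {}

    /** Builds "insert into table (c1, c2) values (?, ?)" in the map's iteration order. */
    static String build(String table, Map<String, Object> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("no columns to insert");
        }
        requireIdentifier(table);
        StringJoiner names = new StringJoiner(", ");
        StringJoiner placeholders = new StringJoiner(", ");
        for (String column : columns.keySet()) {
            requireIdentifier(column);
            names.add(column);
            placeholders.add("?");
        }
        return "insert into " + table + " (" + names + ") values (" + placeholders + ")";
    }

    private static void requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("not a plain SQL identifier: " + name);
        }
    }
}
```

The resulting string can be passed to Connection.prepareStatement, with each value bound through PreparedStatement.setObject in the same iteration order. Keeping values out of the SQL text avoids quoting problems with data such as names containing apostrophes.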
![Page 47: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/47.jpg)
47
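Write to SQL with SQLWriter Bolt
@Override
public void prepare(.....) {
    ....
    // load the Vertica JDBC driver and open one connection per bolt instance
    Class.forName("com.vertica.jdbc.Driver");
    con = DriverManager.getConnection(dBUrl, username, password);
}

@Override
public void execute(Tuple tuple) {
    String insertStatement = createInsertStatement(tuple);
    try {
        Statement stmt = con.createStatement();
        stmt.execute(insertStatement);
        stmt.close();
    } catch (SQLException e) {
        // the slide cuts off before the error handling; rethrowing is an assumption
        throw new RuntimeException(e);
    }
}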
![Page 48: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/48.jpg)
48
Topology Definition
TopologyBuilder builder = new TopologyBuilder();
// define our spout
builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
// five parallel array extractors read from the spout
builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5).shuffleGrouping(spoutId);
// the parser subscribes to the extractor's documents stream
builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt()).shuffleGrouping(arrayExtractorId, documentsStreamId);
builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword)).shuffleGrouping(mongoDocParserId);
// run in local mode for testing
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
![Page 51: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/51.jpg)
51
Topology Definition
TopologyBuilder builder = new TopologyBuilder();
// define our spout
builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5).shuffleGrouping(spoutId);
builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt()).shuffleGrouping(arrayExtractorId, documentsStreamId);
builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword)).shuffleGrouping(mongoDocParserId);
// submit to a real cluster instead of a LocalCluster
StormSubmitter.submitTopology("OfflineEventProcess", conf, builder.createTopology());
![Page 52: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/52.jpg)
52
Lesson learned
By combining the MongoDB oplog (or another capped collection), a tailable cursor, and the Storm framework, you can build a fast, scalable, real-time data processing pipeline.
![Page 53: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/53.jpg)
53
Resources
Book: Getting Started with Storm
Storm project wiki
Storm starter project
Storm contributions project
Running a Multi-Node Storm Cluster tutorial
Implementing real-time trending topics
A Hadoop Alternative: Building a real-time data pipeline with Storm
Storm use cases
![Page 54: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/54.jpg)
54
Resources (cont’d)
Understanding the Parallelism of a Storm Topology
Trident: high-level Storm abstraction
A practical Storm's Trident API
Storm online forum
Mongo connector from 10gen Labs
MoSQL streaming translator in Ruby
Project source code
New York City Storm Meetup
![Page 55: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/55.jpg)
55
Questions
Eugene Dvorkin, Architect, WebMD
Twitter: @edvorkin
LinkedIn: eugenedvorkin
![Page 56: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/56.jpg)
56
![Page 57: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/57.jpg)
57
![Page 58: Real-Time Integration Between MongoDB and SQL Databases](https://reader035.fdocuments.net/reader035/viewer/2022062617/54b769594a7959a23c8b487d/html5/thumbnails/58.jpg)
58
Next Sessions at 2:50
5th Floor:
West Side Ballroom 3&4: Data Modeling Examples from the Real World
West Side Ballroom 1&2: Growing Up MongoDB
Juilliard Complex: Business Track: MetLife Leapfrogs Insurance Industry with MongoDB-Powered Big Data Application
Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session
7th Floor:
Empire Complex: How We Fixed Our MongoDB Problems
SoHo Complex: High Performance, High Scale MongoDB on AWS: A Hands On Guide