Look Ma! No more blobs
Aparna Chaudhary
NoSQL matters, @Cologne Germany 2013
EMBRACE POLYGLOT PERSISTENCE!
STOP RDBMS ABUSE!
KNOW YOUR USE CASE
Parse
Extract
Store
Read XML
We don't do rocket science...
Use Case
Runtime support for document types
Metadata definition provided at runtime
Document type names - max 50 char
Look up content based on metadata
RA
Challenges
Storage of up to one million documents of 10KB to 2GB per document type per year
Write 1MB < x msec
Retrieve 1MB < y msec
...and details
RA
But…the Numbers make it interesting...
How?
File System
MongoDB
RDBMS
JCR
Document Management
If you want to store files, it's logical to use the file system.
Ain't it?
File System
✓ Ease of Use
✓ No special skill-set
✓ Backup and Recovery
✓ It’s free!
How do I name them?
Support for metadata storage?
Performance with too many small files?
Query - Administration?
High Availability?
Limitation on total number of files?
Relational database
Integrity
Consistency
Durability
Atomicity
Joins
Backups
High Availability
You name it, We have it!
RDBMS
Aggregations
RDBMS Developer’s Perspective
Challenge #1
RA
We need runtime support for document types.
DOC_1 DOC_2 DOC_3
DOC_4 DOC_5 DOC_6
Dynamic DDL Generation
DEV
String concatenations are ugly…
DEV
Let's build a utility.
More Work
Challenge #2
RA
Document type names are up to 50 characters long.
TABLE NAME LIMITS
Wait… SQL-92 says 128 characters?
We rule. Let's support only 30 characters.
DEV
Let's create a mapping table.
DOC_TYPE_MAPPING
Ugly, unreadable table names!
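The RDBMS workaround described in these two challenges (one table per document type, generated at runtime, plus a mapping table for names that do not fit the 30-character limit) might look roughly like the sketch below. It uses Python's built-in sqlite3 purely for illustration; the alias scheme (DOC_<n>), the column layout, and the helper name register_doc_type are assumptions, not the code from the talk.

```python
import sqlite3

MAX_TABLE_NAME = 30  # limit quoted on the slide; checked by hand here


def register_doc_type(conn, doc_type, fields):
    """Create a table for doc_type at runtime and return the alias used as table name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DOC_TYPE_MAPPING ("
        "alias TEXT PRIMARY KEY, doc_type TEXT UNIQUE NOT NULL)"
    )
    row = conn.execute(
        "SELECT alias FROM DOC_TYPE_MAPPING WHERE doc_type = ?", (doc_type,)
    ).fetchone()
    if row:
        return row[0]  # document type already defined

    count = conn.execute("SELECT COUNT(*) FROM DOC_TYPE_MAPPING").fetchone()[0]
    alias = f"DOC_{count + 1}"  # short but unreadable, as the slide complains
    if len(alias) > MAX_TABLE_NAME:
        raise ValueError("alias exceeds table name limit")

    # Identifiers cannot be bound as parameters, so the DDL is built by
    # string concatenation and every field name must be validated first.
    for name in fields:
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {name!r}")
    columns = ["id INTEGER PRIMARY KEY"]
    columns += [f"{name} TEXT" for name in fields]
    columns.append("content BLOB")
    conn.execute(f"CREATE TABLE {alias} ({', '.join(columns)})")
    conn.execute(
        "INSERT INTO DOC_TYPE_MAPPING (alias, doc_type) VALUES (?, ?)",
        (alias, doc_type),
    )
    return alias


conn = sqlite3.connect(":memory:")
long_name = "health_record_with_a_very_long_descriptive_name"
alias = register_doc_type(conn, long_name, ["fname", "lname", "country"])
print(alias)  # DOC_1
```

Every read and write then has to go through DOC_TYPE_MAPPING to find the real table, which is the extra work the slides point at.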
So... finally...
Read XML
Document type defined?
If No: Dynamic DDL generation, Document Type Alias
If Yes: continue
Extract Metadata
Store Metadata
Store Content
Simple use case becomes complex...
Remember...Our Challenge
QA
Let's see if we are in spec for response time.
DEV
Aah... what about performance now?
MongoDB
Document Based
GridFS
B-Tree
Dynamic Schema
JSON
BSON
Query
Scalable
http://www.10gen.com/presentations/storage-engine-internals
Joins
Complex Transaction
[Figure: documents ID1 to ID5, each with its own set of fields (F1, F2, ... Fx) rather than one fixed F1 to F5 layout]
Concepts
[Figure: a Database containing several Collections]
Table = Collection
Column = Field
Row = Document
Database = Database
GridFS
MongoDB divides large content into chunks
Stores metadata and chunks separately
http://docs.mongodb.org/manual/core/gridfs/
> mybucket.files
{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
"chunkSize" : NumberLong(262144),
"length" : NumberLong(103015),
"md5" : "34d29a163276accc7304bd69c5520e55",
"filename" : "health_record_2.xml",
"contentType" : "application/xml",
"uploadDate" : ISODate("2013-03-23T07:41:44.907Z"),
"aliases" : null,
"metadata" : { "fname" : "Aparna", "lname" : "Chaudhary", "country" : "Netherlands" }
}
ObjectId - 12 Byte BSON:
4 Byte - Seconds since Epoch
3 Byte - Machine Id
2 Byte - Process Id
3 Byte - Counter
> mybucket.chunks
{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5d"), "files_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
"n" : 0,
"data" : BinData(0,...)
}
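To make the 12-byte ObjectId layout concrete, the two ids shown in the files and chunks documents can be split by hand. This is a minimal sketch that follows the byte layout given on the slide; the function name decode_object_id is hypothetical, and the machine and process fields are left as hex strings because only the timestamp and counter are decoded as big-endian integers here.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id):
    """Split a 24-character ObjectId hex string into the four parts named on the slide."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": raw[4:7].hex(),
        "process_id": raw[7:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


file_id = decode_object_id("514d5cb8c2e6ea4329646a5c")
chunk_id = decode_object_id("514d5cb8c2e6ea4329646a5d")
print(file_id["timestamp"].isoformat())          # 2013-03-23T07:41:44+00:00
print(chunk_id["counter"] - file_id["counter"])  # 1
```

The decoded timestamp matches the uploadDate of the file document to the second, and the chunk id differs from the file id only in the counter, which suggests both were generated back to back by the same process.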
I'm storing a 10KB file, but would it use 256KB on disk?
Last Chunk = FileSize % 256 + Metadata overhead (x)
1128KB file: 256 | 256 | 256 | 256 | 104 + x
10KB file: 10 + x
Chunk is as big as it needs to be...
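The chunk arithmetic above can be checked with a few lines of code. A minimal sketch, assuming the 256KB chunk size shown in the files document (chunkSize 262144); the function name chunk_sizes is hypothetical and the per-chunk overhead x (the _id, files_id and n fields) is left out.

```python
def chunk_sizes(length, chunk_size=256 * 1024):
    """Return the payload size in bytes of each chunk for a file of `length` bytes."""
    full, last = divmod(length, chunk_size)
    sizes = [chunk_size] * full
    if last:
        sizes.append(last)  # the last chunk holds only the remainder
    return sizes


KB = 1024
print([s // KB for s in chunk_sizes(1128 * KB)])  # [256, 256, 256, 256, 104]
print([s // KB for s in chunk_sizes(10 * KB)])    # [10]
print(chunk_sizes(103015, 262144))                # [103015]
```

The last line uses the length and chunkSize of health_record_2.xml from the files document: the whole file fits in a single chunk of 103015 bytes.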
Challenge #1
DEV
MongoDB supports Dynamic Schema.
You can use a collection per docType; collections are created dynamically.
RA
We need runtime support for document type.
Challenge #2
RA
Document type names are up to 50 characters long.
DEV
A MongoDB namespace can be up to 123 characters.
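A quick check shows how much room a 50-character document type leaves. This is a sketch under assumptions: one GridFS bucket per document type (so the namespaces end in .files and .chunks, as in the mybucket example), a hypothetical database name "documents", and the 123 figure taken from the slide and treated as an inclusive limit.

```python
MAX_NAMESPACE = 123  # figure quoted on the slide, treated as inclusive here


def gridfs_namespaces(db_name, doc_type):
    """Namespaces created for a GridFS bucket named after the document type."""
    return [f"{db_name}.{doc_type}.files", f"{db_name}.{doc_type}.chunks"]


def fits(db_name, doc_type):
    """True if every namespace of the bucket stays within the limit."""
    return all(
        len(ns.encode("utf-8")) <= MAX_NAMESPACE
        for ns in gridfs_namespaces(db_name, doc_type)
    )


longest = gridfs_namespaces("documents", "x" * 50)[1]
print(len(longest))                 # 67
print(fits("documents", "x" * 50))  # True
```

With these assumptions a 50-character type name needs no alias or mapping table.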
So... finally...
Simple use case remains simple... well, becomes simpler...
Read XML
Extract Metadata
Store Metadata & Content
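The three steps above can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the XML layout, the METADATA_PATHS definition (standing in for the metadata definition provided at runtime), and the function name build_file_document. The result mirrors the shape of the files document shown earlier; handing it, together with the content, to a GridFS driver is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real document layout is not shown in the talk.
SAMPLE = (
    b"<healthRecord>"
    b"<patient><fname>Aparna</fname><lname>Chaudhary</lname>"
    b"<country>Netherlands</country></patient>"
    b"<notes>...</notes>"
    b"</healthRecord>"
)

# Metadata definition supplied at runtime: field name -> path in the XML.
METADATA_PATHS = {
    "fname": "patient/fname",
    "lname": "patient/lname",
    "country": "patient/country",
}


def build_file_document(filename, content, paths):
    """Read the XML, extract the configured metadata, and describe the file."""
    root = ET.fromstring(content)
    metadata = {key: root.findtext(path) for key, path in paths.items()}
    return {
        "filename": filename,
        "contentType": "application/xml",
        "length": len(content),
        "metadata": metadata,
    }


doc = build_file_document("health_record_2.xml", SAMPLE, METADATA_PATHS)
print(doc["metadata"])
```

Because the metadata is just a nested dictionary, a new document type only needs a new set of paths, not a schema change.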
Remember...Our Challenge
QA
Let's see if we are in spec for response time.
DEV
Performance test is part of our definition of 'DONE'
Because seeing is believing!
Demo
‣ GridFS 2.4.0
‣ PostgreSQL 9.2
‣ Spring Data
‣ JMeter 2.7
‣ Mac OS X 10.8.3 2.3GHz Quad-Core Intel Core i7, 16GB RAM
https://github.com/aparnachaudhary/nosql-matters-demo
EMBRACE POLYGLOT PERSISTENCE!
STOP RDBMS ABUSE!
KNOW YOUR USE CASE
@aparnachaudhary
Java Developer, Data Lover
Eindhoven, Netherlands
http://blog.aparnachaudhary.com/
Thank You!