Polyglot Persistence
![Page 1: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/1.jpg)
Polyglot Persistence
Scott Leberknight
![Page 2: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/2.jpg)
Polyglot?
![Page 3: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/3.jpg)
http://memeagora.blogspot.com/2006/12/polyglot-programming.html
Neal Ford
December 2006
Polyglot Programming
![Page 4: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/4.jpg)
The Paradox of Choice: Why More Is Less (Barry Schwartz)
http://www.amazon.com/Paradox-Choice-Why-More-Less/dp/0060005688
![Page 5: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/5.jpg)
First web frameworks...
![Page 6: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/6.jpg)
http://java-source.net/open-source/web-frameworks
![Page 7: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/7.jpg)
non-Java web frameworks too!
![Page 8: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/8.jpg)
...then AJAX and JavaScript
![Page 9: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/9.jpg)
InitialContext ic = new InitialContext();
DataSource ds = (DataSource) ic.lookup("java:comp/env/jdbc/coffeeDB");
Connection con = null;
Statement stmt = null;
ResultSet rs = null;
List<Coffee> coffees = new ArrayList<Coffee>();
try {
    con = ds.getConnection();
    stmt = con.createStatement();
    rs = stmt.executeQuery("select name, price from coffees");
    while (rs.next()) {
        String name = rs.getString("name");
        float price = rs.getFloat("price");
        coffees.add(new Coffee(name, price));
    }
} catch (SQLException sqlex) {
    log.error("Error getting coffees", sqlex);
} finally {
    // close rs, stmt, and con here, each in its own try/catch
}
...and now PERSISTENCE
![Page 10: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/10.jpg)
Why?
Scalability (on massive scales)
High availability
New types of apps, e.g. social networking
Fault tolerance
Distributability
Flexibility (i.e. "schemaless")
![Page 11: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/11.jpg)
Why?
One size does not fit all
![Page 12: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/12.jpg)
A few types of databases...
Relational
Document-oriented
Object
Bigtable-ish
Key-value
EAV (Entity-Attribute-Value)
![Page 13: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/13.jpg)
Types of data
Structured
Semi-structured
Unstructured
![Page 14: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/14.jpg)
ACID vs. BASE
![Page 15: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/15.jpg)
ACID
Atomic
Consistent
Isolated
Durable
![Page 16: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/16.jpg)
ACID in Action
Transfer $1000 from 1st Bank checking to savings
[Figure: a single 1st Bank database with checking, savings, and customers tables]
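To make the all-or-nothing idea concrete, here is a toy, in-memory sketch in Java. The `Bank` class, the account names, and the amounts (in cents) are made up for illustration. A relational database provides the same guarantee with a transaction (in JDBC, `setAutoCommit(false)` followed by `commit()` or `rollback()`); unlike a real database, this sketch is not durable.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of an atomic transfer: no half-done transfer and no
// negative balance (atomicity, consistency); synchronized methods stand in
// for isolation. Nothing is written to disk, so there is no durability.
class Bank {
    private final Map<String, Long> balances = new HashMap<>();

    synchronized void open(String account, long cents) {
        balances.put(account, cents);
    }

    synchronized long balance(String account) {
        return balances.get(account);
    }

    // Either both the debit and the credit happen, or neither does.
    synchronized boolean transfer(String from, String to, long cents) {
        Long fromBalance = balances.get(from);
        Long toBalance = balances.get(to);
        if (fromBalance == null || toBalance == null || cents <= 0 || fromBalance < cents) {
            return false; // "roll back": nothing has been modified yet
        }
        balances.put(from, fromBalance - cents);
        balances.put(to, toBalance + cents);
        return true; // "commit": both changes become visible together
    }

    public static void main(String[] args) {
        Bank bank = new Bank();
        bank.open("checking", 150_000); // $1,500.00
        bank.open("savings", 50_000);   // $500.00
        bank.transfer("checking", "savings", 100_000); // the $1,000 transfer
        System.out.println(bank.balance("checking") + " " + bank.balance("savings")); // 50000 150000
    }
}
```

A second $1,000 transfer from checking would be refused and leave both balances untouched, which is exactly the "all or nothing" behavior the slide describes.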
![Page 17: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/17.jpg)
BASE
Basically Available
Soft State
Eventually Consistent
![Page 18: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/18.jpg)
BASE in Action
Transfer $1000 from 1st Bank checking to Bank of Foo savings
[Figure: two separate databases: 1st Bank (checking, savings, customers) and Bank of Foo (account, account_type, customer)]
![Page 19: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/19.jpg)
Schedule, Cost, Quality (choose any 2)
![Page 20: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/20.jpg)
Brewer's Conjecture
![Page 21: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/21.jpg)
"When designing distributed web services, there
are three properties that are commonly desired:
consistency, availability, and partition tolerance.
It is impossible to achieve all three."
- "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services"
Seth Gilbert and Nancy Lynch (MIT)
![Page 22: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/22.jpg)
Consistency
Partition-tolerance
Availability
(choose any 2)
![Page 23: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/23.jpg)
We're living in interesting times...
Explosion of alternative persistence choices
Completely new philosophies on persistence
![Page 24: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/24.jpg)
Whirlwind tour...
Relational
Document-Oriented
Key/Value
Bigtable
![Page 25: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/25.jpg)
Ankle-deep
![Page 26: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/26.jpg)
Relational Databases
[Figure: example blog schema with tables blog, blog_entry, blog_entry_comment, category, daily_statistics, blog_owner, blog_user]
![Page 27: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/27.jpg)
Relations (tables, joins, integrity)
ACID guarantees
Query using SQL
Strict schema
Difficult to scale and partition (e.g. 2-phase commit)
By far the most popular persistence choice today
Mismatch with OO languages
![Page 28: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/28.jpg)
select *
from fakenames f
where f.surname like 'Smi%'
  and f.city = 'Richmond'
  and f.state = 'VA'
order by f.surname, f.given_name;
![Page 29: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/29.jpg)
Scaling...
Buy a bigger machine (vertical scaling)
![Page 30: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/30.jpg)
What if there is no bigger machine?
Horizontal scaling:
Functional
Sharding
![Page 31: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/31.jpg)
[Figure: functional partitions, each split into shards: Users (Users 0, Users 1), Products (Products 0), Orders (Orders 0, Orders 1, Orders 2)]
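A minimal sketch of how an application might route a request once data is split both functionally and into shards. The shard names come from the figure; the `ShardRouter` class and the "id modulo shard count" rule are assumptions for illustration, not something the slides prescribe.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router: pick the functional area first, then a shard within it.
class ShardRouter {
    private final Map<String, List<String>> shardsByFunction = new LinkedHashMap<>();

    ShardRouter() {
        shardsByFunction.put("users", List.of("Users 0", "Users 1"));
        shardsByFunction.put("products", List.of("Products 0"));
        shardsByFunction.put("orders", List.of("Orders 0", "Orders 1", "Orders 2"));
    }

    String shardFor(String function, long id) {
        List<String> shards = shardsByFunction.get(function);
        if (shards == null) {
            throw new IllegalArgumentException("unknown functional area: " + function);
        }
        // floorMod keeps the index non-negative even for negative ids.
        int index = (int) Math.floorMod(id, (long) shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter();
        System.out.println(router.shardFor("users", 42));    // Users 0
        System.out.println(router.shardFor("orders", 1001)); // Orders 2 (1001 % 3 == 2)
    }
}
```

One known drawback of plain modulo routing is that changing the number of shards remaps most ids, which is part of why stores such as Dynamo and Voldemort use consistent hashing instead.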
![Page 32: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/32.jpg)
Document-Oriented Databases
![Page 33: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/33.jpg)
"As opposed to Relational Databases, document-based
databases do not store data in tables with uniform sized
fields for each record. Instead, each record is stored as a
document that has certain characteristics. Any number of
fields of any length can be added to a document. Fields can
also contain multiple pieces of data."
- Wikipedia (http://en.wikipedia.org/wiki/Document-oriented_database)
![Page 34: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/34.jpg)
Examples:
Lotus Notes
Apache CouchDB
Amazon SimpleDB (for our purposes, anyway)
MongoDB
![Page 35: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/35.jpg)
CouchDB
![Page 36: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/36.jpg)
![Page 37: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/37.jpg)
Architecture
![Page 38: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/38.jpg)
Concepts:
Documents
Views
Schemaless
Distributed
![Page 39: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/39.jpg)
RESTful...
![Page 40: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/40.jpg)
![Page 41: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/41.jpg)
Views
JavaScript as description language
Map/Reduce functions
Add structure to semi-structured data
Independent of actual documents (created in special Design Documents)
![Page 42: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/42.jpg)
Simplest map function...
function(doc) {
  emit(null, doc);
}
![Page 43: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/43.jpg)
// Map function to find Seattleites
function(doc) {
  if (doc.State == "WA" && doc.City == "Seattle") {
    emit(doc.Number, { "GivenName":doc.GivenName, "Surname":doc.Surname });
  }
}
![Page 44: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/44.jpg)
Counting people by state...
// Map function
function(doc) {
  emit(doc.State, 1);
}
// Reduce function; aggregates counts
function (key, values) {
  return sum(values);
}
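The same map and reduce steps can be emulated in plain Java to show what the view computes. This only mimics the semantics for a few made-up documents; CouchDB itself runs the JavaScript functions and keeps the results in an incrementally updated index. The `StateCountView` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical emulation of the "count people by state" view.
class StateCountView {
    // Map step, mirroring: function(doc) { emit(doc.State, 1); }
    // (Map.entry rejects nulls, so every document here needs a State.)
    static List<Map.Entry<String, Integer>> map(List<Map<String, String>> docs) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            emitted.add(Map.entry(doc.get("State"), 1));
        }
        return emitted;
    }

    // Reduce step, mirroring: function(key, values) { return sum(values); }
    // applied per key, like querying the view with group=true.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> counts = new TreeMap<>(); // view rows are sorted by key
        for (Map.Entry<String, Integer> e : emitted) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("GivenName", "Vera", "State", "VA"),
            Map.of("GivenName", "Chris", "State", "WA"),
            Map.of("GivenName", "David", "State", "VA"));
        System.out.println(reduce(map(docs))); // {VA=2, WA=1}
    }
}
```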
![Page 45: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/45.jpg)
Caution: Views are not meant to be created dynamically like SQL queries!
To keep view querying fast, the view engine maintains indexes of its views, and incrementally updates them to reflect changes in the database. CouchDB’s core design is largely optimized around the need for efficient, incremental creation of views and their indexes.
- http://couchdb.apache.org/docs/overview.html
![Page 46: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/46.jpg)
Amazon SimpleDB
![Page 47: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/47.jpg)
"Amazon SimpleDB is a web service for running queries on
structured data in real time. This service works in close
conjunction with Amazon Simple Storage Service (Amazon S3)
and Amazon Elastic Compute Cloud (Amazon EC2), collectively
providing the ability to store, process and query data sets in
the cloud. These services are designed to make web-scale
computing easier and more cost-effective for developers."
- SimpleDB Developer Guide (Version 2007-11-07)
![Page 48: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/48.jpg)
"A traditional, clustered relational database requires a sizable
upfront capital outlay, is complex to design, and often requires a
DBA to maintain and administer. Amazon SimpleDB is
dramatically simpler, requiring no schema, automatically
indexing your data and providing a simple API for storage
and access. This approach eliminates the administrative
burden of data modeling, index maintenance, and performance
tuning. Developers gain access to this functionality within
Amazon’s proven computing environment, are able to scale
instantly, and pay only for what they use."
- SimpleDB Developer Guide (Version 2007-11-07)
![Page 49: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/49.jpg)
Organize data into domains
Domains have items
Items have attributes
Attributes have value(s)
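The domain, item, attribute, value(s) hierarchy maps naturally onto nested maps. The sketch below is an in-memory model, not the SimpleDB API: the `SdbDomain` class and its method names are hypothetical, while the add-versus-replace distinction mirrors the `replace` flag on `ReplaceableAttribute` used in the Java example later on. Item names and values are illustrative.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of one SimpleDB domain.
class SdbDomain {
    // item name -> attribute name -> set of values (attributes may be multi-valued)
    private final Map<String, Map<String, Set<String>>> items = new TreeMap<>();

    // Like a put with replace=false: the value is added alongside existing ones.
    void add(String item, String attribute, String value) {
        items.computeIfAbsent(item, k -> new TreeMap<>())
             .computeIfAbsent(attribute, k -> new TreeSet<>())
             .add(value);
    }

    // Like a put with replace=true: existing values of the attribute are discarded.
    void replace(String item, String attribute, String value) {
        Set<String> values = new TreeSet<>();
        values.add(value);
        items.computeIfAbsent(item, k -> new TreeMap<>()).put(attribute, values);
    }

    // Missing items and attributes simply have no values: there is no schema.
    Set<String> values(String item, String attribute) {
        Map<String, Set<String>> attributes = items.get(item);
        if (attributes == null || !attributes.containsKey(attribute)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(attributes.get(attribute));
    }

    public static void main(String[] args) {
        SdbDomain amazon = new SdbDomain();
        amazon.add("Item01", "Name", "Jeans");
        amazon.add("Item01", "Size", "30x30");
        amazon.add("Item01", "Size", "32x30");
        amazon.add("Item01", "Size", "34x30");
        System.out.println(amazon.values("Item01", "Size")); // [30x30, 32x30, 34x30]
    }
}
```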
![Page 50: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/50.jpg)
Domain: Fakenames
[Figure: items "1" to "5", each with the attributes ID, GivenName, Surname, Birthday, and EmailAddress. Sample values include GivenName "Gwendolyn", "Michael", "Chris", "David", "Vera"; Surname "Swinton", "Johnson", "Lewis", "Sutton", "Schuler"; Birthday "6/6/1941", "9/5/1982", "11/18/1963". Labels: Items, Attributes, Values]
![Page 51: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/51.jpg)
Domain: Amazon
[Figure: items "Item01" to "Item05" from the "Clothes" and "Entertainment" categories, each with its own set of attributes (Category, Subcategory, Name, Color, Size, Length, Format, Author). Subcategories include "Mens", "Womens", "Books", and "DVDs"; names include "Blouse", "Jeans", "Slaughterhouse Five" (Author "Kurt Vonnegut"), "Sound of Music", and "Pulp Fiction". Several attributes hold multiple values, e.g. Color "White", "Yellow", "Beige", "Pink"; Size "Small", "Medium", "Large"; Format "Full Screen", "Widescreen"; Length "154 min", "168 min (special edition)"]
![Page 52: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/52.jpg)
"REST" API
POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=utf-8
User-Agent: Amazon Simple DB Java Library
Host: sdb.amazonaws.com
Content-Length: 232

Action=CreateDomain&DomainName=Fakenames&AWSAccessKeyId=[your AWS access key id]&SignatureVersion=2&SignatureMethod=HmacSHA256&Signature=[computed signature]&Timestamp=2009-03-23T23%3A58%3A55.327Z&Version=2007-11-07
![Page 53: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/53.jpg)
Available APIs:
Amazon: Java, C#, Perl, PHP, VB
3rd party: Ruby gems (aws-simpledb, aws-sdb, simpledb), Python (polarrose-twisted-amazon)
![Page 54: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/54.jpg)
AmazonSimpleDB service = new AmazonSimpleDBClient(accessKeyId, secretAccessKey);

// Create a new domain
CreateDomainRequest cdReq = new CreateDomainRequest().withDomainName("Fakenames");
CreateDomainResponse cdResp = service.createDomain(cdReq);

// List all our domains
ListDomainsRequest ldReq = new ListDomainsRequest();
ListDomainsResponse ldResp = service.listDomains(ldReq);
![Page 55: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/55.jpg)
Sample response:
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
  <ListDomainsResult>
    <DomainName>Fakenames</DomainName>
    <DomainName>Movies</DomainName>
  </ListDomainsResult>
  <ResponseMetadata>
    <RequestId>8c4d0240-49ea-5d2f-9573-437324cd144c</RequestId>
    <BoxUsage>0.0000071759</BoxUsage>
  </ResponseMetadata>
</ListDomainsResponse>
![Page 56: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/56.jpg)
// Add an attribute value
ReplaceableAttribute newEmail = new ReplaceableAttribute("emailAddress", "[email protected]", false);
PutAttributesRequest request = new PutAttributesRequest()
        .withDomainName("Fakenames")
        .withItemName("1")
        .withAttribute(newEmail);
PutAttributesResponse response = service.putAttributes(request);
![Page 57: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/57.jpg)
Query API
![Page 58: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/58.jpg)
// Query for Richmonders
String query = "['city' = 'Richmond'] intersection ['state' = 'VA']";
QueryRequest request = new QueryRequest()
        .withDomainName("Fakenames")
        .withQueryExpression(query);
QueryResponse response = service.query(request);
![Page 59: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/59.jpg)
// Query for Richmonders, with attributes
String query = "['city' = 'Richmond'] intersection ['state' = 'VA']";
QueryWithAttributesRequest request = new QueryWithAttributesRequest()
        .withDomainName("Fakenames")
        .withQueryExpression(query);
QueryWithAttributesResponse response = service.queryWithAttributes(request);
![Page 60: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/60.jpg)
SELECT API
![Page 61: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/61.jpg)
// Get a count
String query = "select count(*) from Fakenames";
SelectRequest request = new SelectRequest().withSelectExpression(query);
SelectResponse response = service.select(request);
![Page 62: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/62.jpg)
// Select Richmonders whose surname starts with "Smi"
String query = "select * from Fakenames"
        + " where city = 'Richmond' intersection state = 'VA'"
        + " intersection surname like 'Smi%'";
SelectRequest request = new SelectRequest().withSelectExpression(query);
SelectResponse response = service.select(request);
![Page 63: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/63.jpg)
There are Limits!
Query execution time <= 5 sec
Max items in a Query response = 250
Attribute name and value size <= 1024 bytes each
Attributes per item <= 256
See the SimpleDB Developer Guide for more...
![Page 64: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/64.jpg)
(May I have another?)
<QueryResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
  <QueryResult>
    <ItemName>131</ItemName>
    ...
    <NextToken>rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9rracXLnINNqwMACkkAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhc...</NextToken>
  </QueryResult>
  <ResponseMetadata>
    ...
  </ResponseMetadata>
</QueryResponse>
NextToken
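Client code typically handles `NextToken` with a loop that re-issues the query, passing back the token from the previous response, until a response arrives without one. In this sketch, `Page`, `PagedQuery`, and the offset-based fake tokens are hypothetical stand-ins rather than SimpleDB SDK types; real tokens are opaque and should be passed back unchanged.

```java
import java.util.ArrayList;
import java.util.List;

class NextTokenPaging {
    // Hypothetical response shape: one page of item names plus an optional token.
    static final class Page {
        final List<String> itemNames;
        final String nextToken; // null when there are no more results

        Page(List<String> itemNames, String nextToken) {
            this.itemNames = itemNames;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical stand-in for "send the query, optionally with a NextToken".
    interface PagedQuery {
        Page fetch(String nextToken); // null token => first page
    }

    // Keep asking for the next page until a response carries no NextToken.
    static List<String> fetchAll(PagedQuery query) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Page page = query.fetch(token);
            all.addAll(page.itemNames);
            token = page.nextToken;
        } while (token != null);
        return all;
    }

    // Fake query over an in-memory list; its tokens are simply offsets.
    static PagedQuery fake(List<String> data, int pageSize) {
        return token -> {
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, data.size());
            String next = (end < data.size()) ? Integer.toString(end) : null;
            return new Page(data.subList(start, end), next);
        };
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 600; i++) {
            names.add(Integer.toString(i));
        }
        // 250 mirrors the per-response item limit listed under "There are Limits!".
        List<String> all = fetchAll(fake(names, 250));
        System.out.println(all.size()); // 600, gathered from pages of 250, 250 and 100
    }
}
```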
![Page 65: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/65.jpg)
Eventually consistent (*)
"Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated...all copies of the data are updated. However, it takes time for the data to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change. Consistency is usually reached within seconds, but a high system load or network partition might increase this time. Performing a read after a short period of time should return the updated data."
- SimpleDB Developer Guide (Version 2007-11-07)
![Page 66: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/66.jpg)
(*) ConsistentRead
Version 2009-04-15 added consistent read option
"If eventually consistent reads are not acceptable for your application, use ConsistentRead. Although this operation might take longer than a standard read, it always returns the last updated value."
- SimpleDB Developer Guide (Version 2009-04-15)
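A toy model can show the difference between the two read modes. It is a deliberate simplification, not how SimpleDB is implemented: one copy receives each write immediately, the other copies catch up only when `propagate()` runs, a normal read may be answered by any copy, and a consistent read is answered only by the up-to-date copy. The `ReplicaSim` class is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of several copies of one domain with lagging replication.
class ReplicaSim {
    private final List<Map<String, String>> copies = new ArrayList<>();
    private final Queue<String[]> pending = new ArrayDeque<>(); // {key, value} pairs not yet copied

    ReplicaSim(int numCopies) {
        for (int i = 0; i < numCopies; i++) {
            copies.add(new HashMap<>());
        }
    }

    void put(String key, String value) {
        copies.get(0).put(key, value);          // copy 0 sees the write immediately
        pending.add(new String[] {key, value}); // the other copies catch up later
    }

    // Eventually consistent read: whichever copy answers (chosen by the caller here).
    String read(String key, int copy) {
        return copies.get(copy).get(key);
    }

    // Consistent read: answered by the copy known to have every acknowledged write.
    String consistentRead(String key) {
        return copies.get(0).get(key);
    }

    // Propagation finishes: every copy now holds the latest values.
    void propagate() {
        while (!pending.isEmpty()) {
            String[] kv = pending.remove();
            for (int i = 1; i < copies.size(); i++) {
                copies.get(i).put(kv[0], kv[1]);
            }
        }
    }

    public static void main(String[] args) {
        ReplicaSim domain = new ReplicaSim(3);
        domain.put("emailAddress", "old");
        domain.propagate();
        domain.put("emailAddress", "new");
        System.out.println(domain.read("emailAddress", 2));      // old (stale copy)
        System.out.println(domain.consistentRead("emailAddress")); // new
        domain.propagate();
        System.out.println(domain.read("emailAddress", 2));      // new
    }
}
```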
![Page 67: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/67.jpg)
Distributed Key-Value Stores
![Page 68: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/68.jpg)
Basically...
value = store.get(key)
store.put(key, value)
store.remove(key)
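Those three operations translate directly into a Java interface. The single-node, in-memory implementation below is only a sketch: the names `KeyValueStore` and `InMemoryKeyValueStore` are hypothetical, and returning `Optional` from `get` is a design choice. Replication, partitioning, and versioning, which make the real systems interesting, would all sit behind this same small interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal key-value interface.
interface KeyValueStore<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void remove(K key);
}

// "A big hashtable" on one machine, without distribution.
class InMemoryKeyValueStore<K, V> implements KeyValueStore<K, V> {
    // ConcurrentHashMap rejects null keys and values, so this store does too.
    private final Map<K, V> data = new ConcurrentHashMap<>();

    @Override
    public Optional<V> get(K key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void put(K key, V value) {
        data.put(key, value);
    }

    @Override
    public void remove(K key) {
        data.remove(key);
    }

    public static void main(String[] args) {
        KeyValueStore<String, String> store = new InMemoryKeyValueStore<>();
        store.put("1", "Bob Smith");
        System.out.println(store.get("1").orElse("(none)")); // Bob Smith
        store.remove("1");
        System.out.println(store.get("1").orElse("(none)")); // (none)
    }
}
```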
![Page 69: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/69.jpg)
Data stored as key/value pairs ("a big hashtable")
Replication
Fault tolerance
Data consistency & versioning
Horizontal scaling
![Page 70: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/70.jpg)
Amazon Dynamo (a real-world example)
![Page 71: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/71.jpg)
Distributed key-value storage system
Used by Amazon core and web services (e.g. your Amazon shopping cart...)
Massively scalable
Fault tolerant
Eventually consistent
![Page 72: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/72.jpg)
The-Project-Which-Must-Not-Be-Named
(Project Voldemort)
![Page 73: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/73.jpg)
What is it?
"a distributed key-value storage system"
automatic replication across multiple servers
transparent server failure handling
automatic data item versioning
![Page 74: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/74.jpg)
"Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table."
http://project-voldemort.com/
![Page 75: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/75.jpg)
designed for horizontal scaling
used at LinkedIn "for certain high-scalability storage problems where simple functional
partitioning is not sufficient"
![Page 76: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/76.jpg)
"Consistent hashing"
No single server holds all data
Data partitioned across multiple servers
Versioning using "vector clocks"
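A minimal sketch of a consistent-hash ring: each server is placed at several points on a ring of hash values, and a key belongs to the first server point at or after the key's own hash, wrapping around at the end. CRC32 as the hash and 100 points per server are arbitrary choices for illustration; Voldemort's actual scheme (fixed partitions assigned to servers, as in the cluster.xml sample) differs in detail. The `HashRing` class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> server
    private final int pointsPerServer;

    HashRing(int pointsPerServer) {
        this.pointsPerServer = pointsPerServer;
    }

    private static long position(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // 0 .. 2^32 - 1
    }

    void addServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.put(position(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.remove(position(server + "#" + i), server); // only this server's points
        }
    }

    // Owner = first server point at or after the key's position, wrapping around.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(position(key));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(100);
        for (String s : new String[] {"node0", "node1", "node2"}) {
            ring.addServer(s);
        }
        String[] before = new String[1000];
        for (int i = 0; i < before.length; i++) {
            before[i] = ring.serverFor("key" + i);
        }
        ring.addServer("node3");
        int moved = 0;
        for (int i = 0; i < before.length; i++) {
            if (!ring.serverFor("key" + i).equals(before[i])) {
                moved++; // each moved key now belongs to node3
            }
        }
        System.out.println(moved + " of 1000 keys moved");
    }
}
```

The property that matters: when `node3` joins, the only keys that change owner are the ones `node3` takes over (roughly a quarter of them here), rather than most keys, as with modulo-based routing.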
![Page 77: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/77.jpg)
Configuration:
cluster.xml describes cluster (servers, data partitions)
stores.xml describes data stores (persistence, routing, key/value data format, replication factor, preferred reads/writes, required reads/writes)
![Page 78: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/78.jpg)
sample cluster.xml
<cluster>
  <name>mycluster</name>
  <server>
    <id>0</id>
    <host>localhost</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <partitions>0, 1, 2, 3</partitions>
  </server>
  <server>
    <id>1</id>
    <host>localhost</host>
    <http-port>8082</http-port>
    <socket-port>6667</socket-port>
    <partitions>4, 5, 6, 7</partitions>
  </server>
</cluster>
![Page 79: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/79.jpg)
sample stores.xml
<stores>
  <store>
    <name>people</name>
    <persistence>bdb</persistence>
    <routing>client</routing>
    <replication-factor>3</replication-factor>
    <preferred-reads>3</preferred-reads>
    <required-reads>2</required-reads>
    <preferred-writes>2</preferred-writes>
    <required-writes>1</required-writes>
    <key-serializer>
      <type>json</type>
      <schema-info>"string"</schema-info>
    </key-serializer>
    <value-serializer>
      <type>json</type>
      <schema-info>{"GivenName":"string", "Surname":"string"}</schema-info>
    </value-serializer>
  </store>
</stores>
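The read and write settings above invite the quorum arithmetic popularized by the Dynamo paper: with N replicas, if every read waits for R of them and every write for W, then R + W > N guarantees that any read set overlaps any write set, so at least one replica a read consults holds the latest acknowledged write. The brute-force check below confirms the rule for small N. Treating `required-reads`/`required-writes` as R and W is an assumption about Voldemort's semantics, and sloppy quorums or failures can complicate the picture. The `QuorumCheck` class is hypothetical.

```java
class QuorumCheck {
    // The rule of thumb: read and write quorums must overlap.
    static boolean ruleSaysOverlap(int n, int r, int w) {
        return r + w > n;
    }

    // Brute force over bitmasks: does every r-subset of n replicas
    // intersect every w-subset?
    static boolean everyReadSetMeetsEveryWriteSet(int n, int r, int w) {
        for (int readSet = 0; readSet < (1 << n); readSet++) {
            if (Integer.bitCount(readSet) != r) continue;
            for (int writeSet = 0; writeSet < (1 << n); writeSet++) {
                if (Integer.bitCount(writeSet) != w) continue;
                if ((readSet & writeSet) == 0) return false; // disjoint: a read could miss the write
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // replication-factor=3, required-reads=2, required-writes=1 from the sample stores.xml
        System.out.println(everyReadSetMeetsEveryWriteSet(3, 2, 1)); // false: 2 + 1 is not > 3
        System.out.println(everyReadSetMeetsEveryWriteSet(3, 2, 2)); // true:  2 + 2 > 3
    }
}
```

With the sample values, a read that gets only its two required responses can miss a write that reached just one replica, so this configuration relies on versioning and later reconciliation rather than on quorum overlap.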
![Page 80: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/80.jpg)
replication
> locate "1"
Node 0
host: localhost
port: 6666
available: yes
last checked: 96171 ms ago
Node 1
host: localhost
port: 6667
available: yes
last checked: 96171 ms ago
Node 2
host: localhost
port: 6668
available: yes
last checked: 96172 ms ago
![Page 81: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/81.jpg)
$ ./voldemort-shell.sh people tcp://localhost:6666
Established connection to people via tcp://localhost:6666
> put "1" { "GivenName":"Bob", "Surname":"Smith" }
> get "1"
version(0:1): {"GivenName":"Bob", "Surname":"Smith", }
> put "1" { "GivenName":"Robert", "Surname":"Smith", }
> get "1"
version(0:2): {"GivenName":"Robert", "Surname":"Smith", }
vector clock (master node: version)
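The `version(0:1)` and `version(0:2)` labels read as "node 0, counter 1" and then "node 0, counter 2". Below is a minimal vector clock sketch (a hypothetical class, not Voldemort's own implementation) showing the comparison rule: one clock precedes another if none of its counters is larger; if each clock has a larger counter somewhere, the versions are concurrent and the application must reconcile them.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<Integer, Long> counters = new TreeMap<>(); // node id -> counter

    // A new clock recording one more write coordinated by nodeId.
    VectorClock incremented(int nodeId) {
        VectorClock next = new VectorClock();
        next.counters.putAll(counters);
        next.counters.merge(nodeId, 1L, Long::sum);
        return next;
    }

    Order compare(VectorClock other) {
        boolean behind = false;
        boolean ahead = false;
        Set<Integer> nodes = new TreeSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (int node : nodes) {
            long mine = counters.getOrDefault(node, 0L); // missing entry counts as 0
            long theirs = other.counters.getOrDefault(node, 0L);
            if (mine < theirs) behind = true;
            if (mine > theirs) ahead = true;
        }
        if (behind && ahead) return Order.CONCURRENT;
        if (behind) return Order.BEFORE;
        if (ahead) return Order.AFTER;
        return Order.EQUAL;
    }

    @Override
    public String toString() {
        return counters.toString();
    }

    public static void main(String[] args) {
        VectorClock v1 = new VectorClock().incremented(0); // like version(0:1)
        VectorClock v2 = v1.incremented(0);                 // like version(0:2)
        System.out.println(v1 + " vs " + v2 + ": " + v1.compare(v2)); // {0=1} vs {0=2}: BEFORE
        VectorClock a = v2.incremented(0);
        VectorClock b = v2.incremented(1);
        System.out.println(a + " vs " + b + ": " + a.compare(b)); // {0=3} vs {0=2, 1=1}: CONCURRENT
    }
}
```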
![Page 82: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/82.jpg)
Java API example
StoreClientFactory factory = new SocketStoreClientFactory(numThreads, numThreads,
        maxQueuedRequests, maxConnectionsPerNode, maxTotalConnections, bootstrapUrl);
StoreClient<Integer, Map<String, Object>> client = factory.getStoreClient("fakenames");

// Update a value
Versioned<Map<String, Object>> versioned = client.get(1);
Map<String, Object> person = versioned.getValue();
person.put("EmailAddress", newEmailAddr);
versioned.setObject(person);
client.put(1, versioned);
![Page 83: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/83.jpg)
Bigtable
![Page 84: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/84.jpg)
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
"Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance."
![Page 85: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/85.jpg)
"A Bigtable is a sparse, distributed, persistent
multidimensional sorted map"
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
![Page 86: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/86.jpg)
?
![Page 87: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/87.jpg)
distributed
sparse
column-oriented
versioned
![Page 88: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/88.jpg)
(row key, column key, timestamp) => value
The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
![Page 89: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/89.jpg)
Key Concepts:
row key => 20090407152657
column family => "name:"
column key => "name:first", "name:last"
timestamp => 1239124584398
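The (row key, column key, timestamp) => value map can be sketched with nested sorted maps. This in-memory model only illustrates the logical data model (the `SparseTable` class is hypothetical); the sample row and the t2/t6 author versions come from the example blog table, with t2 and t6 written as plain numbers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class SparseTable {
    // row key -> column key ("family:qualifier") -> timestamp -> uninterpreted bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(columnKey, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Up to maxVersions values of one cell, newest timestamp first.
    List<byte[]> get(String rowKey, String columnKey, int maxVersions) {
        List<byte[]> result = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null || !columns.containsKey(columnKey)) {
            return result; // sparse: absent cells are simply not stored
        }
        for (byte[] value : columns.get(columnKey).descendingMap().values()) {
            if (result.size() == maxVersions) break;
            result.add(value);
        }
        return result;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String string(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SparseTable blog = new SparseTable();
        String row = "20090407145045";
        blog.put(row, "info:author", 2, bytes("John"));     // t2
        blog.put(row, "info:author", 6, bytes("John Doe")); // t6
        for (byte[] v : blog.get(row, "info:author", 3)) {
            System.out.println(string(v)); // John Doe, then John
        }
    }
}
```

Asking for up to three versions returns "John Doe" before "John", the same newest-first order the HBase shell shows for `VERSIONS=>3`.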
![Page 90: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/90.jpg)
| Row Key | Timestamp | Column Family "info:" | Column Family "content:" |
|---|---|---|---|
| 20090407145045 | t7 | "info:summary" = "An intro to..." | |
| 20090407145045 | t6 | "info:author" = "John Doe" | |
| 20090407145045 | t5 | | "Google's Bigtable is..." |
| 20090407145045 | t4 | | "Google Bigtable is..." |
| 20090407145045 | t3 | "info:category" = "Persistence" | |
| 20090407145045 | t2 | "info:author" = "John" | |
| 20090407145045 | t1 | "info:title" = "Intro to Bigtable" | |
| 20090320162535 | t4 | "info:category" = "Persistence" | |
| 20090320162535 | t3 | | "CouchDB is..." |
| 20090320162535 | t2 | "info:author" = "Bob Smith" | |
| 20090320162535 | t1 | "info:title" = "Doc-oriented..." | |
![Page 91: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/91.jpg)
Ask for row 20090407145045...
![Page 92: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/92.jpg)
Apache HBase (an open source Bigtable implementation)
![Page 93: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/93.jpg)
HBase uses a data model very similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have widely varying numbers of columns.
- http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
![Page 94: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/94.jpg)
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20090320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20090320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20090320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
hbase(main):005:0> put 'blog', '20090320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20090320162535'
COLUMN          CELL
 content:       timestamp=1239135042862, value=CouchDB is a doc...
 info:author    timestamp=1239135042755, value=Bob Smith
 info:category  timestamp=1239135042982, value=Persistence
 info:title     timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
![Page 95: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/95.jpg)
hbase(main):015:0> get 'blog', '20090407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20090300', STOPROW => '20090400' }
ROW             COLUMN+CELL
 20090320162535 column=content:, timestamp=1239135042862, value=CouchDB is...
 20090320162535 column=info:author, timestamp=1239135042755, value=Bob Smith
 20090320162535 column=info:category, timestamp=1239135042982, value=Persistence
 20090320162535 column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
![Page 96: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/96.jpg)
Got byte[]?
![Page 97: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/97.jpg)
```java
// Create a new table
HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

HTableDescriptor descriptor = new HTableDescriptor("mytable");
descriptor.addFamily(new HColumnDescriptor("family1:"));
descriptor.addFamily(new HColumnDescriptor("family2:"));
descriptor.addFamily(new HColumnDescriptor("family3:"));
admin.createTable(descriptor);
```
![Page 98: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/98.jpg)
```java
// Add some data into 'mytable'
HTable table = new HTable("mytable");
BatchUpdate update = new BatchUpdate("row1");
update.put("family1:aaa", Bytes.toBytes("some value"));
table.commit(update);

// Get data back
RowResult result = table.getRow("row1");
Cell cell = result.get("family1:aaa");

// Overwrite earlier value and add more data
BatchUpdate update2 = new BatchUpdate("row1");
update2.put("family1:aaa", Bytes.toBytes("some value"));
update2.put("family2:bbb", Bytes.toBytes("another value"));
table.commit(update2);
```
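HBase stores row keys, column names and values as uninterpreted bytes, so the client does the conversion. That is what the `Bytes.toBytes` calls above are for. The helpers below are hypothetical stand-ins written with `java.nio`, assuming UTF-8 for strings and 8-byte big-endian encoding for longs. They are not the HBase `Bytes` class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical byte[] conversion helpers (not HBase's Bytes class). */
public final class ByteCodec {
    private ByteCodec() {}

    /** Strings are encoded as UTF-8 (an assumption of this sketch). */
    public static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String asString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    /** Longs become 8 bytes, most significant byte first (ByteBuffer's default order). */
    public static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] value = toBytes("some value");
        System.out.println(asString(value));                // some value
        System.out.println(toBytes(1239135042862L).length);  // 8
        System.out.println(toLong(toBytes(1239135042862L))); // 1239135042862
    }
}
```

Whatever encoding an application picks, every reader and writer of the table has to agree on it, since the store itself attaches no type information to the bytes.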
![Page 99: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/99.jpg)
Finding data:
get (by row key)
scan (by row key ranges, filtering)
Secondary indexes allow scanning by different keys
(a bit more flexibility, requires more storage)
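One common way to build a secondary index is to keep another sorted table whose row keys are built from the indexed value. Looking up by that value then becomes a range scan. The sketch below models both tables with `TreeMap`, assuming a key layout of last name + NUL + original row key. The class, its method names and the key layout are all hypothetical. The second map is the extra storage mentioned above, and it has to be updated whenever a main row changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of a main table plus a secondary index table.
 * Main table: rowKey -> lastName. Index table: lastName + NUL + rowKey -> rowKey.
 */
public class SecondaryIndex {
    private static final char SEP = '\0'; // assumes last names never contain NUL

    private final NavigableMap<String, String> people = new TreeMap<>();
    private final NavigableMap<String, String> byLastName = new TreeMap<>();

    public void put(String rowKey, String lastName) {
        String previous = people.put(rowKey, lastName);
        if (previous != null) {
            byLastName.remove(previous + SEP + rowKey); // drop the stale index entry
        }
        byLastName.put(lastName + SEP + rowKey, rowKey);
    }

    /** Primary access path: row keys in [start, stop), like a scan. */
    public List<String> scanRows(String start, String stop) {
        return new ArrayList<>(people.subMap(start, true, stop, false).keySet());
    }

    /** Secondary access path: a range scan over the index for one last name. */
    public List<String> findByLastName(String lastName) {
        String from = lastName + SEP;            // first possible index key for this name
        String to = lastName + (char) (SEP + 1); // just past the last one
        return new ArrayList<>(byLastName.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        SecondaryIndex table = new SecondaryIndex();
        table.put("19600115-0001", "Smith");
        table.put("19600120-0002", "Jones");
        table.put("19720304-0003", "Smith");
        System.out.println(table.scanRows("19600101", "19600201")); // [19600115-0001, 19600120-0002]
        System.out.println(table.findByLastName("Smith"));          // [19600115-0001, 19720304-0003]
    }
}
```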
![Page 100: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/100.jpg)
```java
// Scan for people born during January 1960
// (row keys begin with a yyyyMMdd birth date, so a key range selects a date range)
HTable table = new HTable("fakenames");

byte[][] columns = Bytes.toByteArrays(new String[]{ "name:", "gender:" });
byte[] startRow = Bytes.toBytes("19600101"); // inclusive
byte[] endRow = Bytes.toBytes("19600201");   // exclusive

Scanner scanner = table.getScanner(columns, startRow, endRow);
try {
    for (RowResult result : scanner) {
        ...
    }
} finally {
    scanner.close(); // always release the scanner
}
```
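The range scan above works because HBase orders row keys by comparing their bytes lexicographically as unsigned values. The start row is inclusive and the stop row exclusive. The sketch below reproduces that ordering in plain Java and shows why numeric keys need a fixed width, such as the zero-padded yyyyMMdd dates used here. `RowKeyOrder`, `compareKeys` and `inScanRange` are hypothetical helpers, not HBase methods.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers mirroring unsigned lexicographic row-key ordering. */
public final class RowKeyOrder {
    private RowKeyOrder() {}

    /** Unsigned lexicographic comparison of two byte arrays. */
    public static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as unsigned 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a proper prefix sorts first
    }

    public static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** True if k lies in [start, stop), the range a scan returns. */
    public static boolean inScanRange(String k, String start, String stop) {
        return compareKeys(key(k), key(start)) >= 0 && compareKeys(key(k), key(stop)) < 0;
    }

    public static void main(String[] args) {
        System.out.println(compareKeys(key("9"), key("10")) > 0);            // true: "9" sorts after "10"
        System.out.println(compareKeys(key("09"), key("10")) < 0);           // true once zero-padded
        System.out.println(inScanRange("19600131", "19600101", "19600201")); // true
        System.out.println(inScanRange("19600201", "19600101", "19600201")); // false: stop is exclusive
    }
}
```

Designing the row key is therefore the main query-design decision in this model. Only the key order is available for range scans, so anything else has to come from a secondary index.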
![Page 101: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/101.jpg)
Conclusions?
![Page 102: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/102.jpg)
one size does not fit all
lots of alternatives
think about what you really need...
(not what's currently "hot")
![Page 103: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/103.jpg)
What do you really need?
distributed deployment?
fault tolerance?
query richness?
schema evolution?
extreme scalability?
ability to enforce relationships?
ACID or BASE?
key/value storage?
![Page 104: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/104.jpg)
Even more alternatives...
XML databases
Semantic Web / RDF / Triplestores
Graph databases
Tuplespaces
![Page 105: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/105.jpg)
References!
![Page 106: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/106.jpg)
General
Polyglot Persistence <http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence>
Database Thaw <http://martinfowler.com/bliki/DatabaseThaw.html>
Application Design in the context of the shifting storage spectrum <http://qconsf.com/sf2008/presentation/Application+Design+in+the+context+of+the+shifting+storage+spectrum>
BASE: An Acid Alternative <http://queue.acm.org/detail.cfm?id=1394128>
The Challenges of Latency <http://www.infoq.com/articles/pritchett-latency>
One size fits all: A concept whose time has come and gone <http://www.databasecolumn.com/2007/09/one-size-fits-all.html>, <http://www.cs.brown.edu/~ugur/fits_all.pdf>
The End of an Architectural Era (It's Time for a Complete Rewrite) <http://db.cs.yale.edu/vldb07hstore.pdf>
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495>
![Page 107: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/107.jpg)
General
Semi-Structured Data <http://www.dcs.bbk.ac.uk/~ptw/teaching/ssd/toc.html>
Latency is Everywhere and it Costs You Sales - How to Crush it <http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it>
QCon London 2009: Database projects to watch closely <http://gojko.net/2009/03/11/qcon-london-2009-database-projects-to-watch-closely>
Memories, Guesses, and Apologies <http://blogs.msdn.com/pathelland/archive/2007/05/15/memories-guesses-and-apologies.aspx>
Column-oriented databases <http://en.wikipedia.org/wiki/Column-oriented_DBMS>
Entity-Attribute-Value model <http://en.wikipedia.org/wiki/Entity-Attribute-Value_model>
Read Consistency: Dumb Databases, Smart Services <http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/>
Neo4j graph database <http://neo4j.org/>
NoSql web site - "Your Ultimate Guide to the Non-Relational Universe" <http://nosql-database.org/>
![Page 108: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/108.jpg)
Document-Oriented Databases
Document-Oriented Database <http://en.wikipedia.org/wiki/Document-oriented_database>
Apache CouchDB <http://couchdb.apache.org/>
Why CouchDB? <http://pmuellr.blogspot.com/2008/01/why-couchdb.html>
Why CouchDB Sucks <http://www.eflorenzano.com/blog/post/why-couchdb-sucks/>
Damien Katz CouchDB Interview <http://www.infoq.com/news/2008/11/CouchDB-Damien-Katz>
CouchDB: Thinking beyond the RDBMS <http://blog.labnotes.org/2007/09/02/couchdb-thinking-beyond-the-rdbms/>
CouchDB Implementation <http://horicky.blogspot.com/2008/10/couchdb-implementation.html>
Dare Takes a Look at CouchDB <http://intertwingly.net/blog/2007/09/12/Dare-Takes-a-Look-at-CouchDB>
![Page 109: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/109.jpg)
Document-Oriented Databases
CouchDB - A Use Case <http://kore-nordmann.de/blog/couchdb_a_use_case.html>
Amazon SimpleDB <http://aws.amazon.com/simpledb/>, <http://en.wikipedia.org/wiki/SimpleDB>
thrudb - Document Oriented Database Services <http://code.google.com/p/thrudb/>
thrudb - faster, cheaper than SimpleDB <http://www.igvita.com/2007/12/28/thrudb-faster-and-cheaper-than-simpledb/>
QCon 2008 track on Document-Oriented Distributed Databases <http://qconsf.com/sf2008/tracks/show_track.jsp?trackOID=170>
![Page 110: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/110.jpg)
Distributed K-V Stores
Amazon's Dynamo <http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html>, <http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf>
Anti-RDBMS: A list of distributed key-value stores <http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/>, <http://www.reddit.com/r/programming/comments/7qv19/antirdbms_a_list_of_distributed_keyvalue_stores/>
Is the Relational Database Doomed? <http://developers.slashdot.org/comments.pl?sid=1127539&cid=26849641>
Project Voldemort <http://project-voldemort.com/>
Project Voldemort design (also see excellent list of references from this page) <http://project-voldemort.com/design.php>
Consistent Hashing <http://en.wikipedia.org/wiki/Consistent_hashing>
![Page 111: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/111.jpg)
Bigtable / HBase
Google Architecture <http://highscalability.com/google-architecture>
Bigtable: A Distributed Storage System for Structured Data <http://en.wikipedia.org/wiki/BigTable>, <http://labs.google.com/papers/bigtable.html>, <http://labs.google.com/papers/bigtable-osdi06.pdf>
Apache HBase <http://hadoop.apache.org/hbase/>, <http://en.wikipedia.org/wiki/HBase>
Apache Hadoop <http://hadoop.apache.org/>
Understanding HBase and BigTable <http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable>
Matching Impedance: When to use HBase <http://blog.rapleaf.com/dev/?p=26>
HBase Leads Discuss Hadoop, BigTable and Distributed Databases <http://www.infoq.com/news/2008/04/hbase-interview>
Hadoop/HBase vs RDBMS <http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS>
![Page 112: Polyglot Persistence](https://reader031.fdocuments.net/reader031/viewer/2022013009/5595d3721a28abf12b8b4724/html5/thumbnails/112.jpg)
Questions?