Developing node-mdb: a Node.js-based clone of SimpleDB
Developing node-mdb
SimpleDB emulation using Node.js and GT.M
Rob Tweed
M/Gateway Developments Ltd
http://www.mgateway.com
Twitter: @rtweed
Could you translate that title?
• SimpleDB:
  – Amazon’s NoSQL cloud database
• Node.js:
  – evented server-side JavaScript (using V8)
• GT.M:
  – Open source, global-storage-based NoSQL database
• node-mdb:
  – Open source emulation of SimpleDB
SimpleDB
• Amazon’s cloud database
  – Pay as you go
• Secure HTTP interface
• Schema-free NoSQL database
• Spreadsheet-like database model
  – Domains (= tables)
    • Items (= rows)
      – Attributes (= cells)
        » Values (1+ per attribute allowed)
• SQL-like query API
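The spreadsheet-like model can be sketched in memory as nested maps: domain → item → attribute → set of values. This is a minimal illustration, not node-mdb's code; `createStore()`, `createDomain()`, `putAttributes()` and `getAttributes()` are hypothetical names. It assumes SimpleDB's documented behaviour that an item cannot hold the same attribute name/value pair twice, and that a `replace` flag overwrites the attribute's existing values.

```javascript
// Hypothetical in-memory model of the SimpleDB data model:
// domain (table) -> item (row) -> attribute (cell) -> set of string values.
function createStore() {
  return new Map();
}

function createDomain(store, domainName) {
  if (!store.has(domainName)) store.set(domainName, new Map());
}

// attributes: [{ name, value, replace }]
function putAttributes(store, domainName, itemName, attributes) {
  const domain = store.get(domainName);
  if (!domain) throw new Error('NoSuchDomain: ' + domainName);
  if (!domain.has(itemName)) domain.set(itemName, new Map());
  const item = domain.get(itemName);
  const replaced = new Set(); // attributes already cleared during this call
  for (const { name, value, replace } of attributes) {
    if (!item.has(name) || (replace && !replaced.has(name))) {
      item.set(name, new Set());
      if (replace) replaced.add(name);
    }
    // a Set silently drops a duplicate name/value pair
    item.get(name).add(String(value));
  }
}

function getAttributes(store, domainName, itemName) {
  const domain = store.get(domainName) || new Map();
  const item = domain.get(itemName) || new Map();
  const result = {};
  for (const [name, values] of item) result[name] = [...values].sort();
  return result;
}

const store = createStore();
createDomain(store, 'books');
putAttributes(store, 'books', 'item1', [
  { name: 'author', value: 'Tweed' },
  { name: 'author', value: 'Smith' },
  { name: 'author', value: 'Tweed' },
  { name: 'year', value: 2011 },
]);
console.log(getAttributes(store, 'books', 'item1'));
// → { author: [ 'Smith', 'Tweed' ], year: [ '2011' ] }
```

Using a `Set` per attribute is what makes "1+ values per attribute" and duplicate suppression fall out naturally.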
Why emulate SimpleDB?
• Because I could!
• Kind of cool project
Why emulate SimpleDB?
• To provide a free, locally-available database that behaves identically to SimpleDB
  – Lots of off-the-shelf clients already available
    • Standalone
      – Bolso
      – Mindscape’s SimpleDB Management Tools
    • Language-specific clients
      – boto (Python)
      – Official AWS clients for Java, .Net
      – Node.js
      – etc…
Why emulate SimpleDB?
• To perform local tests prior to committing to production on SimpleDB
• To provide a live, local backup database
• To provide a SimpleDB database for private clouds
• To provide an immediately-consistent SimpleDB database
  – SimpleDB is “eventually consistent”
Why the GT.M database?
• I’m familiar with it
• Free Open Source NoSQL database
• Schema-free
• “Globals”:
  – Sparse persistent multi-dimensional arrays
• Hierarchical database
• Completely dynamic storage
  – No pre-declaration or specification needed
• Result: trivial to model SimpleDB in globals
• node-mdb: a good way to demonstrate the capabilities of the otherwise little-known GT.M
• More info: Google
  – “GT.M database”
  – “universalnosql”
Why write it using Node.js?
• M/DB originally written in late 2008
  – Implemented using GT.M’s native scripting language (M)
  – Apache + m_apache gateway to GT.M for the HTTP interface
• I’ve been working with Node.js for about a year now
  – Rewriting M/DB in JavaScript would make it more widely interesting and comprehensible
• Some performance issues reported with M/DB when pushed hard
Why Node.js?
• Conclusion:
  – Re-implementing M/DB using Node.js should provide better performance and scalability
  – Fewer moving parts:
    • Apache + m_apache + GT.M / multi-threaded
    • Node.js + GT.M as child processes / single-threaded
  – Cool Node.js project to attempt
  – Great example of non-trivial use of Node.js + database
How does SimpleDB work?
[Diagram: incoming SDB HTTP request → HTTP server → authenticate request (HMAC-SHA, using the Security Key Id and Secret Key) → execute API action (against SimpleDB database copies 1 to n) → generate HTTP response → outgoing SDB HTTP response (error, or success and/or data/results)]
Node.js can emulate all this
[Diagram: the same SimpleDB request pipeline, from incoming HTTP request to outgoing HTTP response]
GT.M can emulate this
[Diagram: the same request pipeline, with a single SimpleDB database copy]
Node.js characteristics
• Single-threaded process
• Event loop
• Non-blocking I/O
  – Asynchronous calls to functions that handle I/O
  – Event-driven call-back functions fire when the function completes
    • Data fetched
    • Data saved
Result: deeply nested call-backs
[Diagram: the pipeline stages: HTTP server, authenticate request, execute API action, generate HTTP response]
Flattening the call-back nesting
[Diagram: http server → processSDBRequest() → executeAPI() → sendResponse()]
http.createServer(function(req, res) {…});
var processSDBRequest = function() {…};
var executeAPI = function() {…};
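A minimal sketch of the flattened structure, assuming a hypothetical in-memory `keyStore` and `lookupKey()` in place of node-mdb's `MDB.getGlobal()`: each stage is a named function that receives the SDB object and hands it to the next stage, so the nesting depth stays at one call-back per stage. The `done` property on SDB is also an assumption, used so the result can be observed without an HTTP server.

```javascript
// Each stage is a named function that passes the SDB object on,
// instead of nesting anonymous call-backs inside each other.
// keyStore and lookupKey() are hypothetical stand-ins for MDB.getGlobal().
const keyStore = { rob: 'secret' };

function lookupKey(accessKeyId, callback) {
  // simulate asynchronous database I/O
  setImmediate(() => {
    const known = Object.prototype.hasOwnProperty.call(keyStore, accessKeyId);
    callback(null, known ? keyStore[accessKeyId] : '');
  });
}

function processSDBRequest(SDB) {
  lookupKey(SDB.nvps.AWSAccessKeyId, (error, secretKey) => {
    if (error || secretKey === '') {
      return sendResponse(SDB, { code: 'AuthMissingFailure', status: 403 });
    }
    executeAPI(SDB);
  });
}

function executeAPI(SDB) {
  // a real handler would read or write globals here
  setImmediate(() => sendResponse(SDB, null, { action: SDB.nvps.Action }));
}

function sendResponse(SDB, error, data) {
  // node-mdb writes to SDB.response; this sketch hands the result to SDB.done
  SDB.done(error, data);
}

processSDBRequest({
  nvps: { AWSAccessKeyId: 'rob', Action: 'ListDomains' },
  done: (error, data) => console.log(error, data), // null { action: 'ListDomains' }
});
```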
Node.js HTTP Server
var http = require('http');
var url = require('url');

// sdbURLPattern, mdbURLPattern, httpPort, parseContent(), processSDBRequest()
// and returnError() are not shown on this slide
http.createServer(function(request, response) {
  request.content = '';
  request.on("data", function(chunk) {
    request.content += chunk;
  });
  request.on("end", function() {
    var SDB = {startTime: new Date().getTime(), request: request, response: response};
    var urlObj = url.parse(request.url, true);
    if (request.method === 'POST') {
      SDB.nvps = parseContent(request.content);
    } else {
      SDB.nvps = urlObj.query;
    }
    var uri = urlObj.pathname;
    if ((uri.indexOf(sdbURLPattern) !== -1) || (uri.indexOf(mdbURLPattern) !== -1)) {
      processSDBRequest(SDB);
    } else {
      var uriString = 'http://' + request.headers.host + request.url;
      var error = {code: 'InvalidURI', message: 'The URI ' + uriString + ' is not valid', status: 400};
      returnError(SDB, error);
    }
  });
}).listen(httpPort);
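The server hands POST bodies to `parseContent()`, which the slide does not show. One plausible version, a hypothetical sketch rather than node-mdb's actual helper, decodes the form-encoded body into the same kind of name/value object that `url.parse(request.url, true).query` produces for GET requests, using the WHATWG `URLSearchParams` class built into Node.js.

```javascript
// Hypothetical parseContent(): decode a form-encoded POST body into the same
// name/value-pair object that url.parse(request.url, true).query gives for GET.
function parseContent(content) {
  const nvps = {};
  for (const [name, value] of new URLSearchParams(content)) {
    // SDB parameter names are unique (Attribute.1.Name, Attribute.2.Name, ...),
    // so a plain object is enough
    nvps[name] = value;
  }
  return nvps;
}

const body = 'Action=ListDomains&MaxNumberOfDomains=100' +
  '&Timestamp=2011-06-06T22%3A39%3A30%2B00%3A00';
console.log(parseContent(body).Timestamp); // 2011-06-06T22:39:30+00:00
```

`URLSearchParams` treats `+` as a space and decodes `%2B` to a literal `+`, which is what form-encoded SDB requests expect.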
processSDBRequest()
var processSDBRequest = function(SDB) {
  var accessKeyId = SDB.nvps.AWSAccessKeyId;
  var authError = {code: 'AuthMissingFailure', message: 'AWS was not able to authenticate the request: access credentials are missing', status: 403};
  if (!accessKeyId) {
    returnError(SDB, authError);
    return;
  }
  // Look up the secret key for this access key id in the MDBUAF global
  MDB.getGlobal('MDBUAF', ['keys', accessKeyId], function (error, results) {
    if (error) {
      // Don't leave the request hanging if the database call fails
      returnError(SDB, {code: 'InternalError', message: 'The request could not be executed due to an internal error', status: 500});
    } else if (results.value !== '') {
      accessKey[accessKeyId] = results.value; // cache the secret key
      validateSDBRequest(SDB, results.value);
    } else {
      returnError(SDB, authError);
    }
  });
};
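validateSDBRequest()
var validateSDBRequest = function(SDB, secretKey) {
  // Sign with the method the client used (HmacSHA1 or HmacSHA256), mapped
  // to the digest names that Node's crypto.createHmac() expects
  var digestTypes = {HmacSHA1: 'sha1', HmacSHA256: 'sha256'};
  var method = SDB.nvps.SignatureMethod;
  var type = digestTypes.hasOwnProperty(method) ? digestTypes[method] : null;
  var stringToSign = createStringToSign(SDB, true);
  if (type && digest(stringToSign, secretKey, type) === SDB.nvps.Signature) {
    processSDBAction(SDB);
  } else {
    errorResponse('SignatureDoesNotMatch', SDB);
  }
};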
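A sketch of the signature check as a standalone unit, assuming only Node's built-in `crypto` module: the client's `SignatureMethod` is mapped to a Node digest name, and the result is compared with `crypto.timingSafeEqual()` so that response timing does not reveal how much of a forged signature was right. `sign()` and `signatureMatches()` are hypothetical names, and the constant-time comparison is an addition not shown on the slide.

```javascript
const crypto = require('crypto');

// Map SimpleDB SignatureMethod values to Node.js digest names.
const DIGESTS = { HmacSHA1: 'sha1', HmacSHA256: 'sha256' };

function sign(stringToSign, secretKey, signatureMethod) {
  if (!Object.prototype.hasOwnProperty.call(DIGESTS, signatureMethod)) {
    throw new Error('Unsupported SignatureMethod: ' + signatureMethod);
  }
  return crypto.createHmac(DIGESTS[signatureMethod], secretKey)
    .update(stringToSign)
    .digest('base64');
}

// Constant-time comparison, so response timing does not reveal how much
// of a forged signature was correct.
function signatureMatches(stringToSign, secretKey, signatureMethod, signature) {
  let expected;
  try {
    expected = Buffer.from(sign(stringToSign, secretKey, signatureMethod));
  } catch (e) {
    return false; // an unsupported method counts as a mismatch
  }
  const given = Buffer.from(String(signature || ''));
  // timingSafeEqual throws on unequal lengths, so check length first
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

const sample = 'POST\n192.168.1.134:8081\n/\nAction=ListDomains';
const sampleSig = sign(sample, 'secret', 'HmacSHA1');
console.log(signatureMatches(sample, 'secret', 'HmacSHA1', sampleSig)); // true
```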
stringToSign()
POST{lf}
192.168.1.134:8081{lf}
/{lf}
AWSAccessKeyId=rob&Action=ListDomains&MaxNumberOfDomains=100&SignatureMethod=HmacSHA1&SignatureVersion=2&Timestamp=2011-06-06T22%3A39%3A30%2B00%3A00&Version=2009-04-15
i.e. reconstruct exactly the same string that the SDB client signed when it made the request,
then sign it with rob’s secret key:
digest()
var crypto = require("crypto");
var digest = function(string, secretKey, type) {
  // type must be a Node.js digest name such as 'sha1' or 'sha256'
  var hmac = crypto.createHmac(type, secretKey);
  hmac.update(string);
  return hmac.digest('base64');
};
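The string-to-sign in the example above follows AWS Signature Version 2: HTTP verb, lower-cased host, request path and the canonicalized query string, joined by line feeds, with parameters sorted by name and RFC 3986-encoded, and the `Signature` parameter itself left out. The sketch below rebuilds that example; `buildStringToSign()` and `rfc3986()` are hypothetical helpers, not node-mdb's `createStringToSign()`, and the sort assumes ASCII parameter names.

```javascript
// RFC 3986 encoding: encodeURIComponent leaves !'()* alone, SigV2 escapes them
function rfc3986(str) {
  return encodeURIComponent(str).replace(/[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
}

// Hypothetical helper: verb, lower-cased host, path and the canonicalized
// query string, separated by line feeds.
function buildStringToSign(method, host, path, nvps) {
  const query = Object.keys(nvps)
    .filter((name) => name !== 'Signature') // the signature is never signed
    .sort() // byte order, for ASCII parameter names
    .map((name) => rfc3986(name) + '=' + rfc3986(String(nvps[name])))
    .join('&');
  return [method.toUpperCase(), host.toLowerCase(), path || '/', query].join('\n');
}

const example = buildStringToSign('POST', '192.168.1.134:8081', '/', {
  Version: '2009-04-15',
  Action: 'ListDomains',
  AWSAccessKeyId: 'rob',
  MaxNumberOfDomains: '100',
  SignatureMethod: 'HmacSHA1',
  SignatureVersion: '2',
  Timestamp: '2011-06-06T22:39:30+00:00',
  Signature: 'ignored',
});
console.log(example);
```

Because the server sorts the parameters itself, the order in which the client sent them does not matter; only their names and values do.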
Ready to execute an API!
[Diagram: the SimpleDB request pipeline (repeated)]
SimpleDB APIs (Actions)
• CreateDomain
• ListDomains
• DeleteDomain
• PutAttributes (BatchPutAttributes)
• GetAttributes
• DeleteAttributes (BatchDeleteAttributes)
• Select
• DomainMetadata
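Since every request names one of these actions in its `Action` parameter, a lookup table of handlers is a natural way to route requests. This is a hedged sketch, not node-mdb's `processSDBAction()`: `makeDispatcher()`, the `reply(error, data)` call-back shape and the handler bodies are hypothetical, and the `InvalidAction` code is an AWS-style error code assumed here for unknown actions.

```javascript
// Hypothetical dispatch table: look the Action parameter up in an object
// of handlers instead of a long if/else chain.
function makeDispatcher(handlers) {
  return function processSDBAction(SDB, reply) {
    const action = SDB.nvps.Action;
    // hasOwnProperty stops Action=toString etc. reaching Object.prototype
    if (!Object.prototype.hasOwnProperty.call(handlers, action)) {
      return reply({ code: 'InvalidAction', message: 'Unknown action: ' + action, status: 400 });
    }
    handlers[action](SDB, reply);
  };
}

// Illustrative handlers; reply(error, data) is an assumed call-back shape
const dispatch = makeDispatcher({
  ListDomains: (SDB, reply) => reply(null, ['accounts', 'books']),
  CreateDomain: (SDB, reply) => reply(null, { created: SDB.nvps.DomainName }),
});

dispatch({ nvps: { Action: 'ListDomains' } }, (error, data) => console.log(error, data));
```

Adding a new API then means adding one entry to the table.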
Accessing the GT.M Database
• Accessed via node-mwire
  – TCP-based wire protocol
  – Extension of Redis protocol
  – Adapted redis-node module
• APIs allow you to set/get/delete/edit Globals
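Because node-mwire extends the Redis protocol, its requests presumably keep Redis's multi-bulk framing: an argument count, then each argument prefixed by its length in bytes. The sketch below encodes that framing only; `encodeCommand()` is a hypothetical helper, and the command name in the usage line is illustrative, not the actual M/Wire command set.

```javascript
// Redis-style multi-bulk request framing, assumed to carry over to M/Wire:
// *<argument count>, then $<byte length> and the bytes of each argument.
function encodeCommand(args) {
  let out = '*' + args.length + '\r\n';
  for (const arg of args) {
    const s = String(arg);
    // lengths are in bytes, not characters
    out += '$' + Buffer.byteLength(s, 'utf8') + '\r\n' + s + '\r\n';
  }
  return out;
}

console.log(JSON.stringify(encodeCommand(['GET', 'MDB', 'rob'])));
```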
GT.M Globals
• Globals = unit of persistent storage
  – Schema-free
  – Hierarchically structured
  – Sparse
  – Dynamic
  – “persistent associative array”
GT.M Globals
• A Global has:
  – A name
  – 0, 1 or more subscripts
  – A string value
globalName[subscript1, subscript2, ..., subscriptn] = value
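To make the shape of a global concrete, here is a hypothetical in-memory model: a sparse tree in which any node, at any depth, may hold a string value, with child subscripts listed in M collation order (canonical numbers first, numerically, then strings). It is only a model; real globals are persistent and managed by GT.M, and the `Global` class and its method names are assumptions for illustration.

```javascript
// Hypothetical in-memory model of a GT.M global: a sparse tree where any
// node, at any depth, may hold a string value.
class Global {
  constructor(name) {
    this.name = name;
    this.root = Global.newNode();
  }

  static newNode() {
    return { value: undefined, children: new Map() };
  }

  // Walk to the node addressed by subs, creating levels if asked to.
  node(subs, create) {
    let node = this.root;
    for (const sub of subs) {
      const key = String(sub);
      if (!node.children.has(key)) {
        if (!create) return undefined;
        node.children.set(key, Global.newNode());
      }
      node = node.children.get(key);
    }
    return node;
  }

  set(subs, value) {
    this.node(subs, true).value = String(value);
  }

  get(subs) {
    const node = this.node(subs, false);
    return node ? node.value : undefined;
  }

  // Delete a node and everything beneath it.
  kill(subs) {
    if (subs.length === 0) {
      this.root = Global.newNode();
      return;
    }
    const parent = this.node(subs.slice(0, -1), false);
    if (parent) parent.children.delete(String(subs[subs.length - 1]));
  }

  // Child subscripts in M collation order: canonical numbers first
  // (numerically), then strings (by code unit, i.e. byte order for ASCII).
  children(subs) {
    const node = this.node(subs, false);
    if (!node) return [];
    const isNum = (k) => k !== '' && String(Number(k)) === k;
    return [...node.children.keys()].sort((a, b) => {
      const na = isNum(a);
      const nb = isNum(b);
      if (na && nb) return Number(a) - Number(b);
      if (na !== nb) return na ? -1 : 1;
      return a < b ? -1 : a > b ? 1 : 0;
    });
  }
}

const mdb = new Global('MDB');
mdb.set(['rob', 'domains', 1, 'name'], 'books');
mdb.set(['rob', 'domains', 10, 'name'], 'orders');
mdb.set(['rob', 'domains', 2, 'name'], 'accounts');
console.log(mdb.children(['rob', 'domains'])); // [ '1', '2', '10' ]
```

No schema is declared anywhere: writing `MDB['rob','domains',10,'name']` creates every missing level on the way down.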
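SDB Domain in Globals
CreateDomain: AWSAccessKeyId = ‘rob’, DomainName = ‘books’
MDB['rob','domains',1,'name'] = 'books'
MDB['rob','domains',1,'created'] = 1304956337618
MDB['rob','domains',1,'modified'] = 1304956337618
MDB['rob','domainIndex','books',1] = ''
MDB['rob','domains',2,'name'] = 'accounts'
MDB['rob','domains',2,'created'] = 1304956337423
MDB['rob','domains',2,'modified'] = 1304956337423
MDB['rob','domainIndex','accounts',2] = ''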
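The tree on this slide keeps a `domainIndex` branch next to `domains`. Assuming its purpose is to map a domain name to its numeric id (as the `domainIndex, name, id` layout suggests), a lookup needs only one branch of the tree rather than a scan of every domain's `name` node. The nested object and `findDomainId()` below are a hypothetical sketch of that idea.

```javascript
// The 'rob' subtree from the slide as a nested object (values as strings).
const MDB = {
  rob: {
    domains: {
      1: { name: 'books', created: '1304956337618', modified: '1304956337618' },
      2: { name: 'accounts', created: '1304956337423', modified: '1304956337423' },
    },
    domainIndex: { books: { 1: '' }, accounts: { 2: '' } },
  },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Find a domain's id from its name via domainIndex, without scanning every
// domain's 'name' node. Returns null if the user or domain is unknown.
function findDomainId(mdb, accessKeyId, domainName) {
  if (!has(mdb, accessKeyId)) return null;
  const index = mdb[accessKeyId].domainIndex;
  if (!has(index, domainName)) return null;
  const ids = Object.keys(index[domainName]);
  return ids.length > 0 ? Number(ids[0]) : null;
}

const id = findDomainId(MDB, 'rob', 'accounts');
console.log(id, MDB.rob.domains[id].name); // 2 accounts
```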
Multiple Domains in Globals
[Diagram: the same MDB tree for 'rob', with domain 'books' at id 1 and a second domain at id 2]
Creating a new domain (1)
increment()
Creating a new domain (2)
setGlobal()
Key Node.js async patterns for db I/O
• Dependent pattern:
  – Can’t set the global nodes until the value of the increment() is returned
• Parallel pattern:
  – Global nodes can be created in parallel
  – No interdependence
  – BUT:
    • Need to know when they’re all completed
[Diagram: MDB['rob','domains'] with domain nodes 1 and 2; domain 1 holds 'name', 'created' and 'modified']
Dependent pattern
MDB.increment([accessKeyId, 'domains'], 1, function (error, results) {
  var id = results.value;
  // ...now create the other global nodes inside this callback
});
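A runnable sketch of the dependent pattern, assuming a hypothetical in-memory `MDB` object whose `increment()` and `setGlobal()` call back asynchronously in the same style as the slide's calls. The point it shows: the new id exists only inside `increment()`'s call-back, so the write that uses it must be issued from there. `allocateDomain()` is an illustrative name.

```javascript
// Hypothetical stand-in for node-mdb's MDB object: an in-memory store whose
// call-backs fire later, like real database I/O.
const store = {};
const MDB = {
  increment(subs, by, callback) {
    setImmediate(() => {
      const key = JSON.stringify(subs);
      store[key] = (Number(store[key]) || 0) + by;
      callback(null, { value: store[key] });
    });
  },
  setGlobal(subs, value, callback) {
    setImmediate(() => {
      store[JSON.stringify(subs)] = String(value);
      callback(null, {});
    });
  },
};

// Dependent pattern: the new id only exists inside increment()'s call-back,
// so the write that needs it has to be issued from there.
function allocateDomain(accessKeyId, domainName, callback) {
  MDB.increment([accessKeyId, 'domains'], 1, (error, results) => {
    if (error) return callback(error);
    const id = results.value;
    MDB.setGlobal([accessKeyId, 'domains', id, 'name'], domainName, (setError) => {
      callback(setError, id);
    });
  });
}

allocateDomain('rob', 'books', (error, id) => console.log('books ->', id));
```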
[Diagram: IncrBy on MDB['rob','domains'] returns the next domain id]
Parallel pattern (semaphore)
// each call-back bumps the counter; whichever write finishes last sends the response
var count = 0;
MDB.setGlobal([accessKeyId, 'domains', id, 'name'], domainName, function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domains', id, 'created'], now, function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domains', id, 'modified'], now, function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domainIndex', nameIndex, id], '', function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
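The `count === 4` check is really a completion counter (a latch) repeated in each call-back. A hypothetical helper, `afterAll(n, done)`, can generalise it and also handle errors, which the slide's version ignores: it returns one call-back to pass to all n operations, and `done` runs exactly once, after the n-th success or on the first error. The fake `setGlobal()` below is an assumption, not node-mwire's API.

```javascript
// afterAll(n, done) returns one call-back to hand to n parallel operations.
// done() runs exactly once: after the n-th success, or on the first error.
function afterAll(n, done) {
  let remaining = n;
  let finished = false;
  return function (error) {
    if (finished) return;
    if (error) {
      finished = true;
      return done(error);
    }
    remaining -= 1;
    if (remaining === 0) {
      finished = true;
      done(null);
    }
  };
}

// Usage with a fake asynchronous setGlobal (an assumption, not node-mwire):
function setGlobal(subs, value, callback) {
  setImmediate(() => callback(null));
}

const id = 3;
const now = Date.now();
const next = afterAll(4, (error) => {
  console.log(error ? 'failed: ' + error.message : 'all four nodes written');
});
setGlobal(['rob', 'domains', id, 'name'], 'orders', next);
setGlobal(['rob', 'domains', id, 'created'], now, next);
setGlobal(['rob', 'domains', id, 'modified'], now, next);
setGlobal(['rob', 'domainIndex', 'orders', id], '', next);
```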
New domain nodes created
Send CreateDomain Response
[Diagram: the SimpleDB request pipeline (repeated)]
CreateDomain Response
<?xml version="1.0"?>
<CreateDomainResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
  <ResponseMetadata>
    <RequestID>e4e9fa45-f9dc-4e5b-8f0a-777acce6505e</RequestID>
    <BoxUsage>0.0020000000</BoxUsage>
  </ResponseMetadata>
</CreateDomainResponse>
var okResponse = function(SDB) {
  var nvps = SDB.nvps;
  var xml = responseStart({action: nvps.Action, version: nvps.Version});
  xml = xml + responseEnd(nvps.Action, SDB.startTime, false);
  responseHeader(200, SDB.response);
  SDB.response.write(xml);
  SDB.response.end();
};
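A self-contained sketch of the envelope that `responseStart()` and `responseEnd()` produce for a data-less action such as CreateDomain, matching the sample response above. `okResponseXml()` is a hypothetical helper; `boxUsage` is passed in because how node-mdb derives it is not shown, and the request id comes from `crypto.randomUUID()` (available in Node.js 14.17 and later).

```javascript
const crypto = require('crypto');

// Hypothetical helper in the spirit of responseStart()/responseEnd(): builds
// the envelope for an action that returns no data, e.g. CreateDomain.
function okResponseXml(action, version, boxUsage, requestId = crypto.randomUUID()) {
  return '<?xml version="1.0"?>\n' +
    '<' + action + 'Response xmlns="http://sdb.amazonaws.com/doc/' + version + '/">' +
    '<ResponseMetadata>' +
    '<RequestID>' + requestId + '</RequestID>' +
    '<BoxUsage>' + boxUsage.toFixed(10) + '</BoxUsage>' +
    '</ResponseMetadata>' +
    '</' + action + 'Response>';
}

console.log(okResponseXml('CreateDomain', '2009-04-15', 0.002));
```

`action` becomes part of an element name unescaped, so it should only ever come from the known list of actions, never straight from the request.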
Node.js HTTP Server Response
http.createServer(function(request, response) {
  // ...numerous call-backs deep:
  response.writeHead(status, {
    "Server": "Amazon SimpleDB",
    "Content-Type": "text/xml",
    "Date": dateNow.toUTCString()
  });
  response.write('<?xml version="1.0"?>\n');
  response.write(xml);
  response.end();
});
Entire request/response SDB round-trip completed
Demo using Bolso
• List Domains
• Create Domain
• Add an item (row) and some attributes (columns + cells)
Node.js Gotchas
• Async programming is not immediately intuitive!
• Loops
  – Calling functions that use call-backs inside a for..in loop will go horribly wrong: the loop finishes before any call-back fires
• Understanding closures
  – How externally-defined variables can be used inside call-back functions
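The loop gotcha can be reproduced without any I/O: functions created inside a `for..in` loop with `var` all share one loop variable, so when they run after the loop (as call-backs do) they all see its final value. Storing the functions and calling them afterwards mimics that timing. The fix shown is an immediately-invoked function that captures each value, the usual idiom in 2011-era JavaScript; in modern JavaScript, declaring the loop variable with `let` or `const` has the same effect.

```javascript
// Call-backs created inside a for..in loop run after the loop has finished.
// Storing functions and calling them afterwards mimics that timing.
const domains = { books: 1, accounts: 2, orders: 3 };

function namesSeenWithVar(obj) {
  var callbacks = [];
  for (var name in obj) {
    // every function closes over the same function-scoped `name`
    callbacks.push(function () { return name; });
  }
  return callbacks.map(function (cb) { return cb(); });
}

function namesSeenWithClosure(obj) {
  var callbacks = [];
  for (var name in obj) {
    // an immediately-invoked function gives each call-back its own copy
    callbacks.push((function (captured) {
      return function () { return captured; };
    })(name));
  }
  return callbacks.map(function (cb) { return cb(); });
}

console.log(namesSeenWithVar(domains));     // [ 'orders', 'orders', 'orders' ]
console.log(namesSeenWithClosure(domains)); // [ 'books', 'accounts', 'orders' ]
```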
Example
• BatchPutAttributes
  – Intuitively a for..in loop around PutAttributes
  – Had to be serialised
    • Completion of one PutAttributes calls the next
  – Copy state of SDB object and use for..in?
    • var SDBx = SDB;
    • SDBx is a reference to SDB, not a clone of it!
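Both lessons from this example can be sketched in a few lines. `serialEach()` starts each operation only when the previous one has called back, so a batch runs in order even when individual calls take different times; `putAttributes()` here is a hypothetical stand-in with a random delay, not node-mdb's handler. The last lines show the copying issue: assignment copies a reference, and `Object.assign()` makes a new object but only a shallow copy.

```javascript
// Serialising a batch: start each PutAttributes only when the previous one
// has called back. putAttributes() is a hypothetical stand-in whose random
// delay would scramble the order if the calls ran in parallel.
const completed = [];
function putAttributes(item, callback) {
  setTimeout(() => {
    completed.push(item.name);
    callback(null);
  }, Math.random() * 10);
}

function serialEach(items, worker, done) {
  let index = 0;
  (function next(error) {
    if (error) return done(error);
    if (index === items.length) return done(null);
    const item = items[index];
    index += 1;
    worker(item, next);
  })();
}

serialEach([{ name: 'item1' }, { name: 'item2' }, { name: 'item3' }], putAttributes,
  (error) => console.log(error || completed)); // [ 'item1', 'item2', 'item3' ]

// Copying state: `var SDBx = SDB` copies the reference, so both names point
// at one object. Object.assign makes a new top-level object, but it is a
// shallow copy: nested objects such as nvps are still shared.
const SDB = { nvps: { Action: 'BatchPutAttributes' } };
const SDBx = Object.assign({}, SDB);
console.log(SDBx === SDB, SDBx.nvps === SDB.nvps); // false true
```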
Conclusions
• node-mdb is now nearly complete
• Only BatchDeleteAttributes is not yet implemented
• The other APIs emulate SimpleDB 100%
• Free Open Source
  – https://github.com/robtweed/node-mdb
  – Give it a try!
  – Use mdb.js as an example for building your own Node.js database applications
• Check out GT.M!
• Follow me on Twitter at @rtweed
• Slides: http://www.mgateway.com/node-mdb-pres.html