MongoDB San Francisco 2013: Using MongoDB for Groupon's Place Data presented by Peter Bakkum,...
Arnold: Declarative Crowd-Machine Data Integration
Shawn Jeffery, Liwen Sun, Matt DeLand, Nick Pendar, Rick Barber, Andrew Galdi
CIDR 2013, cidrdb.org/cidr2013/Papers/CIDR13_Paper22.pdf
Concordance
{
  name: "Joe's Pizza",
  location: {
    address: "1000 Market St.",
    postal_code: "94100-1001"
  },
  source: 1
}
{
  name: "Joes Pizza",
  location: {
    address: "1000 Market Street",
    postal_code: "94100"
  },
  source: 2
}
{
  name: "Joes",
  location: {
    address: "1000 Market",
    postal_code: "94100"
  },
  source: 3
}
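The three source records above describe the same place with different spellings. As a rough illustration only, and not Arnold's crowd-machine pipeline or Groupon's actual matcher, a rule-based concordance check could block on house number, first street word and five-digit ZIP, then accept names whose tokens are a subset of each other. The helpers `normalize`, `blockingKey`, `namesCompatible` and `samePlace` are hypothetical.

```javascript
// Hypothetical rule-based matcher for illustration; not Groupon's concordance algorithm.

// Lowercase, drop apostrophes (straight or curly), turn other punctuation into spaces, split into tokens.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/['\u2019]/g, "")
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Blocking key: house number + first street word + 5-digit ZIP, so "St." vs "Street"
// and "94100-1001" vs "94100" do not prevent a match.
function blockingKey(place) {
  const [number, street] = normalize(place.location.address);
  const zip5 = place.location.postal_code.slice(0, 5);
  return `${number} ${street}|${zip5}`;
}

// Names are compatible when every token of the shorter name occurs in the longer one.
function namesCompatible(a, b) {
  const [shorter, longer] = [normalize(a), normalize(b)].sort((x, y) => x.length - y.length);
  return shorter.every((token) => longer.includes(token));
}

function samePlace(p, q) {
  return blockingKey(p) === blockingKey(q) && namesCompatible(p.name, q.name);
}

const records = [
  { name: "Joe's Pizza", location: { address: "1000 Market St.", postal_code: "94100-1001" }, source: 1 },
  { name: "Joes Pizza", location: { address: "1000 Market Street", postal_code: "94100" }, source: 2 },
  { name: "Joes", location: { address: "1000 Market", postal_code: "94100" }, source: 3 },
];

console.log(samePlace(records[0], records[1]), samePlace(records[0], records[2])); // true true
```

Rules this simple are brittle (a misspelled street or a missing house number breaks the key), which is the kind of ambiguity a crowd-machine approach is meant to handle.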
db.places.find({
  _id: "013e4e2afc26"
})
db.places.find({
  "location.postcode": "94100",
  "location.country": "US"
})
db.places.findAndModify({
  query: {
    _id: "013e4e2afc26",
    persisted_at: "2013-02-01T00:00:00Z"
  },
  update: placeModel // placeholder for the full place model document
})
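Because the query matches on both `_id` and `persisted_at`, the update applies only if nobody has persisted the place since it was read. Otherwise `findAndModify` matches nothing and returns `null`. The sketch below reproduces that optimistic-concurrency check against an in-memory `Map` instead of MongoDB. `persistIfUnchanged` and its `now` argument are hypothetical.

```javascript
// In-memory stand-in for the places collection (no MongoDB involved; illustration only).
const places = new Map();

// Compare-and-set: write placeModel only if the stored persisted_at still equals the value
// the caller read, like a findAndModify whose query includes persisted_at.
// Returns the newly stored document on success, or null when nothing matched.
function persistIfUnchanged(id, expectedPersistedAt, placeModel, now) {
  const current = places.get(id);
  if (current === undefined || current.persisted_at !== expectedPersistedAt) {
    return null;
  }
  const updated = { ...placeModel, _id: id, persisted_at: now };
  places.set(id, updated);
  return updated;
}

places.set("013e4e2afc26", { _id: "013e4e2afc26", name: "Joe's Pizza", persisted_at: "2013-02-01T00:00:00Z" });

// Two writers both read persisted_at = 2013-02-01T00:00:00Z; only the first write succeeds.
const first = persistIfUnchanged("013e4e2afc26", "2013-02-01T00:00:00Z", { name: "Joe's Pizza" }, "2013-02-02T00:00:00Z");
const second = persistIfUnchanged("013e4e2afc26", "2013-02-01T00:00:00Z", { name: "Joes Pizza" }, "2013-02-02T00:00:01Z");
console.log(first !== null, second === null); // true true
```

A writer that gets `null` back would re-read the place and retry against the new `persisted_at`.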
Concordance → Persistence
Storm workers → mongos routers → 4 shards of 2 nodes (replica set failover, 4 arbiters)
Config cluster
64 GB dedicated hardware
UUID v1
82d991c6-b098-11e2-8fc0-c82a14fffe86
82d9996e-b098-11e2-8fc0-c82a14fffe86
82d99f04-b098-11e2-8fc0-c82a14fffe86
82d9a40e-b098-11e2-8fc0-c82a14fffe86
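A version-1 UUID (RFC 4122) stores a 60-bit count of 100-nanosecond intervals since 1582-10-15. The low 32 bits go in the first group, the middle 16 bits in the second, and the high 12 bits follow the version digit in the third. That is why the four ids above, generated within about half a millisecond of each other, differ only in their first group. A minimal decoder follows; `v1TimestampMs` is a hypothetical helper name.

```javascript
// 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (RFC 4122).
const GREGORIAN_TO_UNIX_TICKS = 0x01b21dd213814000n;

// Decode the timestamp of a version-1 UUID into milliseconds since the Unix epoch.
function v1TimestampMs(uuid) {
  const [timeLow, timeMid, timeHiAndVersion] = uuid.split("-");
  if (timeHiAndVersion[0] !== "1") {
    throw new Error(`not a version-1 UUID: ${uuid}`);
  }
  const timeHi = BigInt("0x" + timeHiAndVersion.slice(1)); // drop the version digit
  const ticks = (timeHi << 48n) | (BigInt("0x" + timeMid) << 32n) | BigInt("0x" + timeLow);
  return Number((ticks - GREGORIAN_TO_UNIX_TICKS) / 10000n); // 10,000 ticks per millisecond
}

const v1Ids = [
  "82d991c6-b098-11e2-8fc0-c82a14fffe86",
  "82d9996e-b098-11e2-8fc0-c82a14fffe86",
  "82d99f04-b098-11e2-8fc0-c82a14fffe86",
  "82d9a40e-b098-11e2-8fc0-c82a14fffe86",
];
console.log(v1Ids.map((id) => new Date(v1TimestampMs(id)).toISOString()));
```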
UUID vB
wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz
w: controllable counter (32 bits)
x: process id (16 bits)
b: literal 'b' (in the UUID version position)
y: fragment of MAC address (28 bits)
z: milliseconds since epoch (UTC, 48 bits)
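Given the layout above, a vB id can be assembled and taken apart with plain string formatting. This is a sketch under stated assumptions. The function names are hypothetical, and the caller is assumed to supply the counter, the low 16 bits of the process id and a 28-bit MAC fragment. The slide does not say which MAC bits are used.

```javascript
// Format a non-negative integer as exactly `width` hex digits, keeping the low-order digits.
function hex(value, width) {
  return value.toString(16).padStart(width, "0").slice(-width);
}

// Assemble wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz (hypothetical helper).
function makeUuidVB(counter, pid, macFragment, millis) {
  const w = hex(counter >>> 0, 8); // 32-bit controllable counter
  const x = hex(pid & 0xffff, 4); // assumption: low 16 bits of the process id
  const y = hex(macFragment, 7); // assumption: 28-bit MAC fragment (low 7 hex digits kept)
  const z = hex(millis, 12); // 48-bit milliseconds since the Unix epoch (UTC)
  return `${w}-${x}-b${y.slice(0, 3)}-${y.slice(3)}-${z}`;
}

// Split a vB id back into its fields; rejects anything without the literal 'b'.
function parseUuidVB(id) {
  const m = /^([0-9a-f]{8})-([0-9a-f]{4})-b([0-9a-f]{3})-([0-9a-f]{4})-([0-9a-f]{12})$/.exec(id);
  if (m === null) {
    throw new Error(`not a vB id: ${id}`);
  }
  return {
    counter: parseInt(m[1], 16),
    pid: parseInt(m[2], 16),
    macFragment: parseInt(m[3] + m[4], 16),
    millis: parseInt(m[5], 16), // 48 bits fits exactly in a JS number
  };
}

const fields = parseUuidVB("c8c9cef9-7a7f-bd53-7a50-013e4e2afbde");
console.log(new Date(fields.millis).toISOString());
```

Putting the counter first and the timestamp last means the leading characters of the id are whatever the counter produces.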
toggle
c8c9cef9-7a7f-bd53-7a50-013e4e2afbde 14951cfa-7a7f-bd53-7a50-013e4e2afbde 6f5169fb-7a7f-bd53-7a50-013e4e2afbde ba2da6fc-7a7f-bd53-7a50-013e4e2afbde
f5166777-7a7f-bd53-7a50-013e4e2afc26 f5166778-7a7f-bd53-7a50-013e4e2afc26 f5166779-7a7f-bd53-7a50-013e4e2afc26 f516677a-7a7f-bd53-7a50-013e4e2afc26
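In the second row the counter group increments by one (f5166777, f5166778, …), while in the first row it jumps across the 32-bit range. If `_id` is a range-based shard key (an assumption, since the slides do not say so), the leading group largely decides which chunk an insert lands in. A scrambled counter spreads writes across chunks, and a sequential one keeps them together. The slide does not show how the scrambled values are produced. The sketch below swaps in one possible technique, multiplying by an odd constant modulo 2^32. That mapping is a bijection, so distinct counts never collide. `counterValue` and the constant are hypothetical.

```javascript
// Knuth's multiplicative-hashing constant (close to 2^32 / golden ratio); any odd
// constant makes multiplication a bijection modulo 2^32.
const SCRAMBLE = 0x9e3779b1;

// Hypothetical "controllable counter": toggle between sequential and scrambled output.
function counterValue(count, scrambled) {
  return scrambled
    ? Math.imul(count, SCRAMBLE) >>> 0 // multiply mod 2^32, reinterpret as unsigned
    : count >>> 0;
}

const toHex = (n) => n.toString(16).padStart(8, "0");
console.log([1, 2, 3, 4].map((n) => toHex(counterValue(n, false)))); // sequential
console.log([1, 2, 3, 4].map((n) => toHex(counterValue(n, true)))); // scattered
```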
places.ns, places.0, places.1, places.2, …
places.ns entry:
char[128] name
DiskLoc firstExtent
DiskLoc lastExtent
places.0, places.1: extent, extent, extent → one MapReduce input split per extent
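In MongoDB's MMAPv1 storage format, a collection's extents form a linked list. Its entry in places.ns points at firstExtent and lastExtent. Each DiskLoc is a (data file number, byte offset) pair, and each extent records its length and a link to the next extent. The sketch below walks that list over hypothetical in-memory objects and emits one split per extent, which is one plausible reading of the diagram. It does not parse real .ns or data files. `extentsToSplits`, the object shapes and all numbers are illustrative assumptions.

```javascript
// Hypothetical in-memory model of MMAPv1 structures; a DiskLoc is { file, offset }.
const NULL_LOC = { file: -1, offset: 0 };
const isNull = (loc) => loc.file === -1;
const key = (loc) => `${loc.file}:${loc.offset}`;

// Walk the extent list from firstExtent via each extent's xnext link,
// emitting one input split (data file path, start offset, length) per extent.
function extentsToSplits(dbName, namespaceEntry, extentsByLoc) {
  const splits = [];
  let loc = namespaceEntry.firstExtent;
  while (!isNull(loc)) {
    const extent = extentsByLoc.get(key(loc));
    if (extent === undefined) {
      throw new Error(`dangling extent pointer ${key(loc)}`);
    }
    splits.push({ path: `${dbName}.${loc.file}`, start: loc.offset, length: extent.length });
    loc = extent.xnext;
  }
  return splits;
}

// Made-up layout: two extents in places.0, one in places.1.
const entry = { firstExtent: { file: 0, offset: 8192 }, lastExtent: { file: 1, offset: 8192 } };
const extents = new Map([
  ["0:8192", { length: 1 << 20, xnext: { file: 0, offset: 8192 + (1 << 20) } }],
  [`0:${8192 + (1 << 20)}`, { length: 1 << 21, xnext: { file: 1, offset: 8192 } }],
  ["1:8192", { length: 1 << 22, xnext: NULL_LOC }],
]);
console.log(extentsToSplits("places", entry, extents));
```

Splitting on extents lets each map task read a contiguous byte range of a data file rather than paging through the data with queries.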
public void map(
    Text key,
    WritableBSONObject value,
    Context context) throws IOException, InterruptedException
{
    // Read the document's _id from the BSON value
    String id = (String) value.get("_id");
    // ...
}
Mongo Cluster → MapReduce Job → Hadoop Cluster
Backs up Mongo data to Hadoop
Much faster data export
Exploits our Hadoop cluster