Modeling for Performance
Data Modeling for Performance
Mongo Boulder, January 21, 2010
Michael Dwan, Snapjoy
i’m michael dwan, @michaeldwan on the twitter
the project: Company X
application spec
• find business details (web + api)
• search by category/keyword + geo (web + api)
• update (api)
why is this interesting?
• 15,000,000 businesses
• 30,000 partners
• 100,000 geo areas
• 2,300 categories
• 2,000,000 requests daily
• 24,000,000 urls in sitemap
• 100,000,000 tags
updates
• infrequent changes
• monthly updates w/ 12M monthly changes
• “zero downtime”
the problem: mo’ data, mo’ problems
complexity
businesses
phone_numbers
businesses_phone_numbers
cities
states
zips
neighborhoods
businesses_neighborhoods
tags
taggings
assets
users
categories
categorizations
providers mappings
architecture [diagram]
read performance
solr
downtime
solr getting fussy
migrations
downtime
the solution
> gem install acts_as_web_scale
a business...
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz" }
a business... has many phone numbers
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ]}
a business... has coordinates
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ]}
a business... has many tags
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ]}
a business... has an address
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" }}
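The slides above fold what used to be join tables (businesses_phone_numbers, taggings) into arrays on the business itself. A minimal sketch of that one-time conversion, assuming hypothetical row shapes (`id`, `business_id`, `phone_number_id`, `tag_id` columns), since the original Postgres schema isn't shown:

```javascript
// Hypothetical rows from the old relational schema; column names are assumptions.
const businesses = [{ id: 1, name: "Acme Glass Co", street_address: "2035 NE Alberta St" }];
const phoneNumbers = [
  { id: 10, number: "5035550091" },
  { id: 11, number: "8005555456" },
];
const businessesPhoneNumbers = [
  { business_id: 1, phone_number_id: 10 },
  { business_id: 1, phone_number_id: 11 },
];
const tags = [
  { id: 100, name: "glass" },
  { id: 101, name: "mirrors" },
  { id: 102, name: "flat glass" },
];
const taggings = [
  { business_id: 1, tag_id: 100 },
  { business_id: 1, tag_id: 101 },
  { business_id: 1, tag_id: 102 },
];

// Index each lookup table by primary key so every "join" is a Map lookup.
const phonesById = new Map(phoneNumbers.map((p) => [p.id, p]));
const tagsById = new Map(tags.map((t) => [t.id, t]));

// Fold one business and its join-table rows into a single document.
function toDocument(business) {
  return {
    name: business.name,
    // has many phone numbers: join table -> array of plain values
    phone_numbers: businessesPhoneNumbers
      .filter((row) => row.business_id === business.id)
      .map((row) => phonesById.get(row.phone_number_id).number),
    // has many tags: taggings -> array of tag names
    tags: taggings
      .filter((row) => row.business_id === business.id)
      .map((row) => tagsById.get(row.tag_id).name),
    // has an address: embedded sub-document
    location: { street_address: business.street_address },
  };
}

const docs = businesses.map(toDocument);
console.log(JSON.stringify(docs[0], null, 2));
```

Reading a business then becomes a single document fetch with no joins. This works well for small lists the business owns outright, like phone numbers and tags.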
belongs to?
a state
{ "_id" : ObjectId("4ce82937961552247900000f"), "name" : "Illinois", "slug" : "il", ...}
a business... belongs to a state
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } }}
a business... belongs to a city
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland" }, "display_name" : "Portland, OR" } }}
a business... belongs to a zip code
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland" }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } }}
many-to-many?
a category
{ "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "name" : "Auto Glass", "slug" : "3063-auto-glass", "tags" : [ "windshields" ], ...}
a business... belongs to many categories
{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ],
  "location" : {
    "street_address" : "2035 NE Alberta St",
    "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" },
    "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland" }, "display_name" : "Portland, OR" },
    "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" }
  },
  "categories" : [
    { "_id" : ObjectId("4ce82e50d3dfaa16360004f2"), "meta" : { "slug" : "282-glass", "tags" : [ "windows" ] }, "display_name" : "Glass" },
    { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ] }, "display_name" : "Auto Glass" }
  ]
}
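Each embedded state, city, zip, and category keeps only the referenced document's `_id`, a few lookup fields under `meta`, and a `display_name`. A small sketch of building such a summary from the full referenced document, assuming a hypothetical `summarize` helper and plain strings standing in for ObjectIds; which fields go into `meta` is read off the slides, not a fixed rule:

```javascript
// Hypothetical helper: copy only what a listing or detail page needs
// from a referenced document into an embedded summary.
function summarize(ref, metaFields, displayName) {
  const meta = {};
  for (const field of metaFields) {
    if (ref[field] !== undefined) meta[field] = ref[field];
  }
  return { _id: ref._id, meta, display_name: displayName };
}

// Full documents as they might sit in their own collections (ids as strings here).
const oregon = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };
const glass = { _id: "4ce82e50d3dfaa16360004f2", name: "Glass", slug: "282-glass", tags: ["windows"] };
const autoGlass = { _id: "4ce82e64d3dfaa16360014eb", name: "Auto Glass", slug: "3063-auto-glass", tags: ["windshields"] };

const business = {
  name: "Acme Glass Co",
  location: {
    street_address: "2035 NE Alberta St",
    // belongs to: one embedded summary
    state: summarize(oregon, ["slug"], oregon.name),
  },
  // many-to-many: an array of summaries replaces the categorizations table
  categories: [glass, autoGlass].map((c) => summarize(c, ["slug", "tags"], c.name)),
};

console.log(JSON.stringify(business, null, 2));
```

The trade-off is on writes: renaming a category means rewriting every business that embeds it, for example with a multi-document update that sets `categories.$.display_name` through the positional operator. With infrequent changes and monthly batch updates, that cost is paid rarely.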
queries & indexes: know what you want
#1 find a business: I want *that* one
find a business
// single business
db.businesses.findOne({ _id: ObjectId("4ce838ef4a882579960001b9") })
#2 find by location: businesses in San Francisco, CA
find businesses by state/city/zip
// find all within state
db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
// find all within city
db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
// find all within zip
db.businesses.find({ "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0") })
indexes
// the indexes
db.businesses.ensureIndex({"location.city._id": 1})
db.businesses.ensureIndex({"location.zip._id": 1})
skip an index on “location.state._id”: only 51 possible values, so it barely narrows a query
the city and zip indexes: 1.5GB each
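The reasoning behind skipping the state index is arithmetic: an equality match on an indexed field narrows the search roughly in proportion to how many distinct values that field has. A rough sketch using the deck's own counts (15,000,000 businesses, 51 states, 35,000 cities) and assuming businesses are spread evenly, which real data certainly isn't:

```javascript
const BUSINESSES = 15000000;
const distinctKeys = { state: 51, city: 35000 };

// Average number of documents a single equality match on this field returns,
// under the (unrealistic) assumption of a uniform distribution.
function avgMatches(total, keys) {
  return Math.round(total / keys);
}

for (const [field, keys] of Object.entries(distinctKeys)) {
  console.log(`${field}: ~${avgMatches(BUSINESSES, keys)} businesses per value`);
}
```

Under that assumption a state query still touches about 294,000 businesses against roughly 430 for a city, so a state index would cost memory while saving very little.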
#3 find by category: businesses in the Auto Repair category
businesses by category
// find by category id
db.businesses.find({ "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
// the index
db.businesses.ensureIndex({ "categories._id": 1 })
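Querying `"categories._id"` works even though `categories` is an array: MongoDB matches a dotted path against every element of an array it passes through, and an index on such a field is multikey, holding one entry per element. A toy, in-memory sketch of that matching rule for scalar equality only, with strings standing in for ObjectIds; `valuesAtPath` and `matchesPath` are hypothetical and ignore every other query operator:

```javascript
// Collect every value reachable at a dotted path, fanning out over arrays.
function valuesAtPath(value, parts) {
  if (parts.length === 0) {
    // A terminal array matches on any of its elements (or on the whole array).
    return Array.isArray(value) ? [value, ...value] : [value];
  }
  if (Array.isArray(value)) return value.flatMap((v) => valuesAtPath(v, parts));
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Equality match: true if any value found at the path equals the target.
function matchesPath(doc, path, target) {
  return valuesAtPath(doc, path.split(".")).includes(target);
}

const business = {
  name: "Acme Glass Co",
  tags: ["glass", "mirrors", "flat glass"],
  location: { city: { _id: "4ce82abdd3dfaa10f8006faa" } },
  categories: [{ _id: "4ce82e50d3dfaa16360004f2" }, { _id: "4ce82e64d3dfaa16360014eb" }],
};

console.log(matchesPath(business, "categories._id", "4ce82e50d3dfaa16360004f2")); // true
console.log(matchesPath(business, "location.city._id", "4ce82aa0d3dfaa10f8004a95")); // false
console.log(matchesPath(business, "tags", "glass")); // true
```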
#4 find by category + location: businesses in the Plumbing category in Chicago, IL
businesses by category + city
// find by city id and category id
db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"), "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
which index should we use?
// city id
{"location.city._id":1}
~ or ~
// category id
{"categories._id":1}
answer: both suck
we need a compound index
which order?
db.businesses.ensureIndex({ "location.city._id" : 1, "categories._id" : 1})
~ or ~
db.businesses.ensureIndex({ "categories._id" : 1, "location.city._id" : 1})
answer: cities first, then categories
35,000 cities vs. 2,500 categories: the more selective field leads
create one for zip codes and categories too!
don’t we have 2 indexes on city id?
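answer: yes, and the single-field one is redundant, since a city-only query can use the leading field of the compound index
{"location.city._id" : 1}
{"location.city._id" : 1, "categories._id" : 1}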
db.businesses.dropIndex("location.city._id_1")
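Dropping the single-field city index is safe because a compound index is sorted by its first field, then its second, so a query on a leading prefix of its fields can seek into it directly. A toy model of that, treating the index as a sorted array of `[city, category]` keys and answering a prefix query with two binary searches; the short ids and document numbers are made up for illustration:

```javascript
// Compare keys field by field, like a {city: 1, category: 1} index.
// Only the fields both keys have are compared, so a prefix compares equal.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// First position whose key is >= prefix (strict=false) or > prefix (strict=true).
function bound(index, prefix, strict) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const c = compareKeys(index[mid].key, prefix);
    if (c < 0 || (strict && c === 0)) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Return the distinct documents whose keys start with the prefix.
function scanPrefix(index, prefix) {
  const hits = index.slice(bound(index, prefix, false), bound(index, prefix, true));
  return [...new Set(hits.map((entry) => entry.docId))];
}

// One entry per (city, category) pair, as a multikey index would store them.
const index = [
  { key: ["portland", "glass"], docId: 7 },
  { key: ["chicago", "plumbing"], docId: 1 },
  { key: ["portland", "auto-glass"], docId: 7 },
  { key: ["chicago", "glass"], docId: 3 },
  { key: ["chicago", "plumbing"], docId: 2 },
].sort((a, b) => compareKeys(a.key, b.key));

console.log(scanPrefix(index, ["chicago", "plumbing"])); // [ 1, 2 ]
console.log(scanPrefix(index, ["chicago"])); // [ 3, 1, 2 ]
```

The reverse does not hold: a query on `categories._id` alone cannot seek into this index, which is why the standalone category index from #3 stays.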
#5 find by keyword: “something awesome” in Boulder, CO
find businesses in city by keyword
{ "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "keywords" : [ "glass", "repair", "acme", ... ]}
db.businesses.ensureIndex({ "location.city._id":1, "keywords":1})
db.businesses.find({ "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"), "keywords":/glass/i})
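The deck doesn't show how `keywords` gets populated. One plausible approach, sketched here as an assumption rather than the original code: lower-case and tokenize the name, tags, and category names at write time, then query with exact keywords. An exact match can seek straight to a key in the `{"location.city._id": 1, "keywords": 1}` index, whereas an unanchored, case-insensitive regex like `/glass/i` has to examine every keyword entry under that city. `tokenize` and `buildKeywords` are hypothetical helpers:

```javascript
// Split text into lower-case word tokens; punctuation and case are dropped.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical: derive a sorted, deduplicated keyword array from a business.
function buildKeywords(business) {
  const sources = [
    business.name,
    ...(business.tags || []),
    ...(business.categories || []).map((c) => c.display_name),
  ];
  return [...new Set(sources.flatMap(tokenize))].sort();
}

const business = {
  name: "Acme Glass Co",
  tags: ["glass", "mirrors", "flat glass"],
  categories: [{ display_name: "Glass" }, { display_name: "Auto Glass" }],
};

business.keywords = buildKeywords(business);
console.log(business.keywords); // [ 'acme', 'auto', 'co', 'flat', 'glass', 'mirrors' ]

// A search would run the same tokenizer over user input and match exactly, e.g.
// { "location.city._id": cityId, "keywords": { $all: tokenize("Flat Glass") } }
```

Exact tokens give up substring and stemming matches, which is part of why a real full-text engine still has the edge.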
chat with Kyle Banker
me: we’re switching from postgres+solr to mongo
kyle: oh wow, you can replace solr with mongo?
me: with some creativity
kyle: seems like it’d still be hard to get just right
me: it works well
kyle: gotcha
i was wrong, kyle was right
I’ll never leave you again
...until MongoDB supports full text later this year :)
aggregation: map/reduce to the rescue
sitemaps: a big list of every url
sitemaps
• xml files containing each unique url ~ 24M
• 50,000 urls per file, about 500 files
• urls are generated from live data
• http://companyx.com/sitemaps/1.xml
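The file count follows from the sitemap protocol's cap of 50,000 URLs per file:

```javascript
const TOTAL_URLS = 24000000;
const URLS_PER_FILE = 50000; // sitemap protocol maximum per file

// Minimum number of files if every file were filled to the cap.
const files = Math.ceil(TOTAL_URLS / URLS_PER_FILE);
// Average fill if the urls are split across 500 partitions instead.
const avgPerFile = TOTAL_URLS / 500;

console.log(files, avgPerFile); // 480 48000
```

Hash partitions won't come out exactly equal, so rounding up to about 500 presumably leaves each file some headroom below the cap.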
partition by consistent hash
>> "hello!".hash % 6 #=> 5
>> "/ny/new-york/c/apartments".hash % 6 #=> 5
returns an integer from 0 up to, but not including, the number specified
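A URL must land in the same partition on every run, so the hash has to be deterministic across processes. Ruby's `String#hash` does not guarantee that (current Ruby versions seed it per process), so a sketch with an explicit hash is safer. This one swaps in 32-bit FNV-1a, a choice of ours rather than anything the deck specifies, and adds 1 so partitions are numbered 1..n like the map step:

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of a string: same input, same output, every run.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(str)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return hash >>> 0;
}

// Map a url to a partition numbered 1..partitions.
function partitionFor(url, partitions) {
  return (fnv1a32(url) % partitions) + 1;
}

const urls = ["/il/chicago/c/pizza", "/ny/new-york/c/apartments", "/14077000-bank-of-america"];
for (const url of urls) {
  console.log(url, partitionFor(url, 6));
}
```

Plain `hash % n` reassigns most URLs whenever `n` changes. That is fine here as long as every sitemap file is regenerated together.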
map/reduce
1. map each url in the site to a partition
2. reduce all partitions to a single document containing all urls in that partition
3. save to a permanent collection
map: each url → a partition (1-6)
/il/chicago/c/pizza → 4
/ny/new-york/c/apartments → 1
/nd/rugby/c/apartments → 6
/14076500-bayside-marina → 2
/13401000-comtrak-logistics-inc → 3
/12347500-allstate-auto-insurance → 1
/il/downers-grove/c/computer-web-design → 6
/1009500-heidelberg-lodges → 5
/mn/redwood-falls/c/food-service → 4
/14077000-bank-of-america → 5
/mn/savage/c/audio-visual-equipment → 1
...
reduce
{ "total" : 2, "urls" : [ "/12347500-allstate-auto-insurance", "/ny/new-york/c/apartments" ]}
{ "total" : 1, "urls" : [ "/mn/savage/c/audio-visual-equipment" ]}
{ "_id" : 1, "value" : { "total" : 3, "urls" : [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] }}
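An in-memory sketch of the same map/reduce contract, assuming each URL is emitted with a value already shaped like the reduce output (`{ total: 1, urls: [url] }`). MongoDB may call reduce again on partial results, so reduce must return that same shape and give the same answer however the values are grouped. `map`, `reduce`, and `mapReduce` here are plain JavaScript stand-ins, not the server's implementation; the partition assignments are copied from the map slide:

```javascript
// Partition assignments copied from the map step (a subset).
const assignments = {
  "/ny/new-york/c/apartments": 1,
  "/12347500-allstate-auto-insurance": 1,
  "/mn/savage/c/audio-visual-equipment": 1,
  "/il/chicago/c/pizza": 4,
  "/mn/redwood-falls/c/food-service": 4,
};

// map: emit (partition, value) for one url.
function map(url, emit) {
  emit(assignments[url], { total: 1, urls: [url] });
}

// reduce: merge values for one key; the output has the same shape as each input.
function reduce(key, values) {
  return {
    total: values.reduce((sum, v) => sum + v.total, 0),
    urls: values.flatMap((v) => v.urls).sort(),
  };
}

// Group emitted values by key, then reduce each group.
function mapReduce(urls) {
  const groups = new Map();
  for (const url of urls) {
    map(url, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

// Re-reducing a partial result gives the same answer as reducing everything at once.
const partial = reduce(1, [
  { total: 1, urls: ["/ny/new-york/c/apartments"] },
  { total: 1, urls: ["/12347500-allstate-auto-insurance"] },
]);
const merged = reduce(1, [partial, { total: 1, urls: ["/mn/savage/c/audio-visual-equipment"] }]);

console.log(JSON.stringify(partial));
console.log(JSON.stringify(merged));
console.log(JSON.stringify(mapReduce(Object.keys(assignments))));
```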
usage
db.sitemaps.findOne({_id:1}).value.urls
[ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments"]
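From there, serving http://companyx.com/sitemaps/1.xml means wrapping each path in the sitemap protocol's `<urlset>`, `<url>`, and `<loc>` elements. A minimal renderer, assuming the `urls` array returned by the query above and the host from the earlier example URL; `renderSitemap` and `xmlEscape` are hypothetical helpers, and the protocol requires `<loc>` values to be entity-escaped:

```javascript
// Escape the five characters the sitemap protocol requires as entities.
function xmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one sitemap file from a list of site-relative paths.
function renderSitemap(host, paths) {
  const entries = paths.map((p) => `  <url><loc>${xmlEscape(host + p)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

const urls = [
  "/12347500-allstate-auto-insurance",
  "/mn/savage/c/audio-visual-equipment",
  "/ny/new-york/c/apartments",
];
const xml = renderSitemap("http://companyx.com", urls);
console.log(xml);
```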
wrap up
2 months later
115ms average response times
thank you! @michaeldwan