How Varnish & MongoDB Scale Business Insider

1/29/2013, New York Web Tech Scaling: Scaling Business Insider, by Pax Dickinson


Presented to the New York Web Tech Scaling Group meetup on Jan. 29th, 2013.

Transcript of How Varnish & MongoDB Scale Business Insider

Page 1: How Varnish & MongoDB Scale Business Insider

1/29/2013

New York Web Tech Scaling

Scaling Business Insider by Pax Dickinson

Page 2: How Varnish & MongoDB Scale Business Insider

Our Sponsors

✤ Business Insider: http://businessinsider.com

✤ Varnish Software: http://varnish-software.com

✤ Thanks to our host, 10Gen

Page 3: How Varnish & MongoDB Scale Business Insider

Scaling Business Insider

Page 4: How Varnish & MongoDB Scale Business Insider
Page 5: How Varnish & MongoDB Scale Business Insider
Page 6: How Varnish & MongoDB Scale Business Insider
Page 7: How Varnish & MongoDB Scale Business Insider

Constraints on Scaling BI

✤ Coping with the inherently unpredictable nature of news traffic

✤ Some events bring predictably high traffic (Apple product announcements), but traffic spikes can happen anytime

Page 8: How Varnish & MongoDB Scale Business Insider

Constraints on Scaling BI

✤ Being on top of breaking news is of huge importance to us

✤ We can’t afford to have latency between CMS and production

Page 9: How Varnish & MongoDB Scale Business Insider

Back-end vs. Front-end Scaling

✤ Two different scaling strategies that apply to different use cases

✤ Backend scaling is useful for any site since it helps no matter how dynamic your content is, but it’s difficult because it involves the entire stack.

✤ Front-end scaling is useful for sites that need to deliver the same content to a huge audience.

✤ Which do we use at Business Insider? Both, of course.

Page 10: How Varnish & MongoDB Scale Business Insider

Scaling Both Ways

Page 11: How Varnish & MongoDB Scale Business Insider

MongoDB for Backend Scaling

✤ MongoDB is a NoSQL database. It stores documents rather than relational data, and it lacks transactions.

✤ MongoDB doesn’t ensure writes by default.

✤ These choices make it fast, but they need to be understood by developers using MongoDB.

Page 12: How Varnish & MongoDB Scale Business Insider

Business Insider Data Constraints

✤ Our data storage for the site content itself is less than 10 GB, growing a few GB a year.

✤ Images that are blog content, stored in the database using GridFS, come to another 100 or so GB, growing a few GB a month.

✤ We need to constantly record internal analytics of page views and unique visitors for business use.

✤ We need updates to be reflected immediately.

✤ Our architecture needs to allow for exponential transaction volume growth.

Page 13: How Varnish & MongoDB Scale Business Insider

Business Insider & MongoDB

✤ The blog (including images) is stored on DB1.

✤ Analytics are written to DB3 so the analytics write locks don’t affect blog performance.

✤ A shared slave is ready to step in during a failure of either server.

✤ All transactions are performed against the primaries.

Page 14: How Varnish & MongoDB Scale Business Insider

Business Insider & MongoDB

✤ As we grow, we plan to move GridFS off the blog server and onto its own server & slave.

✤ The blog and analytics can be sharded for performance when that’s eventually needed.

✤ Our DB servers are dual quad-core Xeons with 64GB of RAM and SSD data storage.

✤ We’re handling 800-1000 ops/sec with negligible CPU load.

Page 15: How Varnish & MongoDB Scale Business Insider

Data Modeling for Scaling

✤ MongoDB allows the storage of embedded documents within a document. We use this to store an array of comments within the blog post document they belong to.

✤ This eliminates the costly joins traditional SQL would require. In order to provide a moderation interface displaying all recent comments, we de-normalize that data and double store it.
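
The embedded-comments layout and the double store described above can be sketched roughly as follows. This is a minimal illustration using plain Python dictionaries as stand-ins for MongoDB collections; the names `posts`, `recent_comments`, and `add_comment`, and the field names, are assumptions made for the example and do not come from the talk.

```python
from datetime import datetime, timezone

# In-memory stand-ins for two MongoDB collections (hypothetical names).
posts = {}            # post_id -> post document with embedded comments
recent_comments = []  # de-normalized copy used by a moderation view


def add_comment(post_id, author, text):
    """Store a comment twice: embedded in its post and in a flat list."""
    comment = {
        "post_id": post_id,
        "author": author,
        "text": text,
        "created": datetime.now(timezone.utc),
    }
    # 1) Embedded copy: rendering a post page needs only one document read.
    posts[post_id]["comments"].append(comment)
    # 2) Flat copy: moderators can list recent comments across all posts
    #    without scanning every post document.
    recent_comments.append(dict(comment))
    return comment


posts["post-1"] = {"title": "Example post", "body": "...", "comments": []}
posts["post-2"] = {"title": "Another post", "body": "...", "comments": []}
add_comment("post-1", "alice", "First!")
add_comment("post-2", "bob", "Nice chart.")
add_comment("post-1", "carol", "Agreed.")
```

The trade-off is that every comment write touches two places, in exchange for reads that never need a join.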

Page 16: How Varnish & MongoDB Scale Business Insider

Varnish for Front-end Caching

✤ Varnish is a front-end caching reverse proxy.

✤ Varnish retrieves web pages on behalf of clients and caches the result. Clients that can be served from cache don’t add any load to your backend.
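
As a rough mental model only (this is not how Varnish is implemented), a caching reverse proxy can be pictured as a lookup table sitting in front of the backend. The class and function names below are hypothetical, and the sketch ignores TTLs, headers, and cookies.

```python
class CachingProxy:
    """Toy model of a caching reverse proxy: one cached body per URL."""

    def __init__(self, backend):
        self.backend = backend        # callable that renders a page
        self.cache = {}               # url -> cached response body
        self.backend_requests = 0     # how often the backend was hit

    def get(self, url):
        if url not in self.cache:
            # Cache miss: fetch from the backend once and remember the result.
            self.backend_requests += 1
            self.cache[url] = self.backend(url)
        # Cache hit (or freshly filled): no additional backend load.
        return self.cache[url]


def render_page(url):
    # Stand-in for the Apache/PHP backend.
    return f"<html>content for {url}</html>"


proxy = CachingProxy(render_page)
responses = [proxy.get("/apple-earnings") for _ in range(99)]
```

In this toy run, 99 client requests for the same URL result in a single backend request.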

Page 17: How Varnish & MongoDB Scale Business Insider

How Varnish Works

Page 18: How Varnish & MongoDB Scale Business Insider

Varnish & Business Insider

✤ We use two Varnish servers with single quad-core processors and 32GB of RAM each. Traffic is randomly load balanced between them, and each stores a full 24GB RAM cache of the site.

✤ Average weekdays peak around 700 reqs/sec on each Varnish server, spiking to over 1500 reqs/sec during breaking news such as Apple quarterly earnings.

✤ Our four backend Apache/PHP servers tend to see 50-60 reqs/s, and during breaking news this only spikes to 70-80 reqs/s.
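
To put these numbers side by side, here is a small back-of-the-envelope calculation of how much traffic the cache absorbs. The slide does not say whether the 50-60 reqs/s figure is per backend server or the total across all four, so the sketch computes both readings; the function name `offload_ratio` is made up for the example.

```python
def offload_ratio(frontend_rps, backend_rps):
    """Fraction of client requests that never reach the backend."""
    return 1 - backend_rps / frontend_rps


# Typical weekday peak: two Varnish servers at about 700 reqs/sec each.
frontend_rps = 2 * 700

# Reading 1 (assumption): about 60 reqs/s is the total across all backends.
total_reading = offload_ratio(frontend_rps, 60)

# Reading 2 (assumption): about 60 reqs/s on each of the four backends.
per_server_reading = offload_ratio(frontend_rps, 4 * 60)

print(f"offload if 60 reqs/s is the backend total: {total_reading:.1%}")
print(f"offload if 60 reqs/s is per backend server: {per_server_reading:.1%}")
```

Under either reading, the large majority of requests are served from cache.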

Page 19: How Varnish & MongoDB Scale Business Insider

Varnish Active Bans

✤ When an editor publishes or edits a post, a ban request is sent to the Varnish servers.

✤ They add post pages, vertical pages, and author pages associated with that post to the Varnish ban list.

✤ The next time a request matches that list, the content is retrieved fresh from the backend and the cache is refreshed.

✤ This lets us cache but still keep our content totally up-to-date.
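
The ban flow above can be modeled in a few lines. This is a simplified sketch of the idea, not Varnish's actual data structures: a cached object counts as stale if a ban matching its URL was added after the object was stored. The class name, URL patterns, and the logical clock are all assumptions for illustration.

```python
import itertools
import re


class BanningCache:
    """Toy cache with a ban list, loosely modeled on the slide above."""

    def __init__(self, backend):
        self.backend = backend
        self.clock = itertools.count(1)  # logical clock, avoids real time
        self.cache = {}                  # url -> (stored_at, body)
        self.bans = []                   # list of (added_at, compiled regex)
        self.backend_requests = 0

    def ban(self, url_pattern):
        # Record the ban; nothing is evicted until a matching request arrives.
        self.bans.append((next(self.clock), re.compile(url_pattern)))

    def _is_banned(self, url, stored_at):
        # Only bans newer than the cached object can invalidate it.
        return any(added_at > stored_at and pattern.search(url)
                   for added_at, pattern in self.bans)

    def get(self, url):
        entry = self.cache.get(url)
        if entry is None or self._is_banned(url, entry[0]):
            self.backend_requests += 1
            entry = (next(self.clock), self.backend(url))
            self.cache[url] = entry
        return entry[1]


version = {"n": 1}


def backend(url):
    return f"{url} v{version['n']}"


cache = BanningCache(backend)
for url in ("/post/42", "/author/pax", "/post/7"):
    cache.get(url)                   # warm the cache: 3 backend requests

version["n"] = 2                     # an editor updates post 42
cache.ban(r"^/post/42$")             # ban the post page
cache.ban(r"^/author/pax$")          # ban the associated author page
```

After the bans, the next request for `/post/42` or `/author/pax` goes to the backend and refreshes the cache, while `/post/7` keeps being served from cache.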

Page 20: How Varnish & MongoDB Scale Business Insider

Varnish Active Bans

Page 21: How Varnish & MongoDB Scale Business Insider

Edge Side Includes

✤ Varnish allows for the use of Edge Side Includes, parts of a page that are retrieved separately and have different cache lifetimes.

✤ One example is the “Most Read” widget in the right rail on Business Insider.

✤ The page hosting the module may be cached for an hour but the Edge Side Include hosting the widget has a 5-minute TTL, keeping it up to date.
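
The different cache lifetimes can be illustrated with a toy cache where the page and the included fragment expire independently. This is a sketch under stated assumptions: the URLs, the `TtlCache` class, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all made up for illustration.

```python
import re

# Matches tags such as <esi:include src="/esi/most-read"/>
ESI_TAG = re.compile(r'<esi:include src="([^"]+)"\s*/>')


class TtlCache:
    """Toy cache where each URL has its own time-to-live in seconds."""

    def __init__(self, backend, ttls, default_ttl):
        self.backend = backend
        self.ttls = ttls
        self.default_ttl = default_ttl
        self.store = {}          # url -> (expires_at, body)
        self.backend_hits = []   # URLs fetched from the backend, in order

    def fetch(self, url, now):
        entry = self.store.get(url)
        if entry is None or now >= entry[0]:
            self.backend_hits.append(url)
            ttl = self.ttls.get(url, self.default_ttl)
            entry = (now + ttl, self.backend(url))
            self.store[url] = entry
        return entry[1]

    def render(self, url, now):
        # Fetch the page, then replace each ESI tag with its own cached fragment.
        page = self.fetch(url, now)
        return ESI_TAG.sub(lambda m: self.fetch(m.group(1), now), page)


counter = {"n": 0}


def backend(url):
    if url == "/esi/most-read":
        counter["n"] += 1
        return f"<ul>most read #{counter['n']}</ul>"
    return '<h1>Post</h1><esi:include src="/esi/most-read"/>'


cache = TtlCache(backend, ttls={"/esi/most-read": 300}, default_ttl=3600)
first = cache.render("/post/42", now=0)     # fills page and widget
later = cache.render("/post/42", now=600)   # page still cached, widget expired
```

Ten minutes later, the page body is still served from cache and only the widget is fetched again.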

Page 22: How Varnish & MongoDB Scale Business Insider

Edge Side Includes

Another example is the top user menu. This is an Edge Side Include that includes the logged in user’s ID in the hash, so each user gets this block customized for them while the rest of the page can be cached more generally.

Varnish allows you full programmatic control of what is included in the hash for each request, so complex tricks like this are possible.
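
The hashing trick can be sketched as a function that builds the cache key. In Varnish this logic lives in VCL; the Python below only models the idea, and the `/esi/user-menu` path and the `cache_key` function are hypothetical names chosen for the example.

```python
import hashlib


def cache_key(url, host, user_id=None):
    """Build a cache key; only the user-menu fragment varies per user."""
    parts = [url, host]
    if url.startswith("/esi/user-menu"):
        # Per-user block: the logged-in user's ID becomes part of the key,
        # so each user gets a separately cached copy.
        parts.append(f"user={user_id}")
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


host = "www.businessinsider.com"

# The article itself is shared: both users map to the same cached object.
page_alice = cache_key("/post/42", host, user_id="alice")
page_bob = cache_key("/post/42", host, user_id="bob")

# The user menu is not shared: each user maps to a separate cached object.
menu_alice = cache_key("/esi/user-menu", host, user_id="alice")
menu_bob = cache_key("/esi/user-menu", host, user_id="bob")
```

The cost is one cached copy of the menu per logged-in user, which is small compared with caching a whole page per user.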

Page 23: How Varnish & MongoDB Scale Business Insider

Edge Side Includes

Edge Side Includes are a common caching standard. This is the same format used by Akamai and other CDN providers.

Generally you should add a header tag to any page including an ESI. You need to tell Varnish to process your ESI tags, and if you run that processing on every payload, you’ll waste resources and risk corrupting any binary or image that happens to contain the sequence “<esi:”.
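
One way to picture this opt-in is a delivery step that only scans for ESI tags when the backend has flagged the response. The sketch below assumes the flag is a response header; the header name `X-Do-ESI`, the `deliver` function, and the fragment URL are invented for the example and are not Varnish settings.

```python
import re

# Byte-level pattern, since response bodies may be binary.
ESI_INCLUDE = re.compile(rb'<esi:include src="([^"]+)"\s*/>')


def deliver(headers, body, fetch_fragment):
    """Expand ESI tags only when the backend opted in via a header."""
    if headers.get("X-Do-ESI") != "1":
        # No opt-in: pass the payload through untouched (images, binaries).
        return body
    return ESI_INCLUDE.sub(
        lambda m: fetch_fragment(m.group(1).decode("utf-8")), body)


def fetch_fragment(url):
    # Stand-in for fetching (or looking up) the included fragment.
    return b"<ul>most read</ul>"


html = b'<h1>Post</h1><esi:include src="/esi/most-read"/>'
image = b'\x89PNG\r\n<esi:include src="/esi/most-read"/>\x00\x01'

expanded = deliver({"X-Do-ESI": "1"}, html, fetch_fragment)
untouched = deliver({"Content-Type": "image/png"}, image, fetch_fragment)
```

The image bytes happen to contain an ESI-looking sequence, but because the response was never flagged, they are delivered unchanged.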

Page 24: How Varnish & MongoDB Scale Business Insider

Scaling Varnish Servers

✤ The load balancer randomly sends traffic to the Varnish servers.

✤ Each Varnish server caches every page.

✤ Every cache ban results in two backend requests, one from each Varnish server.

Page 25: How Varnish & MongoDB Scale Business Insider

Scaling Varnish Servers

✤ Adding a third Varnish server means a third backend request for every ban.

✤ That defeats our purpose entirely!

Page 26: How Varnish & MongoDB Scale Business Insider

Scaling Varnish Servers

✤ We can solve this a few ways, but we’ll use Layer 7 load balancing.

✤ We can send a subset of the URL space to each Varnish server

✤ Only one copy of each cached page will exist on the cluster, reducing load on the backend.
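
A minimal sketch of the URL-partitioning idea follows, assuming the load balancer hashes the request URL and picks a server by modulo. The server names and both function names are hypothetical, and a real Layer 7 load balancer would be configured rather than coded like this.

```python
import hashlib


def pick_server(url, servers):
    """Map a URL to exactly one Varnish server using a stable hash."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def backend_requests_per_ban(server_count, url_partitioned):
    """Backend refetches caused by one banned URL across the cluster."""
    # Random balancing: every server holds its own copy and refetches it.
    # URL partitioning: only the single owning server refetches it.
    return 1 if url_partitioned else server_count


servers = ["varnish-1", "varnish-2", "varnish-3"]
owner = pick_server("/post/42", servers)
```

One caveat with plain modulo hashing is that changing the number of servers remaps most URLs to a different server; consistent hashing is a common alternative when that matters.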

Page 27: How Varnish & MongoDB Scale Business Insider

A Closing Testimonial From Jay-Z

✤ “If you’re having scaling problems, I feel bad for you son...

Clients sent 99 requests but my backend got one.”

Photo by flickr user matthew_harrison

Page 28: How Varnish & MongoDB Scale Business Insider

Til Next Time...

✤ Feb. 28th, 2013 (tentative)

✤ Topic TBD

Page 29: How Varnish & MongoDB Scale Business Insider

Sources & Links

✤ MongoDB: http://www.mongodb.org

✤ Varnish Cache: http://www.varnish-cache.org