Scaling Django Web Apps - files.meetup.com · Scaling Django Web Apps Mike Malone san francisco...

Scaling Django Web Apps Mike Malone san francisco meetup Tuesday, May 26, 2009


Scaling Django Web Apps

Mike Malone

san francisco meetup

Tuesday, May 26, 2009


... & some Django Patterns & Best Practices


Hi, I’m Mike.


http://www.flickr.com/photos/kveton/2910536252/


Pownce

• Large scale

• Hundreds of requests/sec

• Thousands of DB operations/sec

• Millions of user relationships

• Millions of notes

• Terabytes of static data


Pownce

• Encountered and eliminated many common scaling bottlenecks

• Real world example of scaling a Django app

• Django provides a lot for free

• I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way


Scalability


Scalability

Scalability is NOT:

• Speed / Performance

• Generally affected by language choice

• Achieved by adopting a particular technology


A Scalable Application

import time

def application(environ, start_response):
    time.sleep(10)
    start_response('200 OK', [('content-type', 'text/plain')])
    return ('Hello, world!',)


A High Performance Application

def application(environ, start_response):
    remote_addr = environ['REMOTE_ADDR']
    f = open('access-log', 'a+')
    f.write(remote_addr + "\n")
    f.flush()
    f.seek(0)
    hits = sum(1 for l in f.xreadlines()
               if l.strip() == remote_addr)
    f.close()
    start_response('200 OK', [('content-type', 'text/plain')])
    return (str(hits),)


Scalability


A scalable system doesn’t need to change when the size of the problem changes.


Scalability

• Accommodate increased usage

• Accommodate increased data

• Maintainable


Scalability

• Two kinds of scalability

• Vertical scalability: buying more powerful hardware, replacing what you already own

• Horizontal scalability: buying additional hardware, supplementing what you already own


Vertical Scalability

• Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)

• Inherently limited by current technology

• But it’s easy! If you can get away with it, good for you.


Vertical Scalability

Skyscrapers are special. Normal buildings don’t need 10-floor foundations. Just build!

- Cal Henderson


Horizontal Scalability


The ability to increase a system’s capacity by adding more processing units (servers)


Horizontal Scalability


It’s how large apps are scaled.


Horizontal Scalability

• A lot more work to design, build, and maintain

• Requires some planning, but you don’t have to do all the work up front

• You can scale progressively...

• The rest of the presentation is roughly in the order you’d tackle things if you’re scaling as you go...


Caching


Caching

• Several levels of caching available in Django

• Per-site cache: caches every page that doesn’t have GET or POST parameters

• Per-view cache: caches output of an individual view

• Template fragment cache: caches fragments of a template

• None of these are that useful if pages are heavily personalized


Caching

• Low-level Cache API

• Much more flexible, allows you to cache at any granularity

• At Pownce we typically cached

• Individual objects

• Lists of object IDs

• Hard part is invalidation


Caching

• Cache backends:

• Memcached

• Database caching

• Filesystem caching


Caching


Use Memcache.


Sessions


Use Memcache.


Sessions

Or Tokyo Cabinet: http://github.com/ericflo/django-tokyo-sessions/

Thanks @ericflo


Caching

Basic caching comes free with Django:

from django.core.cache import cache
from django.db import models

class UserProfile(models.Model):
    ...
    def get_social_network_profiles(self):
        cache_key = 'networks_for_%s' % self.user.id
        profiles = cache.get(cache_key)
        if profiles is None:
            profiles = self.user.social_network_profiles.all()
            cache.set(cache_key, profiles)
        return profiles


Caching

Invalidate when a model is saved or deleted:

from django.core.cache import cache
from django.db.models import signals

def nuke_social_network_cache(sender, instance, **kwargs):
    # Signal receivers get the saved or deleted object as `instance`.
    cache_key = 'networks_for_%s' % instance.user_id
    cache.delete(cache_key)

signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile)


Caching


• Invalidate post_save, not pre_save

• Still a small race condition

• Simple solution, worked for Pownce:

• Instead of deleting, set the cache key to None for a short period of time

• Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
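The two-part workaround above can be sketched without Django. This is only an illustration under stated assumptions: TinyCache is a hypothetical in-memory stand-in that mimics memcached-style get, set, and add semantics (it is not Django's cache object), and invalidate and read_through are made-up helper names.

```python
import time


class TinyCache(object):
    """Hypothetical in-memory stand-in for a memcached-style client."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def _live(self, key):
        # Return the stored (value, expires_at) entry, dropping it if expired.
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._data[key]
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry else None

    def set(self, key, value, timeout=None):
        expires_at = time.time() + timeout if timeout else None
        self._data[key] = (value, expires_at)

    def add(self, key, value, timeout=None):
        # Like memcached's add: refuse to overwrite an existing key.
        if self._live(key) is not None:
            return False
        self.set(key, value, timeout)
        return True


def invalidate(cache, key, hold_seconds=5):
    # Instead of deleting, park a None placeholder for a short time.
    cache.set(key, None, hold_seconds)


def read_through(cache, key, load):
    value = cache.get(key)
    if value is None:
        value = load()
        # add() fails while the placeholder is parked, so a reader that
        # loaded stale data during a concurrent write cannot cache it.
        cache.add(key, value)
    return value
```

While the placeholder is in place every reader goes to the database, which is the price paid for not caching a stale value; once it expires, the next reader repopulates the key.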


Advanced Caching


• Memcached’s atomic increment and decrement operations are useful for maintaining counts

• But they’re not available in Django 1.0

• Added in 1.1 by ticket #6464


Advanced Caching


• You can still use them if you poke at the internals of the cache object a bit

• cache._cache is the underlying cache object

try:
    result = cache._cache.incr(cache_key, delta)
except ValueError:  # nonexistent key raises ValueError
    # Do it the hard way, store the result.
return result


Advanced Caching


• Other missing cache API

• delete_multi & set_multi

• append: add data to existing key after existing data

• prepend: add data to existing key before existing data

• cas: store this data, but only if no one has edited it since I fetched it


Advanced Caching


• It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)

• User and UserProfile

• fetched almost every request

• rarely change

• But Django won’t let you

• IMO, this is a bug :(


The Memcache Backend

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=0):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # timeout=0 is falsy, so it is replaced by default_timeout here
        # and can never reach memcached as "never expire".
        return self._cache.add(smart_str(key), value, timeout or self.default_timeout)


The Memcache Backend

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=None):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # Fall back to default_timeout only when no timeout was passed,
        # so an explicit 0 gets through.
        if timeout is None:
            timeout = self.default_timeout
        return self._cache.add(smart_str(key), value, timeout)


Advanced Caching


• Typical setup has memcached running on web servers

• Pownce web servers were I/O and memory bound, not CPU bound

• Since we had some spare CPU cycles, we compressed large objects before caching them

• The Python memcache library can do this automatically, but the API is not exposed


Monkey Patching core.cache

from django.core.cache import cache
from django.utils.encoding import smart_str
import inspect as i

if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
    class CacheClass(cache.__class__):
        def set(self, key, value, timeout=None, min_compress_len=150000):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.set(smart_str(key), value, timeout, min_compress_len)
    cache.__class__ = CacheClass


Advanced Caching


• Useful tool: automagic single object cache

• Use a manager to check the cache prior to any single object get by pk

• Invalidate assets on save and delete

• Eliminated several hundred QPS at Pownce
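The single object cache can be illustrated with a framework-free sketch. Everything here is hypothetical: the dict stands in for memcached, fetch_from_db stands in for the ORM lookup by primary key, and in a real Django project the same logic would presumably live in a custom manager with invalidate wired to post_save and post_delete.

```python
class SingleObjectCache(object):
    """Read-through cache for lookups by primary key (illustrative only)."""

    def __init__(self, fetch_from_db, prefix):
        self._fetch = fetch_from_db   # hypothetical stand-in for a get-by-pk query
        self._prefix = prefix
        self._cache = {}              # stand-in for memcached
        self.db_hits = 0              # counts how often the database was needed

    def _key(self, pk):
        return '%s:%s' % (self._prefix, pk)

    def get(self, pk):
        key = self._key(pk)
        if key in self._cache:
            return self._cache[key]
        # Cache miss: go to the database once, then remember the object.
        self.db_hits += 1
        obj = self._fetch(pk)
        self._cache[key] = obj
        return obj

    def invalidate(self, pk):
        # Call this from save and delete handlers.
        self._cache.pop(self._key(pk), None)
```

Repeated calls to get for the same pk touch the database once; after invalidate, the next get reloads the object.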


Advanced Caching


All this and more at:

http://github.com/mmalone/django-caching/


Advanced Caching

• Consistent hashing: hashes cached objects in such a way that most objects map to the same node after a node is added or removed.


http://www.flickr.com/photos/deepfrozen/2191036528/
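A minimal sketch of a consistent hash ring, to make the definition above concrete. The HashRing class, its method names, and the choice of MD5 with 100 virtual points per node are assumptions made for illustration; this is not the API of any particular package.

```python
import bisect
import hashlib


class HashRing(object):
    """Minimal consistent-hash ring (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas   # virtual points per node
        self._ring = {}             # point on the ring -> node name
        self._points = []           # sorted list of ring points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self._replicas):
            point = self._hash('%s#%d' % (node, i))
            self._ring[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self._replicas):
            point = self._hash('%s#%d' % (node, i))
            del self._ring[point]
            self._points.remove(point)

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._points, self._hash(key))
        if idx == len(self._points):
            idx = 0
        return self._ring[self._points[idx]]
```

Adding a node only inserts new points on the ring, so a key either keeps its old node or moves to the new one; keys never shuffle between the existing nodes.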


Consistent Hashing

hash_ring on PyPI


Caching

Now you’ve made life easier for your DB server, next thing to fall over: your app server.


Load Balancing


Load Balancing

• Out of the box, Django uses a shared nothing architecture

• App servers have no single point of contention

• Responsibility pushed down the stack (to DB)

• This makes scaling the app layer trivial: just add another server


Load Balancing

Spread work between multiple nodes in a cluster using a load balancer.

• Hardware or software

• Layer 7 or Layer 4

[Diagram: Load Balancer → App Servers → Database]


Load Balancing


• Hardware load balancers

• Expensive, like $35,000 each, plus maintenance contracts

• Need two for failover / high availability

• Software load balancers

• Cheap and easy, but more difficult to eliminate as a single point of failure

• Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx


Load Balancing


• Most of these are layer 7 load balancers, and some software balancers do cool things

• Caching

• Re-proxying

• Authentication

• URL rewriting


Load Balancing


A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.

[Diagram: Hardware Balancers → Software Balancers → App Servers]


Load Balancing


• At Pownce, we used a single Perlbal balancer

• Easily handled all of our traffic (hundreds of simultaneous connections)

• A SPOF, but we didn’t have $100,000 for black box solutions, and weren’t worried about service guarantees beyond three or four nines

• Plus there were some neat features that we took advantage of


Perlbal Reproxying

Perlbal reproxying is a really cool, and really poorly documented feature.


Perlbal Reproxying

1. Perlbal receives request

2. Forwards it to an app server

   a. App server checks auth (etc.)

   b. Returns HTTP 200 with X-Reproxy-URL header set to internal file server URL

3. File served from file server via Perlbal


Perlbal Reproxying

• Completely transparent to end user

• Doesn’t keep large app server instance around to serve file

• Users can’t access files directly (like they could with a 302)


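Perlbal Reproxying

Plus, it’s really easy:

from django.http import HttpResponse

def download(request, filename):
    # Check auth, do your thing
    response = HttpResponse()
    # FILE_SERVER is the base URL of the internal file server, defined elsewhere.
    response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
    return response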


Load Balancing


Best way to reduce load on your app servers: don’t use them to do hard stuff.


Queuing


Queuing

• A queue is simply a bucket that holds messages until they are removed for processing by clients

• Many expensive operations can be queued and performed asynchronously

• User experience doesn’t have to suffer

• Tell the user that you’re running the job in the background (e.g., transcoding)

• Make it look like the job was done real-time (e.g., note distribution)


Queuing

• Lots of open source options for queuing

• Ghetto Queue (MySQL + Cron)

• this is the official name.

• Gearman

• TheSchwartz

• RabbitMQ

• Apache ActiveMQ

• ZeroMQ


Queuing

• Lots of fancy features: brokers, exchanges, routing keys, bindings...

• Don’t let that crap get you down, this is really simple stuff

• Biggest decision: persistence

• Does your queue need to be durable and persistent, able to survive a crash?

• This requires logging to disk which slows things down, so don’t do it unless you have to


Queuing

• Pownce used a simple ghetto queue built on MySQL / cron

• Problematic if you have multiple consumers pulling jobs from the queue

• No point in reinventing the wheel, there are dozens of battle-tested open source queues to choose from
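To see why multiple consumers are a problem for a database-backed queue, consider two workers that both select the same pending row. One common mitigation, shown here as a sketch and not as what Pownce actually ran, is to claim a job with a conditional UPDATE and check how many rows it touched. The example uses an in-memory SQLite database so it is self-contained; the jobs table, its columns, and the function names are all hypothetical.

```python
import sqlite3


def make_queue(conn):
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " claimed_by TEXT)"
    )


def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()


def claim_next(conn, worker):
    """Return (id, payload) for a job this worker now owns, or None."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE claimed_by IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause makes the claim conditional: if another
        # consumer got there first, no row is updated and we retry.
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
```

A dedicated queue server handles this hand-off for you, which is part of the argument for not building your own.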


Django Standalone Scripts

Consumers need to set up the Django environment:

from django.core.management import setup_environ
from mysite import settings

setup_environ(settings)


Django Standalone Scripts


Great blog post by James Bennett (@ubernostrum)

http://bit.ly/django-standalone-scripts


THE DATABASE!


The Database

• Til now we’ve been talking about

• Shared nothing

• Pushing problems down the stack

• But we have to store a persistent and consistent view of our application’s state somewhere

• Enter, the database...


CAP Theorem

• Three properties of a shared-data system

• Consistency: all clients see the same data

• Availability: all clients can see some version of the data

• Partition Tolerance: system properties hold even when the system is partitioned & messages are lost

• But you can only have two


CAP Theorem

• Big long proof... here’s my version.

• Empirically, seems to make sense.

• Eric Brewer

• Professor at University of California, Berkeley

• Co-founder and Chief Scientist of Inktomi

• Probably smarter than me


CAP Theorem

• The relational database systems we all use were built with consistency as their primary goal

• But at scale our system needs to have high availability and must be partitionable

• The RDBMS’s consistency requirements get in our way

• Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance


The Database

• There are lots of non-relational databases coming onto the scene

• CouchDB

• Cassandra

• Tokyo Cabinet

• But they’re not that mature, and they aren’t easy to use with Django


The Database

• Django has no support for

• Non-relational databases like CouchDB

• Multiple databases (coming soon?)

• If you’re looking for a project, plz fix this.

• Only advice: don’t get too caught up in trying to duplicate the existing ORM API


I Want a Pony

• Save always saves every field of a model

• Causes unnecessary contention and more data transfer

• A better way:

• Use descriptors to determine what’s dirty

• Only update dirty fields when an object is saved
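The descriptor idea can be sketched in plain Python, independent of Django's model machinery. TrackedField and Note are hypothetical names, and save here only returns the SQL it would issue; a real implementation would have to hook into the ORM's field classes.

```python
class TrackedField(object):
    """Descriptor that records which attributes changed since the last save."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get('_values', {}).get(self.name)

    def __set__(self, instance, value):
        values = instance.__dict__.setdefault('_values', {})
        dirty = instance.__dict__.setdefault('_dirty', set())
        # Only mark the field dirty when the value actually changes.
        if self.name not in values or values[self.name] != value:
            values[self.name] = value
            dirty.add(self.name)


class Note(object):
    # Hypothetical model with two tracked fields.
    title = TrackedField('title')
    body = TrackedField('body')

    def __init__(self, title, body):
        self.title = title
        self.body = body
        self._dirty.clear()   # pretend the row was just loaded

    def save(self):
        """Return the UPDATE that would be issued, touching only dirty fields."""
        if not self._dirty:
            return None
        columns = sorted(self._dirty)
        sql = 'UPDATE note SET %s WHERE id = %%s' % ', '.join(
            '%s = %%s' % c for c in columns)
        self._dirty.clear()
        return sql
```

Changing only the title produces an UPDATE that names only the title column, and saving an unchanged object issues nothing.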


Denormalization


Denormalization

• Django encourages normalized data, which is usually good

• But at scale you need to denormalize

• Corollary: joins are evil

• Django makes it really easy to do joins using the ORM, so pay attention


Denormalization

• Start with a normalized database

• Selectively denormalize things as they become bottlenecks

• Denormalized counts, copied fields, etc. can be updated in signal handlers


Replication


Replication

• Typical web app is 80 to 90% reads

• Adding read capacity will get you a long way

• MySQL Master-Slave replication

[Diagram: master (Read & Write) replicating to slaves (Read only)]


Replication

• Django doesn’t make it easy to use multiple database connections, but it is possible

• Some caveats

• Slave lag interacts with caching in weird ways

• You can only save to your primary DB (the one you configure in settings.py)

• Unless you get really clever...


Replication

1. Create a custom database wrapper by subclassing DatabaseWrapper

class SlaveDatabaseWrapper(DatabaseWrapper):
    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',
                'use_unicode': True,
            }
            # Merge in the chosen slave's connection settings rather than
            # replacing the defaults above.
            kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
            self.connection = Database.connect(**kwargs)
            ...
        cursor = CursorWrapper(self.connection.cursor())
        return cursor


Replication

2. Custom QuerySet that uses primary DB for writes

class MultiDBQuerySet(QuerySet):
    ...
    def update(self, **kwargs):
        slave_conn = self.query.connection
        self.query.connection = default_connection
        try:
            return super(MultiDBQuerySet, self).update(**kwargs)
        finally:
            # Switch back to the slave even if the update raises.
            self.query.connection = slave_conn


Replication

3. Custom Manager that uses your custom QuerySet

class SlaveDatabaseManager(db.models.Manager):
    def get_query_set(self):
        return MultiDBQuerySet(self.model, query=self.create_query())

    def create_query(self):
        return db.models.sql.Query(self.model, connection)


Replication

Example on GitHub:

http://github.com/mmalone/django-multidb/


Replication

• Goals:

• Read-what-you-write consistency for writer

• Eventual consistency for everyone else

• Slave lag screws things up
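One way to approximate these goals, offered as an assumption and not as the approach the slides describe, is to remember when each user last wrote and send that user's reads to the master for a window longer than the expected slave lag. ReadRouter, its method names, and the 'master' label are hypothetical.

```python
import random
import time


class ReadRouter(object):
    """Pin recent writers to the master; everyone else reads from a slave."""

    def __init__(self, slaves, pin_seconds=5.0, clock=time.time):
        self._slaves = list(slaves)
        self._pin_seconds = pin_seconds   # should exceed typical slave lag
        self._clock = clock
        self._last_write = {}             # user id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self._clock()

    def db_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self._pin_seconds:
            return 'master'                  # writer sees their own write
        return random.choice(self._slaves)  # eventual consistency for the rest
```

With several app servers the last-write timestamps would need to live somewhere shared, such as the user's session or memcached, since an in-process dict is only visible to one process.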


Replication


What happens when you become write saturated?


Federation


Federation


• Start with Vertical Partitioning: split tables that aren’t joined across database servers

• Actually pretty easy

• Except not with Django


Federation


django.db.models.base

FAIL!


Federation

• At some point you’ll need to split a single table across databases (e.g., user table)

• Now auto-increment won’t work

• But Django uses auto-increment for PKs

• So specify your own PKs in the save() method

• Not a bad idea to start with UUIDs from day one since it’s a pain in the ass to migrate
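The idea behind application-generated keys can be sketched without Django: the key is created before any database is contacted, and the same key then decides which database holds the row. This is only an illustration; the shard names and the modulo scheme are assumptions, not something the talk prescribes:

```python
import uuid

# Hypothetical shard names; a real deployment would map these to DB connections.
SHARDS = ["users_0", "users_1", "users_2", "users_3"]


def new_primary_key():
    """Generate a key without asking any database for an auto-increment value."""
    return uuid.uuid4().hex  # 32 hex characters, random


def shard_for(pk):
    """Map a hex UUID key to a shard deterministically."""
    return SHARDS[int(pk, 16) % len(SHARDS)]
```

A plain modulo keeps the sketch short, but it forces most rows to move when a shard is added; a lookup table or consistent hashing is usually easier to grow.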


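class Model(models.Model):
    def save(self, force_insert=False, force_update=False):
        if not self.id:
            # New object: assign the primary key ourselves and force an INSERT.
            force_insert = True
            # The standard library uuid module has no uuid() function; this
            # presumably calls a custom generator. uuid.uuid4() is the stdlib option.
            self.id = uuid.uuid()
        return super(Model, self).save(force_insert, force_update)

    class Meta:
        abstract = True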

Federation


UUID Generator


http://gist.github.com/117292


Profiling, Monitoring & Measuring


>>> Article.objects.filter(pk=3).query.as_sql()
('SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article" WHERE "app_article"."id" = %s ', (3,))

Know your SQL


>>> import sqlparse
>>> def pp_query(qs):
...     t = qs.query.as_sql()
...     sql = t[0] % t[1]
...     print sqlparse.format(sql, reindent=True, keyword_case='upper')
...
>>> pp_query(Article.objects.filter(pk=3))
SELECT "app_article"."id", "app_article"."name", "app_article"."author_id"
FROM "app_article"
WHERE "app_article"."id" = 3

Know your SQL


>>> from django.db import connection
>>> connection.queries
[{'time': '0.001', 'sql': u'SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article"'}]

Know your SQL


Know your SQL

• It’d be nice if a lightweight stacktrace could be done in QuerySet.__init__

• Stick the result in connection.queries

• Now we know where the query originated
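The wish in these bullets can be approximated outside Django with the standard traceback module. In this rough sketch, query_log stands in for connection.queries and record_query is a hypothetical hook; Django does not provide either under these names:

```python
import traceback

query_log = []  # stand-in for connection.queries


def record_query(sql, depth=3):
    """Append the SQL plus the last few caller frames to the log."""
    # extract_stack() lists frames oldest-first; drop this function's own frame.
    frames = traceback.extract_stack()[:-1][-depth:]
    origin = ["%s:%d in %s" % (filename, lineno, name)
              for filename, lineno, name, _line in frames]
    query_log.append({"sql": sql, "origin": origin})


def load_articles():
    # Pretend this is view code that triggers a query.
    record_query('SELECT "app_article"."id" FROM "app_article"')


load_articles()
```

The last entry of query_log[-1]["origin"] names load_articles, which is the piece of information the bare SQL string cannot give you. Keeping depth small is what keeps the trace lightweight.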


Monitoring & Measuring


Django Debug Toolbar

http://github.com/robhudson/django-debug-toolbar/


Monitoring & Measuring

• Ganglia (http://ganglia.info)

• Munin (http://munin.projects.linpro.no/)

• Cacti (http://cacti.net)


You can’t improve what you don’t measure.


Monitoring & Measuring

• All Servers

• CPU Usage

• Disk utilization

• IO Wait

• Memory Usage

• Bandwidth Usage


Monitoring & Measuring

• Database Servers

• Queries per second

• Open connections

• Slave lag

• Cache hit rate


Monitoring & Measuring

• Web Servers

• Requests per second

• Response time

• Apache children (or equivalent)


Monitoring & Measuring

• Cache servers

• Requests per second

• Eviction rate

• LRU reference age

• Average object size

• Cache hit ratio


Monitoring & Measuring

• Application level

• Queue lengths

• Registration rate

• Anything interesting

• You should be able to correlate your server level metrics (like DB QPS) with application level metrics (like API traffic)
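Correlating two metrics can be as simple as computing a correlation coefficient over samples taken at the same times. The sketch below uses made-up numbers and a hand-written Pearson function; the series names are illustrative and not taken from any real system:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (>= 2)")
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-minute samples, for illustration only.
api_requests = [120, 150, 180, 240, 300]
db_qps = [900, 1100, 1350, 1800, 2200]
r = pearson(api_requests, db_qps)
```

A value of r close to 1 suggests that API traffic is what drives database load; a value near 0 means the cause is somewhere else.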


All done... Questions?

Mike [email protected]

twitter.com/mjmalone


Django Patterns & Best Practices


URLs


Always name URLs.


url(r'^user/(?P<username>\w+)/$', ..., name='user')

{% url user username %}

from django.core.urlresolvers import reverse
reverse('user', kwargs={'username': username})


The ORM


Use Model.objects.get_or_create()


token, created = Token.objects.get_or_create(
    key=access_token.key,
    defaults={'secret': access_token.secret})
if not created:
    token.secret = access_token.secret
    token.save()


Managers


Custom Managers are awesome. You should use them.


Managers


• Custom managers are good for

• Caching

• Denormalization

• Custom SQL

• Complex relationships

• Anything on a model that you want to hide behind a pretty API


class FollowingDescriptor(object):
    def __get__(self, instance, cls):
        class RelationshipManager(models.Manager):
            def get_query_set(self):
                return User.objects.filter(follower_relationships__user=instance)

            def add(self, user):
                instance.following_relationships.create(to_user=user)

            def remove(self, user):
                try:
                    relationship = instance.following_relationships.get(to_user=user)
                    relationship.delete()
                except ObjectDoesNotExist:
                    pass

        return RelationshipManager()

Managers


Class-based Views


Django views are callables that take a request object and return a response object.


Class-based Views


• Just implement the __call__() method

• Views instantiated when urls.py is imported

• View instances are global variables

• Not thread-safe

• Retain state between requests


Class-based Views

• Make your view subclass HttpResponse

• Kind of hacky, but it works

• Instantiated per request

• Thread safe

• Safe to maintain state in the view instances

• Jacob promises to fix the problems with the __call__()-based approach
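The difference between the two approaches comes down to when the view object is created. A framework-free sketch, in which plain strings stand in for request and response objects and both class names are made up for illustration:

```python
class CounterView(object):
    """__call__-style view: one instance is shared by every request."""

    def __init__(self):
        self.seen = []  # state lives as long as the process does

    def __call__(self, request):
        self.seen.append(request)
        return "seen %d request(s)" % len(self.seen)


class PerRequestView(object):
    """Stand-in for the HttpResponse-subclass trick: a new instance per request."""

    def __init__(self, request):
        self.seen = [request]  # state is private to this request
        self.content = "seen %d request(s)" % len(self.seen)


shared_view = CounterView()          # what urls.py would hold after import
first = shared_view("GET /a")
second = shared_view("GET /b")       # sees state left over from the first request

fresh_a = PerRequestView("GET /a")   # the class itself is the URL target
fresh_b = PerRequestView("GET /b")   # starts from a clean slate
```

Here second reports two requests while fresh_b reports one. That leftover state is harmless in this toy, but with threads it becomes a race condition.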


Class-based Views


__call__()-based approach

http://www.djangosnippets.org/snippets/1071/


Class-based Views


Subclass approach

http://www.djangosnippets.org/snippets/1072/

http://gist.github.com/118277


Subclassy Models


• Model inheritance (abstract base classes and multi-table inheritance) added in Django 1.0

• Abstract models are useful for creating a common base class

• Pownce: Note superclass would have been nice

• Multi-table inheritance works by creating separate tables for the superclass and subclasses

• But if you fetch an object via the superclass manager, you get an instance of the superclass... lame.


Subclassy Models


Use a custom Manager & QuerySet to return an instance of the subclass

http://www.djangosnippets.org/snippets/1034/


Contact Me


Mike [email protected]

twitter.com/mjmalone
