Building Lanyrd

Building Lanyrd Lanyrd.com Simon Willison BrightonPy, 9th August 2011 http://lanyrd.com/sgptt
  • date post

    18-Oct-2014
  • Category

    Technology


description

How we built and scaled Lanyrd, using Python, Django, Solr, r

Transcript of Building Lanyrd

Page 1: Building Lanyrd

Building Lanyrd

Lanyrd.com

Simon Willison

BrightonPy, 9th August 2011

http://lanyrd.com/sgptt

Page 2: Building Lanyrd

Lanyrd.com

Definitive database of professional events and speakers

Page 3: Building Lanyrd

Lanyrd.com

• Social event recommendation

• Comprehensive speaker profiles

• Archive of slides, notes and video

• Definitive database of professional events and speakers

Page 4: Building Lanyrd

A brief history

Page 5: Building Lanyrd

Casablanca! August 2010

Page 6: Building Lanyrd

• Aug 31st, 11:22: Launch! (1 linode)

• Aug 31st, 12:41: Unlaunch

• Aug 31st, 12:54: Read only mode

• Aug 31st, 14:15: DB server (2 linodes)

• Sep 1st: Limit 50 on dashboard

• Sep 1st: disable-dashboard setting

Page 7: Building Lanyrd

• Sep 3rd: dConstruct (and Twitter bot)

• Sep 4th: TechCrunched (read only :( )

• Sep 5th: 3 large EC2 + 1 RDS

• Sep 6th: Downgrade to 3 small EC2

Page 8: Building Lanyrd

December photo: @niqui

Page 9: Building Lanyrd

• Dec 8: Calacanis + Scoble at the same time!

• Upgrade to next size of RDS

• (Sometimes scaling vertically does the job)

Page 10: Building Lanyrd

• Jan 26th: Solr powered dashboard

• Replicated to 2, then 3 servers

Page 11: Building Lanyrd

[Architecture diagram]

• Load balancer (nginx)

• HTTP cache (varnish)

• Domains: lanyrd.com, badges.lanyrd.net

• 3 × app server (django/mod_wsgi)

• 1 × search master, 2 × search slave (solr)

• Database (MySQL RDS)

• Redis (data structures + message queue)

• 2 × worker (celery)

• Logging (MongoDB)

Page 12: Building Lanyrd

Solr + Haystack

Page 13: Building Lanyrd

[Screenshot: the Apache Solr project home page ("Welcome to Solr"), with a news list headed by "May 2011 - Solr 3.2 Released"]

Page 14: Building Lanyrd

[Screenshot: the Haystack project home page]

Find the needle you're looking for.

Search doesn't have to be hard. Haystack lets you write your search code once and choose the search engine you want it to run on. With a familiar API that should make any Djangonaut feel right at home and an architecture that allows you to swap things in and out as you need to, it's how search ought to be.

Haystack is BSD licensed, plays nicely with third-party app without needing to modify the source and supports Solr, Whoosh and Xapian.

Get started

1. Get the most recent source.
2. Add haystack to your INSTALLED_APPS.
3. Create search_indexes.py files for your models.
4. Setup the main SearchIndex via autodiscover.
5. Include haystack.urls to your URLconf.
6. Search!

Features: More Like This, Faceting, Stored (non-indexed) fields, Highlighting, Spelling Suggestions, Boost

Page 15: Building Lanyrd

Model-oriented search

• Define search_indexes.py (like admin.py) for your application

• Hook up default haystack search views

• Write a quick search.html template

• Run ./manage.py rebuild_index

Page 16: Building Lanyrd
Page 17: Building Lanyrd

[Screenshot: Lanyrd search results for "django" with the filters TYPE: Sessions, TOPIC: NoSQL, PLACE: United States. Three sessions are listed: "NoSQL and Django Panel" (DjangoCon US 2010, Jacob Burch), "Step Away From That Database" (DjangoCon US 2010, Andrew Godwin) and "Apache Cassandra in Action" (Strata 2011, Jonathan Ellis). A sidebar shows facet counts by type, topic and place.]

Page 18: Building Lanyrd

    from haystack import indexes

    class BookIndex(indexes.SearchIndex):
        text = indexes.CharField(document=True, use_template=True)
        speakers = indexes.MultiValueField()
        topics = indexes.MultiValueField()

        def prepare_speakers(self, obj):
            # Twitter IDs of the authors that are linked to a user account
            return [a.user.t_id for a in obj.authors.exclude(
                user=None
            ).select_related('user')]

        def prepare_topics(self, obj):
            return list(obj.topics.values_list('pk', flat=True))

Page 19: Building Lanyrd

search/indexes/books/book_text.txt

    {{ object.title }}
    {{ object.tagline }}
    {% for author in object.authors.all %}
      {{ author.display_name }}
      {{ author.user.t_screen_name }}
    {% endfor %}
    {% for topic in object.topics.all %}
      {{ topic.name_en }}
    {% endfor %}

Page 20: Building Lanyrd

Staying fresh

• Search engines usually don’t like accepting writes too frequently

• RealTimeSearchIndex for low traffic sites

• ./manage.py update_index --age=6 (hours)

• Uses index.get_updated_field()

• Roll your own (message queue or similar...)

Page 21: Building Lanyrd

Replication

Solr Master

Solr Slave, Solr Slave, Solr Slave

Page 22: Building Lanyrd

Smarter indexing

    class Article(models.Model):
        needs_indexing = models.BooleanField(
            default=True, db_index=True
        )
        ...

        def save(self, *args, **kwargs):
            self.needs_indexing = True
            super(Article, self).save(*args, **kwargs)

Page 23: Building Lanyrd

    index = site.get_index(model)
    updated_pks = []

    objects = index.load_all_queryset().filter(
        needs_indexing=True)[:100]
    if not objects:
        return

    for object in objects:
        updated_pks.append(object.pk)
        index.update_object(object)

    index.load_all_queryset().filter(
        pk__in=updated_pks).update(needs_indexing=False)
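The batch loop above can be tried out without Django or Solr. What follows is a minimal stand-alone sketch, not Lanyrd's code: `Record`, `FakeIndex` and `index_batch` are hypothetical names, plain Python objects stand in for model rows, and a list stands in for the search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a model row with a needs_indexing flag."""
    pk: int
    title: str
    needs_indexing: bool = True


@dataclass
class FakeIndex:
    """Hypothetical stand-in for the search engine: remembers what it was sent."""
    indexed_pks: list = field(default_factory=list)

    def update_object(self, record):
        self.indexed_pks.append(record.pk)


def index_batch(records, index, batch_size=100):
    """Index up to batch_size flagged records, then clear their flags.

    Returns the primary keys that were indexed in this pass.
    """
    batch = [r for r in records if r.needs_indexing][:batch_size]
    updated_pks = []
    for record in batch:
        index.update_object(record)
        updated_pks.append(record.pk)
    # Clear the flag only for rows that were actually sent to the index.
    for record in records:
        if record.pk in updated_pks:
            record.needs_indexing = False
    return updated_pks


records = [Record(pk=i, title="talk %d" % i) for i in range(5)]
index = FakeIndex()
first = index_batch(records, index, batch_size=3)   # indexes pks 0, 1, 2
second = index_batch(records, index, batch_size=3)  # indexes pks 3, 4
```

Capping each pass at a fixed batch size keeps every write to the search engine small, and a row saved again after it was indexed simply gets its flag set and is picked up on a later pass.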

Page 24: Building Lanyrd

nginx + Solr replication trick

    upstream solrmaster {
        server 10.68.43.214:8080;
    }
    upstream solrslaves {
        server 10.68.43.214:8080;
        server 10.193.138.80:8080;
        server 10.204.143.106:8080;
    }

    server {
        listen 8983;
        location /solr/update {
            proxy_pass http://solrmaster;
        }
        location /solr/select {
            proxy_pass http://solrslaves;
        }
    }
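The routing rule in that nginx configuration can be restated in a few lines of Python. This is only an illustrative sketch of the decision nginx makes, assuming its default round-robin balancing across an upstream group; `pick_backend` is a hypothetical helper, not part of any library.

```python
from itertools import cycle

# Addresses copied from the nginx configuration on the slide.
SOLR_MASTER = "10.68.43.214:8080"
SOLR_SLAVES = [
    "10.68.43.214:8080",
    "10.193.138.80:8080",
    "10.204.143.106:8080",
]

_slave_cycle = cycle(SOLR_SLAVES)


def pick_backend(path):
    """Return the backend address for a Solr request path, or None if unrouted."""
    if path.startswith("/solr/update"):
        # Writes always go to the single master.
        return SOLR_MASTER
    if path.startswith("/solr/select"):
        # Reads rotate across every server in the slave group.
        return next(_slave_cycle)
    return None
```

The application only ever talks to one address (port 8983 in the slide), so adding or removing a replica is a change to the proxy configuration rather than to application code.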

Page 25: Building Lanyrd

[Screenshot: the Lanyrd dashboard for simonw. Headline: "We've found 182 conferences your Twitter contacts are interested in." The "Your contacts' calendar" tab (yours: 24, contacts: 182) lists upcoming events with the number of contacts speaking at or tracking each one.]

Page 26: Building Lanyrd

    # Original implementation
    twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter

    attendees = Attendee.objects.filter(
        user__t_id__in=twitter_ids
    ).filter(
        conference__start_date__gte=datetime.date.today()
    )

Page 27: Building Lanyrd

    # Current implementation
    twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter

    sqs = SearchQuerySet()
    sqs = sqs.models(Conference)
    # The IDs are integers, so convert them to strings before joining
    or_string = ' OR '.join(str(t_id) for t_id in twitter_ids)
    sqs = sqs.narrow('attendees:(%s)' % or_string)
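The string handed to narrow() is ordinary Solr query syntax, so it can be built and checked on its own. A minimal sketch, assuming the IDs arrive as integers; `attendee_filter` is a hypothetical helper name, and the `attendees` field name is taken from the slide.

```python
def attendee_filter(twitter_ids, field="attendees"):
    """Build a Solr filter matching documents whose field contains any of the IDs."""
    # int() rejects anything that is not a plain number before it reaches the query.
    ids = [str(int(t_id)) for t_id in twitter_ids]
    if not ids:
        raise ValueError("at least one ID is required")
    return "%s:(%s)" % (field, " OR ".join(ids))


query = attendee_filter([11134, 223455, 33221])
```

Here `query` is the string `attendees:(11134 OR 223455 OR 33221)`.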

Page 28: Building Lanyrd

Redis

Page 29: Building Lanyrd

[Screenshot: the redis.io home page]

Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

Redis 2.2.10 is the latest stable version.

Page 30: Building Lanyrd

    simonw-follows: {144, 21345, 12328...}
    europython-attendees: {344, 21345, 787...}

    contact_ids = redis.sinter(
        'simonw-follows', 'europython-attendees')
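What SINTER computes can be shown with ordinary Python sets. The sketch below is an in-memory illustration only: the `store` dictionary stands in for a Redis server, the `sinter` function is a hypothetical re-implementation, and the IDs are the ones visible on the slide.

```python
store = {
    "simonw-follows": {144, 21345, 12328},
    "europython-attendees": {344, 21345, 787},
}


def sinter(store, *keys):
    """Intersect the sets stored under keys; a missing key counts as an empty set."""
    sets = [store.get(key, set()) for key in keys]
    if not sets:
        # Redis itself rejects SINTER with no keys; this sketch returns an empty set.
        return set()
    return set.intersection(*sets)


# IDs that appear both in the follow list and in the attendee list.
contact_ids = sinter(store, "simonw-follows", "europython-attendees")
```

With the sample data only one ID, 21345, appears in both sets, so that is the single contact attending.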

Page 31: Building Lanyrd

[Screenshot: the Lanyrd page for EuroPython 2011 (Florence, Italy, 19th–26th June 2011), showing a "You're speaking at this event" badge, 119 speakers, counts of people attending and tracking, topics (Django, Plone, Pyramid, Python, Twisted) and the speaker list.]

Page 32: Building Lanyrd

Celery

Page 33: Building Lanyrd

[Screenshot: the Celery project home page]

Distributed Task Queue

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).

Celery is used in production systems to process millions of tasks a day.

Celery is written in Python, but the protocol can be implemented in any language. It can also operate with other languages using webhooks.

The recommended message broker is RabbitMQ, but limited support for Redis, Beanstalk, MongoDB, CouchDB, and databases (using SQLAlchemy or the Django ORM) is also available.

Celery is easy to integrate with Django, Pylons and Flask, using the django-celery, celery-pylons and Flask-Celery add-on packages.

Features: Background Processing, Distributed, Asynchronous/Synchronous, Concurrency, Periodic Tasks, Retries

Page 34: Building Lanyrd

Tasks?

• Anything that takes more than about 200ms

• Updating a search index

• Resizing images

• Hitting external APIs

• Generating reports

Page 35: Building Lanyrd

Trivial example

• Fetch the content of a web page

    import urllib

    from celery.task import task

    @task
    def fetch_url(url):
        return urllib.urlopen(url).read()

    >>> result = fetch_url.delay('http://cnn.com/')
    >>> html = result.wait()
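The delay-then-wait shape of that example can be tried without a broker or worker process. The sketch below is a standard-library stand-in, not Celery: a thread pool in the same process plays the role of the workers, and `delay` and `word_count` are hypothetical names chosen to mirror the slide.

```python
from concurrent.futures import ThreadPoolExecutor

# A small in-process pool stands in for Celery's separate worker machines.
executor = ThreadPoolExecutor(max_workers=2)


def delay(func, *args, **kwargs):
    """Queue func to run in the background and return a handle to its result."""
    return executor.submit(func, *args, **kwargs)


def word_count(text):
    return len(text.split())


result = delay(word_count, "resize images and hit external APIs")
count = result.result(timeout=5)  # blocks until done, like result.wait() on the slide
executor.shutdown(wait=True)
```

The important difference is that a real task queue hands the work to another process or machine, so the web request can return before the task finishes.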

Page 36: Building Lanyrd

[Screenshot: the Lanyrd session page for "Python and MongoDB tutorial" by Andreas Jung, a session at EuroPython 2011 (20th June 2011, 14:30–18:30 CET). The "Add coverage to this session" form accepts a URL to coverage such as videos, slides, podcasts, handouts, sketchnotes or photos; a SlideShare URL has been entered.]

Page 37: Building Lanyrd

[Screenshot: the "Add coverage" confirmation form for that SlideShare URL. The link title is pre-filled as "Python mongo db-training-europython-2011", the type of coverage can be chosen from Link, Write-up, Slides, Video, Audio, Sketch notes, Transcript, Handout, Liveblog, Photos and Notes, and a coverage preview from SlideShare is shown with a "Display this preview on the site" checkbox.]

Page 38: Building Lanyrd

The task itself...

• Tries using http://embed.ly/ to find a preview

• Fetches the HTTP headers and first 2048 bytes

• If HTML, attempts to extract the <title>

• If other, gets the file type and size from headers
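The last three steps can be sketched as a pure function over a response's headers and its first bytes. This is a minimal illustration rather than Lanyrd's task: `describe_link` is a hypothetical name, the headers are assumed to arrive as a plain dictionary with canonical capitalisation, and the embed.ly lookup and the HTTP fetch itself are left out.

```python
import re

TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def describe_link(headers, first_bytes):
    """Summarise a link from its response headers and the start of its body."""
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type in ("text/html", "application/xhtml+xml"):
        # Only the first 2048 bytes are inspected, as on the slide.
        match = TITLE_RE.search(first_bytes[:2048])
        title = None
        if match:
            # Collapse runs of whitespace inside the title.
            title = " ".join(match.group(1).decode("utf-8", "replace").split())
        return {"kind": "html", "title": title}
    size = headers.get("Content-Length", "")
    return {
        "kind": "file",
        "content_type": content_type or None,
        "size": int(size) if size.isdigit() else None,
    }


page = describe_link(
    {"Content-Type": "text/html; charset=utf-8"},
    b"<html><head><title>\n  Building   Lanyrd </title></head>",
)
pdf = describe_link(
    {"Content-Type": "application/pdf", "Content-Length": "1024"},
    b"%PDF-1.4",
)
```

Reading only the headers and a small prefix of the body keeps the task cheap even when the link points at a large video or slide file.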

Page 39: Building Lanyrd

Behind the scenes...

    ar = enhance_link.delay(url)
    poll_url = '/working/%s/' % signed.dumps({
        'task_id': ar.task_id,
        'on_done_url': on_done_url,
    })
    if 'ajax' in request.POST:
        return render_json(request, {
            'ok': True,
            'poll_url': poll_url,
        })
    else:
        return HttpResponseRedirect(poll_url)

Page 40: Building Lanyrd

And when it’s done...

    from celery.backends import default_backend

    ...
    task_id = request.REQUEST.get('id', '')
    result = default_backend.get_result(task_id)

Page 41: Building Lanyrd

Configuration

    # Carrot / Celery: queue uses Redis
    CARROT_BACKEND = "ghettoq.taproot.Redis"
    BROKER_HOST = "10.11.11.11"  # redis server
    BROKER_PORT = 6379
    BROKER_VHOST = "6"

    # Task results stored in memcached, so they can
    # expire automatically
    CELERY_RESULT_BACKEND = "cache"
    CELERY_CACHE_BACKEND = \
        "memcached://10.11.11.12:11211;..."

Page 42: Building Lanyrd

Tricks

Page 43: Building Lanyrd

Phantom load testing

• Deploy a new architecture on a brand new EC2 cluster

• Leave your existing site on the old cluster

• Invisibly link to the new stack from an <img width=1 height=1> element on your live site (not for very long though)

• (sensible alternative: find a way to replay log files)

Page 44: Building Lanyrd

cache_version

Page 45: Building Lanyrd

[Screenshot: the Lanyrd topic page for Django. It lists Django conferences (EuroPython 2011 on now, DjangoCon US 2011 and PyCON FR 2011 in September, PyCon DE 2011 in October), events looking for participants, a breakdown by country, and Django coverage counts: 52 videos, 52 slide decks, 3 audio clips, 27 write-ups, 11 handouts and 3 notes.]

Page 46: Building Lanyrd

    class Conference(models.Model):
        ...
        cache_version = models.IntegerField(default=0)

        def save(self, *args, **kwargs):
            self.cache_version += 1
            super(Conference, self).save(*args, **kwargs)

        def touch(self):
            Conference.objects.filter(pk=self.pk).update(
                cache_version=F('cache_version') + 1
            )

Page 47: Building Lanyrd

    {% cache 36000 conf-topics conference.pk conference.cache_version %}
      <ul class="tags inline-tags meta">
        {% for topic in conference.topics.all %}
          <li><a href="{{ topic.get_absolute_url }}">{{ topic }}</a></li>
        {% endfor %}
      </ul>
    {% endcache %}
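Why a version number in the cache key works as invalidation can be seen with a dictionary standing in for the cache. This is a hedged sketch of the idea only: `fragment_key` and `cached_fragment` are hypothetical helpers, and the key format is made up for illustration rather than being the one Django's cache tag generates.

```python
cache = {}  # stand-in for memcached


def fragment_key(name, pk, cache_version):
    """Build a cache key that changes whenever the version number changes."""
    return "%s:%s:%s" % (name, pk, cache_version)


def cached_fragment(name, conference, render):
    """Return the cached rendering for this version, rendering it on a miss."""
    key = fragment_key(name, conference["pk"], conference["cache_version"])
    if key not in cache:
        cache[key] = render(conference)
    return cache[key]


render_calls = []


def render_topics(conference):
    render_calls.append(conference["cache_version"])
    return ", ".join(conference["topics"])


conference = {"pk": 1, "cache_version": 0, "topics": ["Django", "Python"]}
first = cached_fragment("conf-topics", conference, render_topics)
second = cached_fragment("conf-topics", conference, render_topics)  # cache hit

# Changing the data bumps the version, so the next lookup uses a new key.
conference["topics"].append("Solr")
conference["cache_version"] += 1
third = cached_fragment("conf-topics", conference, render_topics)
```

Nothing is ever deleted explicitly: entries stored under old version numbers are simply never requested again and expire on their own.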

Page 48: Building Lanyrd

Bulk invalidation

    from django.db.models import F

    topic.conferences.all().update(
        cache_version=F('cache_version') + 1
    )

Page 49: Building Lanyrd

Signing

Page 50: Building Lanyrd

Pass data through an untrusted source with confidence that it

hasn't been tampered with
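The idea can be sketched with the standard library alone: serialise the data, append an HMAC of it computed with a server-side secret, and refuse anything whose HMAC does not match. This is a minimal illustration, not Django's signing module or its token format; `SECRET_KEY`, `dumps` and `loads` here are hypothetical stand-ins.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"not-a-real-secret"  # hypothetical key; a real one stays private


def _b64encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text):
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded.encode("ascii"))


def _signature(payload):
    digest = hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).digest()
    return _b64encode(digest)


def dumps(obj):
    """Serialise obj and append a signature, separated by a dot."""
    payload = _b64encode(json.dumps(obj, sort_keys=True).encode("utf-8"))
    return "%s.%s" % (payload, _signature(payload))


def loads(token):
    """Return the original object, or raise ValueError if the token was altered."""
    payload, separator, signature = token.rpartition(".")
    if not separator:
        raise ValueError("token has no signature")
    expected = _signature(payload)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("signature does not match")
    return json.loads(_b64decode(payload).decode("utf-8"))


token = dumps({"task_id": "abc123", "on_done_url": "/coverage/"})
data = loads(token)
```

Note that the payload is only encoded, not encrypted: anyone can read it, but nobody without the secret can change it undetected.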

Page 51: Building Lanyrd

Signing uses

• "Unsubscribe" links in emails

• lanyrd.com/un/ImN6VyI.ii0Hwm7p71DEcGfaVzziQaxeuu

• ?redirect_to=URL protection

• Signed cookies

• "You are logged in as simonw" without hitting the database

Page 52: Building Lanyrd

Signing in Django 1.4

from django.core import signing

signing.dumps({"foo": "bar"})

signing.loads(signed_string)

response.set_signed_cookie(key, value...)

request.get_signed_cookie(key)

Page 53: Building Lanyrd

Hashed static asset filenames in S3/CloudFront

Page 54: Building Lanyrd

global.js

global.ed81d119.js

cdn.lanyrd.net/js/global.ed81d119.js

Page 55: Building Lanyrd

Benefits

• Far-future expiry headers

• Cache-Control: max-age=315360000

• Expires: Fri, 18 Jun 2021 06:45:00 -0000 GMT

• Guaranteed updated CSS in IE

• Deploy new assets in advance of application

• Old versions stick around for rollbacks

Page 56: Building Lanyrd

./manage.py push_static

• Minifies JavaScript and CSS

• Renames files to include sha1(contents)[:6]

• Pushes all assets to S3
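The renaming step can be sketched in a few lines. This is an illustration rather than the actual push_static command: `hashed_name` is a hypothetical helper, and the digest length is a parameter because the bullet above says six characters while the example filename earlier in the deck shows eight.

```python
import hashlib
import posixpath


def hashed_name(path, contents, digest_length=6):
    """Insert a prefix of sha1(contents) before the file extension."""
    digest = hashlib.sha1(contents).hexdigest()[:digest_length]
    root, ext = posixpath.splitext(path)
    return "%s.%s%s" % (root, digest, ext)


old_name = hashed_name("js/global.js", b"alert(1);")
new_name = hashed_name("js/global.js", b"alert(2);")
```

Because the name depends only on the contents, an unchanged file keeps its URL across deploys, while any edit produces a new URL that no browser or CDN has cached yet.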

Page 57: Building Lanyrd

Profiling and debugging production systems

Page 58: Building Lanyrd

UserBasedExceptionMiddleware

    import sys

    from django.views.debug import technical_500_response

    class UserBasedExceptionMiddleware(object):
        def process_exception(self, request, exception):
            if request.user.is_superuser:
                return technical_500_response(request, *sys.exc_info())

Page 59: Building Lanyrd

mysql-proxy

• Very handy lua-customisable proxy for all of your MySQL traffic

• Worst documented software ever

• log.lua - logs out ALL queries

• https://gist.github.com/1039751

Page 60: Building Lanyrd

django_instrumented

• (Unreleased) code I wrote for Lanyrd

• Collects various runtime stats about the current request, stashes a profile JSON in memcached

• Writes out the profile UUID as part of the HTML

• A bookmarklet to view the profile
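The flow described in those bullets might look roughly like the following. Since the talk describes django_instrumented as unreleased, this is purely a guess at the shape, not the real code: every name is hypothetical, a dictionary stands in for memcached, and a plain function stands in for a Django view.

```python
import json
import time
import uuid

profile_store = {}  # stand-in for memcached


def run_instrumented(handler, request):
    """Run handler, store a JSON profile under a fresh ID, and tag the HTML with it."""
    stats = {"path": request["path"], "events": []}

    def record(kind, detail):
        stats["events"].append({"kind": kind, "detail": detail})

    started = time.perf_counter()
    html = handler(request, record)
    stats["elapsed_ms"] = round((time.perf_counter() - started) * 1000, 3)

    profile_id = uuid.uuid4().hex
    profile_store[profile_id] = json.dumps(stats)
    # The ID is written into the page so a bookmarklet could look the profile up.
    return html + "\n<!-- profile: %s -->" % profile_id, profile_id


def conference_page(request, record):
    record("sql", "SELECT * FROM conference WHERE id = 1")
    return "<h1>EuroPython 2011</h1>"


html, profile_id = run_instrumented(conference_page, {"path": "/2011/europython/"})
profile = json.loads(profile_store[profile_id])
```

Keeping the profile out of the page itself means ordinary visitors pay only for a short ID, while anyone with access to the store can pull up the full details for a specific request.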

Page 61: Building Lanyrd
Page 62: Building Lanyrd

mongodb logging

• Super-fast inserts, log everything!

• Capped collections

• Structured queries

• Ask me about it in a few months

Page 63: Building Lanyrd

For the future...

• Much better profiling, monitoring and alerts

• Varnish in front of everything

• Replicated MySQL for analytics + upgrades

Page 64: Building Lanyrd

Questions?

Page 65: Building Lanyrd

Thank you!http://lanyrd.com/sgptt