Building Lanyrd
Posted: 18-Oct-2014
Category: Technology
Lanyrd.com
Simon Willison
BrightonPy, 9th August 2011
http://lanyrd.com/sgptt
Lanyrd.com
• Social event recommendation
• Comprehensive speaker profiles
• Archive of slides, notes and video
• Definitive database of professional events and speakers
A brief history
Casablanca! August 2010
• Aug 31st, 11:22: Launch! (1 linode)
• Aug 31st, 12:41: Unlaunch
• Aug 31st, 12:54: Read only mode
• Aug 31st, 14:15: DB server (2 linodes)
• Sep 1st: Limit 50 on dashboard
• Sep 1st: disable-dashboard setting
• Sep 3rd: dConstruct (and Twitter bot)
• Sep 4th: TechCrunched (read only :( )
• Sep 5th: 3 large EC2 + 1 RDS
• Sep 6th: Downgrade to 3 small EC2
December (photo: @niqui)
• Dec 8: Calacanis + Scoble at the same time!
• Upgrade to next size of RDS
• (Sometimes scaling vertically does the job)
• Jan 26th: Solr powered dashboard
• Replicated to 2, then 3 servers
• Load balancer (nginx) + HTTP cache (varnish), serving lanyrd.com and badges.lanyrd.net
• 3 × app server (django/mod_wsgi)
• Search: 1 × search master (solr), 2 × search slave (solr)
• Database (MySQL RDS)
• Redis (data structures + message queue)
• 2 × worker (celery)
• Logging (MongoDB)
Solr + Haystack
Search doesn't have to be hard. Haystack lets you write your search code
once and choose the search engine you want it to run on. With a familiar API
that should make any Djangonaut feel right at home and an architecture that
allows you to swap things in and out as you need to, it's how search ought
to be.
Haystack is BSD licensed, plays nicely with third-party apps without needing
to modify the source and supports Solr, Whoosh and Xapian.
Get started
1. Get the most recent source.
2. Add haystack to your INSTALLED_APPS.
3. Create search_indexes.py files for your models.
4. Set up the main SearchIndex via autodiscover.
5. Include haystack.urls in your URLconf.
6. Search!
More Like This
Faceting
Stored (non-indexed) fields
Highlighting
Spelling Suggestions
Boost
Model-oriented search
• Define search_indexes.py (like admin.py) for your application
• Hook up default haystack search views
• Write a quick search.html template
• Run ./manage.py rebuild_index
[Screenshot: Lanyrd faceted search. With the filters TYPE: Sessions, TOPIC: NoSQL and PLACE: United States applied, three sessions are listed: "NoSQL and Django Panel" (DjangoCon US 2010, Jacob Burch), "Step Away From That Database" (DjangoCon US 2010, Andrew Godwin) and "Apache Cassandra in Action" (Strata 2011, Jonathan Ellis). A sidebar shows facet counts by type, topic (NoSQL 3, Django 2, Cassandra 1) and place.]
from haystack import indexes

class BookIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True, use_template=True)
    speakers = indexes.MultiValueField()
    topics = indexes.MultiValueField()

    def prepare_speakers(self, obj):
        # Twitter IDs of the authors that are linked to a user account
        return [a.user.t_id for a in obj.authors.exclude(
            user=None
        ).select_related('user')]

    def prepare_topics(self, obj):
        # Primary keys of the topics attached to this object
        return list(obj.topics.values_list('pk', flat=True))
search/indexes/books/book_text.txt
{{ object.title }}
{{ object.tagline }}
{% for author in object.authors.all %}
  {{ author.display_name }} {{ author.user.t_screen_name }}
{% endfor %}
{% for topic in object.topics.all %}
  {{ topic.name_en }}
{% endfor %}
Staying fresh
• Search engines usually don’t like accepting writes too frequently
• RealTimeSearchIndex for low traffic sites
• ./manage.py update_index --age=6 (hours)
• Uses index.get_updated_field()
• Roll your own (message queue or similar...)
Replication
Solr Master, replicated to three Solr Slaves
Smarter indexing
class Article(models.Model):
    needs_indexing = models.BooleanField(
        default=True, db_index=True
    )
    ...

    def save(self, *args, **kwargs):
        self.needs_indexing = True
        super(Article, self).save(*args, **kwargs)
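# The slide shows only the body; the function name is illustrative.
def index_pending(model):
    index = site.get_index(model)
    updated_pks = []

    # Take at most 100 flagged objects per run
    objects = index.load_all_queryset().filter(
        needs_indexing=True)[:100]
    if not objects:
        return

    for obj in objects:
        updated_pks.append(obj.pk)
        index.update_object(obj)

    # update() bypasses save(), so the flag is not set again
    index.load_all_queryset().filter(
        pk__in=updated_pks
    ).update(needs_indexing=False)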
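The dirty-flag pattern above can be tried without Django or Solr. The following is a minimal sketch in plain Python, assuming rows are dictionaries and using a hypothetical FakeIndex class in place of the search backend; none of these names come from the Lanyrd code.

```python
class FakeIndex:
    """Hypothetical stand-in for a search backend: records indexed pks."""

    def __init__(self):
        self.indexed = []

    def update_object(self, obj):
        self.indexed.append(obj["pk"])


def index_batch(rows, index, batch_size=100):
    """Index up to batch_size flagged rows, then clear their flags."""
    pending = [r for r in rows if r["needs_indexing"]][:batch_size]
    updated_pks = []
    for row in pending:
        updated_pks.append(row["pk"])
        index.update_object(row)
    # Clear the flag only for the rows that were actually indexed
    for row in rows:
        if row["pk"] in updated_pks:
            row["needs_indexing"] = False
    return updated_pks


# Ten rows, the even-numbered ones flagged as needing indexing
rows = [{"pk": i, "needs_indexing": i % 2 == 0} for i in range(10)]
index = FakeIndex()
done = index_batch(rows, index, batch_size=3)
```

Each run handles a bounded batch, so a burst of saves turns into a backlog that drains over several runs rather than a burst of writes to the search engine.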
nginx + Solr replication trick
upstream solrmaster {
    server 10.68.43.214:8080;
}
upstream solrslaves {
    server 10.68.43.214:8080;
    server 10.193.138.80:8080;
    server 10.204.143.106:8080;
}

server {
    listen 8983;
    location /solr/update {
        proxy_pass http://solrmaster;
    }
    location /solr/select {
        proxy_pass http://solrslaves;
    }
}
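To make the routing rule explicit, here is a small Python sketch of what the nginx configuration does: updates always go to the master, and selects rotate across the pool. It assumes simple round-robin (nginx's default for an upstream block); make_router and route are illustrative names, not part of nginx or Solr.

```python
from itertools import cycle

MASTER = "10.68.43.214:8080"
SLAVES = ["10.68.43.214:8080", "10.193.138.80:8080", "10.204.143.106:8080"]


def make_router(master=MASTER, slaves=SLAVES):
    """Return a function that maps a request path to a backend address."""
    pool = cycle(slaves)  # round-robin over the read pool

    def route(path):
        if path.startswith("/solr/update"):
            return master          # all writes hit the master
        if path.startswith("/solr/select"):
            return next(pool)      # reads are spread across the pool
        return None                # no matching location

    return route


route = make_router()
write_backend = route("/solr/update?commit=true")
```

The application only ever talks to one address (port 8983 on the proxy), so search servers can be added or removed by editing the upstream blocks alone.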
[Screenshot: the Lanyrd dashboard for simonw. "We've found 182 conferences your Twitter contacts are interested in." A "Your contacts' calendar" tab (yours 24, contacts 182) lists upcoming events with the number of contacts speaking at or tracking each one.]
# Original implementation
twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter

attendees = Attendee.objects.filter(
    user__t_id__in=twitter_ids
).filter(
    conference__start_date__gte=datetime.date.today()
)
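# Current implementation
from haystack.query import SearchQuerySet

twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter

sqs = SearchQuerySet()
sqs = sqs.models(Conference)
# join() needs strings, so convert the integer IDs first
or_string = ' OR '.join(str(t_id) for t_id in twitter_ids)
sqs = sqs.narrow('attendees:(%s)' % or_string)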
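For reference, this is the filter string that ends up being sent to Solr. The helper below is a sketch with an illustrative name (attendees_narrow_query); it only builds the string and does not touch Haystack.

```python
def attendees_narrow_query(twitter_ids):
    """Build the 'attendees:(1 OR 2 OR 3)' string passed to sqs.narrow()."""
    or_string = " OR ".join(str(t_id) for t_id in twitter_ids)
    return "attendees:(%s)" % or_string


query = attendees_narrow_query([11134, 223455, 33221])
```

The whole contact list becomes one filter on the multi-valued attendees field, so the work of matching contacts to conferences moves from a MySQL join to the search index.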
Redis
Redis is an open source, advanced key-value store. It is often
referred to as a data structure server since keys can contain
strings, hashes, lists, sets and sorted sets.
simonw-follows: {144, 21345, 12328...}
europython-attendees: {344, 21345, 787...}

contact_ids = redis.sinter(
    'simonw-follows', 'europython-attendees')
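What SINTER computes can be shown with ordinary Python sets. This sketch uses a dictionary as a stand-in for Redis and only the IDs visible on the slide; the sinter function here is a local imitation, not the redis-py client.

```python
# Stand-in for the Redis keyspace: key name -> set of user IDs
store = {
    "simonw-follows": {144, 21345, 12328},
    "europython-attendees": {344, 21345, 787},
}


def sinter(*keys):
    """Mimic Redis SINTER: intersect the sets stored under the given keys."""
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()


# People simonw follows who are also attending EuroPython
contact_ids = sinter("simonw-follows", "europython-attendees")
```

As in Redis, a missing key behaves like an empty set, so intersecting with it yields an empty result.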
[Screenshot: the EuroPython 2011 page on Lanyrd (Florence, Italy, 19–26 June 2011, http://ep2011.europython.eu/), marked "You're speaking at this event". It shows 119 speakers, counts of people attending and tracking, the topics Django, Plone, Pyramid, Python and Twisted, and the short URL lanyrd.com/ccdpc.]
Celery
Distributed Task Queue
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
Celery is used in production systems to process millions of tasks a day.
Celery is written in Python, but the protocol can be implemented in any language. It can also operate with other languages using webhooks.
The recommended message broker is RabbitMQ, but limited support for Redis, Beanstalk, MongoDB, CouchDB, and databases (using SQLAlchemy or the Django ORM) is also available.
Celery is easy to integrate with Django, Pylons and Flask, using the django-celery, celery-pylons and Flask-Celery add-on packages.
• Background Processing
• Distributed
• Asynchronous/Synchronous
• Concurrency
• Periodic Tasks
• Retries
Tasks?
• Anything that takes more than about 200ms
• Updating a search index
• Resizing images
• Hitting external APIs
• Generating reports
Trivial example
• Fetch the content of a web page

import urllib  # Python 2; urlopen lives in urllib.request on Python 3

from celery.task import task

@task
def fetch_url(url):
    return urllib.urlopen(url).read()

>>> result = fetch_url.delay('http://cnn.com/')
>>> html = result.wait()
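The delay-then-wait shape can be tried without a broker or worker. The sketch below swaps Celery for the standard library's ThreadPoolExecutor (Python 3), so it runs in-process rather than on a separate worker; word_count is a made-up task chosen to avoid network access.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def word_count(text):
    """A stand-in task; a real one would do something slow."""
    return len(text.split())


# submit() returns immediately with a handle, much like task.delay()
future = executor.submit(word_count, "building lanyrd with celery")

# result() blocks until the work is finished, much like result.wait()
count = future.result(timeout=5)
executor.shutdown()
```

The important difference is that Celery hands the work to another process, usually on another machine, through the message queue, so the web request does not pay for it.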
[Screenshot: a session page on Lanyrd, "Python and MongoDB tutorial" by Andreas Jung at EuroPython 2011 (20th June 2011, 14:30–18:30 CET, short URL lanyrd.com/sftzh), with an "Add coverage to this session" box that accepts a URL to coverage such as videos, slides, podcasts, handouts, sketchnotes and photos.]
[Screenshot: the add-coverage form after a SlideShare URL has been pasted in. The link title ("Python mongo db-training-europython-2011") and a coverage preview from SlideShare have been filled in automatically. The type of coverage can be link, write-up, slides, video, audio, sketch notes, transcript, handout, liveblog, photos or notes.]
The task itself...
• Tries using http://embed.ly/ to find a preview
• Fetches the HTTP headers and first 2048 bytes
• If HTML, attempts to extract the <title>
• If other, gets the file type and size from headers
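The last three steps above can be sketched as a pure function over the response headers and the first bytes of the body. This is an illustration only: it leaves out the embed.ly lookup and the HTTP fetch, and describe_link and its return format are assumptions rather than the Lanyrd implementation.

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def describe_link(content_type, content_length, first_bytes):
    """Summarise a link from its headers and the first 2048 bytes of body."""
    mime = content_type.split(";")[0].strip().lower()
    if mime == "text/html":
        text = first_bytes[:2048].decode("utf-8", errors="replace")
        match = TITLE_RE.search(text)
        # Collapse whitespace so multi-line titles become one line
        title = " ".join(match.group(1).split()) if match else None
        return {"kind": "html", "title": title}
    # Not HTML: report what the headers say about the file
    return {"kind": "file", "type": mime, "size": content_length}


html = describe_link(
    "text/html; charset=utf-8", None,
    b"<html><head><title>Building\n Lanyrd</title></head>",
)
pdf = describe_link("application/pdf", 123456, b"%PDF-1.4")
```

Reading only the first 2048 bytes keeps the task cheap even when the link points at a large file.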
Behind the scenes...
ar = enhance_link.delay(url)
poll_url = '/working/%s/' % signed.dumps({
    'task_id': ar.task_id,
    'on_done_url': on_done_url,
})
if 'ajax' in request.POST:
    return render_json(request, {
        'ok': True,
        'poll_url': poll_url,
    })
else:
    return HttpResponseRedirect(poll_url)
And when it’s done...
from celery.backends import default_backend

...
task_id = request.REQUEST.get('id', '')
result = default_backend.get_result(task_id)
Configuration
# Carrot / Celery: queue uses Redis
CARROT_BACKEND = "ghettoq.taproot.Redis"
BROKER_HOST = "10.11.11.11"  # redis server
BROKER_PORT = 6379
BROKER_VHOST = "6"

# Task results stored in memcached, so they can
# expire automatically
CELERY_RESULT_BACKEND = "cache"
CELERY_CACHE_BACKEND = \
    "memcached://10.11.11.12:11211;..."
Tricks
Phantom load testing
• Deploy a new architecture on a brand new EC2 cluster
• Leave your existing site on the old cluster
• Invisibly link to the new stack from an <img width=1 height=1> element on your live site (not for very long though)
• (sensible alternative: find a way to replay log files)
cache_version
[Screenshot: the Django topic page on Lanyrd, listing Django conferences (EuroPython 2011, DjangoCon US 2011, PyCON FR 2011, PyCon DE 2011) alongside coverage counts: 52 videos, 52 slide decks, 3 audio clips, 27 write-ups, 11 handouts and 3 notes.]
class Conference(models.Model):
    ...
    cache_version = models.IntegerField(default=0)

    def save(self, *args, **kwargs):
        self.cache_version += 1
        super(Conference, self).save(*args, **kwargs)

    def touch(self):
        Conference.objects.filter(pk=self.pk).update(
            cache_version=F('cache_version') + 1
        )
{% cache 36000 conf-topics conference.pk conference.cache_version %}
  <ul class="tags inline-tags meta">
    {% for topic in conference.topics.all %}
      <li><a href="{{ topic.get_absolute_url }}">{{ topic }}</a></li>
    {% endfor %}
  </ul>
{% endcache %}
Bulk invalidation
from django.db.models import F

topic.conferences.all().update(
    cache_version=F('cache_version') + 1
)
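The reason bumping cache_version invalidates a fragment is that the version is part of the cache key. A minimal sketch, assuming a dictionary in place of memcached and a hypothetical cached_fragment helper (Django's {% cache %} tag builds its real key differently):

```python
cache = {}  # stand-in for memcached


def cached_fragment(name, pk, cache_version, render):
    """Return the cached fragment, rendering it only on a cache miss."""
    key = "%s:%s:%s" % (name, pk, cache_version)
    if key not in cache:
        cache[key] = render()
    return cache[key]


calls = []


def render_topics():
    """Pretend template render that counts how often it runs."""
    calls.append(1)
    return "<ul>render %d</ul>" % len(calls)


first = cached_fragment("conf-topics", 42, 0, render_topics)   # miss
again = cached_fragment("conf-topics", 42, 0, render_topics)   # hit
bumped = cached_fragment("conf-topics", 42, 1, render_topics)  # new key, miss
```

Nothing is ever deleted: the old entry is simply never asked for again and falls out of the cache when it expires.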
Signing
Pass data through an untrusted source with confidence that it
hasn't been tampered with
Signing uses
• "Unsubscribe" links in emails
• lanyrd.com/un/ImN6VyI.ii0Hwm7p71DEcGfaVzziQaxeuu
?redirect_to=URL protection
Signed cookies
"You are logged in as simonw" without hitting the database
Signing in Django 1.4
from django.core import signing
signing.dumps({"foo": "bar"})
signing.loads(signed_string)
response.set_signed_cookie(key, value...)
request.get_signed_cookie(key)
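The idea behind dumps and loads can be sketched with the standard library: serialise the data, append an HMAC of it, and refuse anything whose HMAC does not match. This is an illustration of the technique, not Django's actual format (which also supports salts, timestamps and compression); SECRET and BadSignature here are local stand-ins.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"not-a-real-secret"  # assumption: a server-side secret key


class BadSignature(Exception):
    """Raised when a signed value has been altered."""


def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(payload):
    return _b64(hmac.new(SECRET, payload.encode("ascii"), hashlib.sha1).digest())


def dumps(obj):
    """Serialise obj to URL-safe text with a signature appended."""
    payload = _b64(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
    return payload + "." + _sign(payload)


def loads(signed):
    """Verify the signature and return the original object."""
    payload, _, signature = signed.rpartition(".")
    if not hmac.compare_digest(signature, _sign(payload)):
        raise BadSignature(signed)
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))


token = dumps({"foo": "bar"})
data = loads(token)
```

The payload is readable by anyone who decodes it; signing protects against tampering, not against disclosure.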
Hashed static asset filenames in S3/CloudFront
global.js
global.ed81d119.js
cdn.lanyrd.net/js/global.ed81d119.js
Benefits
• Far-future expiry headers
• Cache-Control: max-age=315360000
• Expires: Fri, 18 Jun 2021 06:45:00 GMT
• Guaranteed updated CSS in IE
• Deploy new assets in advance of application
• Old versions stick around for rollbacks
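The max-age value is ten 365-day years expressed in seconds, and the Expires date is the same offset applied to the moment the asset was served. A small sketch of that arithmetic, assuming a serving time of 21st June 2011 06:45 UTC (inferred from the Expires value, not stated on the slide); far_future_headers is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Ten 365-day years, in seconds
MAX_AGE = 10 * 365 * 24 * 60 * 60


def far_future_headers(now):
    """Build far-future caching headers relative to the given UTC time."""
    expires = now + timedelta(seconds=MAX_AGE)
    return {
        "Cache-Control": "max-age=%d" % MAX_AGE,
        "Expires": format_datetime(expires, usegmt=True),
    }


headers = far_future_headers(datetime(2011, 6, 21, 6, 45, tzinfo=timezone.utc))
```

Headers this aggressive are only safe because a changed file gets a new name, so a stale copy can never be served under the new URL.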
./manage.py push_static
• Minifies JavaScript and CSS
• Renames files to include sha1(contents)[:6]
• Pushes all assets to S3
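The renaming step can be sketched as follows. hashed_name is a hypothetical helper, not the push_static code; the prefix length is a parameter because the bullet above says six characters while the example filename earlier shows eight.

```python
import hashlib


def hashed_name(filename, contents, length=6):
    """Insert a prefix of sha1(contents) before the file extension."""
    digest = hashlib.sha1(contents).hexdigest()[:length]
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension: append the digest at the end
        return "%s.%s" % (filename, digest)
    return "%s.%s.%s" % (stem, digest, ext)


name = hashed_name("global.js", b"alert('hi');")
```

Because the name depends only on the contents, re-running the command on unchanged files produces the same names and nothing needs to be uploaded twice.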
Profiling and debugging production systems
UserBasedExceptionMiddleware
from django.views.debug import technical_500_response
import sys

class UserBasedExceptionMiddleware(object):
    def process_exception(self, request, exception):
        if request.user.is_superuser:
            return technical_500_response(request, *sys.exc_info())
mysql-proxy
• Very handy lua-customisable proxy for all of your MySQL traffic
• Worst documented software ever
• log.lua - logs out ALL queries
• https://gist.github.com/1039751
django_instrumented
• (Unreleased) code I wrote for Lanyrd
• Collects various runtime stats about the current request, stashes a profile JSON in memcached
• Writes out the profile UUID as part of the HTML
• A bookmarklet to view the profile
mongodb logging
• Super-fast inserts, log everything!
• Capped collections
• Structured queries
• Ask me about it in a few months
For the future...
• Much better profiling, monitoring and alerts
• Varnish in front of everything
• Replicated MySQL for analytics + upgrades
Questions?