Architecture at PBS
![Page 1: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/1.jpg)
Architecture of PBS.org
DCPython - June 7, 2011
![Page 2: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/2.jpg)
PBS is…
• PBS is a national federation of independently owned and operated public television stations and producers
  – Each with their own management and development resources
• 1500+ highly trafficked websites:
  – http://www.pbs.org/
  – http://www.pbs.org/nova/
  – http://pbskids.org/
  – http://pbskids.org/sesame/
  – http://video.pbs.org/
• Enterprise services/APIs
![Page 3: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/3.jpg)
PBS is not!
• Radio is easy… We do television!
• Or any of the other ~200 local stations.
![Page 4: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/4.jpg)
What we do
• Technology leadership within public broadcasting community
• Distribution of national programming content
• Services to local stations
• Core application development
Yeah!!!
![Page 5: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/5.jpg)
A few of our sites
![Page 6: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/6.jpg)
History of PBS.org
• Early 1990s: Hand-rolled static HTML
• Late 1990s: Hand-crafted static HTML + CGI!
• Most of 2000s: Zope/Plone CMS-generated static HTML
• 2008-10: Django-generated static HTML
• Launched Oct 2010: Django all the way
![Page 7: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/7.jpg)
COVE API
• Contains the metadata for all PBS videos online including pointers to streaming video
• Needed to be:
  – Secure
  – Fast
  – Scalable
![Page 8: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/8.jpg)
COVE API – Technology Stack
• Amazon Elastic Compute Cloud (EC2)
• Amazon Relational Database Service (RDS)
• Linux
• Python
• Django
• Piston for REST API
![Page 9: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/9.jpg)
COVE API - Architecture
[Diagram components]
• Internet
• Elastic Load Balancer
• Auto Scale Array: App Server 1 … App Server N
• HA Proxy
• RDS Master and RDS Slaves (three shown)
• App Sync Server
• S3 Backups
![Page 10: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/10.jpg)
COVE API – Management Tools
• Amazon Web Services Console
• RightScale
• Splunk
![Page 11: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/11.jpg)
COVE API – Interesting Stuff
• Easy to load test
  – Duplicate environment for several days
• Easy to scale
  – Autoscale array grows automatically
• Easy to upgrade
  – Each server built from vanilla base
![Page 12: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/12.jpg)
COVE API – Lessons learned
• Use normalized data for administration and de-normalized data for API
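One way to picture this lesson is the minimal sketch below. It assumes a hypothetical pair of normalized records (a program and a video that references it), as an admin interface might store them, and flattens them into a single read-optimized document for API responses. The field names and sample data are illustrative and are not taken from the COVE API.

```python
# Hypothetical normalized records: videos reference programs by id.
programs = {1: {"title": "Example Program"}}
videos = [
    {"id": 10, "program_id": 1, "title": "Example Episode",
     "stream_url": "http://example.org/stream/10"},
]


def denormalize(video, programs):
    """Flatten a video and its program into one document, so the API
    can answer a request without joining tables at read time."""
    program = programs[video["program_id"]]
    return {
        "id": video["id"],
        "title": video["title"],
        "program_title": program["title"],
        "stream_url": video["stream_url"],
    }


# Rebuilt whenever the normalized records change.
api_documents = [denormalize(v, programs) for v in videos]
```

The trade-off is duplicated data: the denormalized copies have to be regenerated whenever an editor changes the normalized source.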
![Page 13: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/13.jpg)
COVE API – Lessons learned
• Piston is fine, but lacks flexibility without significant customization
  – TastyPie?
• JSON is probably good enough
• Don’t get fancy with your endpoints
• Stick to REST principles
• Don’t get fancy with your authentication
  – Use OAuth2 or simple token
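As an illustration of the "simple token" option, here is a minimal sketch using only the Python standard library. It assumes a hypothetical shared secret and client identifier, and derives each client's token as an HMAC of its id. This is not the scheme the COVE API uses; it only shows how little machinery a plain token check needs.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice this would come from config.
SECRET_KEY = b"hypothetical-shared-secret"


def sign(client_id: str) -> str:
    """Issue a token for a client: an HMAC-SHA256 of its id."""
    return hmac.new(SECRET_KEY, client_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def is_authorized(client_id: str, token: str) -> bool:
    """Check a presented token using a constant-time comparison."""
    expected = sign(client_id)
    return hmac.compare_digest(expected.encode("utf-8"),
                               token.encode("utf-8"))
```

A request handler would call `is_authorized(client_id, token)` before doing any work and reject the request when it returns `False`.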
![Page 14: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/14.jpg)
PBS.org and Merlin API
• PBS.org
  – Slim, fast layer
  – Pulls data from Merlin API
  – Uses memcache extensively
  – Currently Django, but could be anything (Flask?)
• Merlin API
  – Aggregate content from distributed CMSes
  – Expose via standardized API
  – Power PBS.org and more
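The "slim layer that pulls from an API and caches heavily" pattern can be sketched as a cache-aside lookup. The example below is an assumption-laden illustration: `TTLCache` is an in-process stand-in for memcache, and `fetch_from_merlin` is a hypothetical placeholder for an HTTP call to the Merlin API, not a real client.

```python
import time


class TTLCache:
    """In-process stand-in for memcache: values expire after ttl seconds."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


def fetch_from_merlin(path):
    """Hypothetical placeholder for a request to the Merlin API."""
    fetch_from_merlin.calls += 1
    return {"path": path, "items": []}


fetch_from_merlin.calls = 0
cache = TTLCache()


def get_content(path, ttl=60):
    """Cache-aside: serve from cache, fall back to the API on a miss."""
    cached = cache.get(path)
    if cached is not None:
        return cached
    fresh = fetch_from_merlin(path)
    cache.set(path, fresh, ttl)
    return fresh
```

With this shape, repeated requests for the same path within the TTL never reach the backing API, which is what keeps the front-end layer thin.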
![Page 15: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/15.jpg)
Merlin API – Technology stack
• Python
• Django
• MySQL
• Piston
• Solr
• Celery
• RabbitMQ
• Amazon Web Services (“cloud”)
  – EC2
  – RDS - Relational Database Service
  – ELB - Elastic Load Balancing
  – Cloudfront CDN
  – S3 Storage
![Page 16: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/16.jpg)
Data flow
RSS Feed
Ingestor
Standardized API
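A rough sketch of the feed-to-standardized-record step, using only the standard library. The sample feed, the `source` label, and the output field names are all hypothetical; the real ingestor's record format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, standing in for a producer's CMS output.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example producer feed</title>
<item><title>First story</title><link>http://example.org/1</link>
<description>Summary one</description></item>
<item><title>Second story</title><link>http://example.org/2</link></item>
</channel></rss>"""


def ingest(feed_xml, source):
    """Map every RSS <item> to one uniform record, whatever CMS produced it."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "source": source,
            # findtext returns None for a missing element; normalize to "".
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return records


records = ingest(SAMPLE_FEED, source="example-producer")
```

Every record comes out with the same keys even when the feed omits an element, so downstream consumers of the standardized API do not need per-producer special cases.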
![Page 17: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/17.jpg)
Merlin API architecture
• API endpoint: Django Piston
• Search service: Django-haystack
• Indexing service: Solr
• Data layer: MySQL (RDS)
• Administration: Django admin
• Feed ingestion: Celery
![Page 18: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/18.jpg)
Merlin API server topology
[Diagram components]
• Internet
• Elastic Load Balancer
• Autoscaling array: App #N (four instances shown)
• Celery
• Master DB (RDS)
• Solr Index
• S3 backups
![Page 19: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/19.jpg)
Merlin API – Management Tools
• Amazon Web Services Console
• RightScale
• Splunk
![Page 20: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/20.jpg)
API - Piston/Haystack/Solr
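    from haystack.query import SearchQuerySet
    from piston.handler import BaseHandler


    class WebObjectIndexHandler(BaseHandler):
        ...
        def get_queryset(self):
            ...
            # Query the Solr index through Haystack rather than the database.
            return PistonSearchQuerySet().models(*models)


    class PistonSearchQuerySet(SearchQuerySet):
        ...
        def __getitem__(self, k):
            ...
            # Wrap each search result in IndexSerializer (defined elsewhere,
            # not shown on the slide) before handing it to Piston.
            return [IndexSerializer(i) for i in
                    super(PistonSearchQuerySet, self).__getitem__(k)]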
![Page 21: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/21.jpg)
Feed ingestor - Celery
    from datetime import timedelta

    from celery.decorators import task, periodic_task


    # Runs every five minutes.
    @periodic_task(run_every=timedelta(seconds=300))
    def update_webobject_states():
        ...
        solr_visible = WebObject.children.filter(visible=True)
        solr_visible = solr_visible.exclude(
            flag__api_visible=True, available__isnull=True)
        ...
        # Hide the remaining objects and mark them for re-indexing.
        updated = solr_visible.update(visible=False, is_indexed=False)
        ...
        signals.bulk_update.send('tasks.update_webobject_states')
![Page 22: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/22.jpg)
Merlin API - Lessons learned
• Memcached was not necessary
• Denormalized search data via Solr index is much faster than querying database
• Asynchronous task delegation is awesome
• Celery prone to memory leaks
• App server array for easy horizontal scaling
  – Even if not autoscaling, increase min servers
• Never trust data you don’t control (validate!)
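The last point can be made concrete with a small validation sketch. It assumes a hypothetical ingested record with `title` and `url` fields; the rules shown (required non-empty strings, absolute http(s) URLs) are examples, not Merlin's actual checks.

```python
from urllib.parse import urlparse

# Hypothetical required fields for an ingested record.
REQUIRED_FIELDS = ("title", "url")


def validate(record):
    """Return a list of problems; an empty list means the record is usable."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    url = record.get("url")
    if isinstance(url, str) and url.strip():
        parsed = urlparse(url.strip())
        # Reject relative links and non-web schemes such as javascript:.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("url must be an absolute http(s) URL")
    return errors
```

Running `validate` on every record before it is saved or indexed lets the ingestor skip and log a bad item instead of propagating it to the API.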
![Page 23: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/23.jpg)
Resources
• http://lucene.apache.org/solr/
• http://haystacksearch.org/
• http://celeryproject.org/
• http://celeryproject.org/docs/django-celery/
• http://aws.amazon.com/
![Page 24: Architecture at PBS](https://reader035.fdocuments.net/reader035/viewer/2022081518/545ca6a4af7959b4098b48bb/html5/thumbnails/24.jpg)
PBS Developer Community
• Dedicated to making open.PBS the industry standard in open development communities.
• http://open.pbs.org/
• https://github.com/pbs