MySpace.com MegaSite v2
Aber Whitcomb – Chief Technology Officer
Jim Benedetto – Vice President of Technology
Allen Hurff – Vice President of Engineering
Previous MySpace Scaling Landmarks: First Megasite
64+ MM Registered Users
38 MM Unique Users
260,000 New Registered Users Per Day
23 Billion Page Views/Month
50.2% Female / 49.8% Male
Primary Age Demo: 14-34
[Chart: registered user growth – 100K, 1 M, 6 M, 70 M, 185 M]
MySpace Company Overview: Today
As of April 2007:
185+ MM Registered Users
90 MM Unique Users
Demographics
50.2% Female / 49.8% Male
Primary Age Demo: 14-34
Internet Rank – Page Views in MM
#1 MySpace 43,723
#2 Yahoo 35,576
#3 MSN 13,672
#4 Google 12,476
#5 Facebook 12,179
#6 AOL 10,609
Source: comScore Media Metrix, March 2007
Total Pages Viewed - Last 5 Months
Source: comScore Media Metrix April 2007
[Chart: monthly page views in MM, Nov 2006 – Mar 2007, y-axis 0–50,000, for MySpace, Yahoo, MSN, Google, Ebay, Facebook]
Site Trends
350,000 new user registrations/day
1 Billion+ total images
Millions of new images/day
Millions of songs streamed/day
4.5 Million concurrent users
Localized and launched in 14 countries
Launched China and Latin America last week
Technical Stats
7 Datacenters
6,000 Web Servers
250 Cache Servers, 16 GB RAM each
650 Ad Servers
250 DB Servers
400 Media Processing Servers
7,000 disks in SAN architecture
70,000 Mb/s bandwidth
35,000 Mb/s on CDN
MySpace Cache
Relay System Deployment
Typically used for caching MySpace user data: online status, hit counters, profiles, mail.
Provides a transparent client API for caching C# objects.
Clustering
Servers divided into "Groups" of one or more "Clusters".
Clusters keep themselves up to date.
Multiple load balancing schemes based on expected load.
Heavy write environment: must scale past 20k redundant writes per second on a 15-server redundant cluster.
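The slides describe a transparent client API that routes cached C# objects to server "Groups" made of "Clusters". A minimal sketch of that idea, with all names hypothetical (this is not MySpace's actual API), routes each key to a cluster by hash so every client agrees on placement:

```csharp
// Hypothetical sketch of a group/cluster cache client; names are illustrative.
using System;
using System.Collections.Generic;

public class CacheCluster
{
    // Stand-in for a remote cache server's in-memory store.
    private readonly Dictionary<string, byte[]> store = new Dictionary<string, byte[]>();
    public void Put(string key, byte[] value) { store[key] = value; }
    public byte[] Get(string key) { return store.TryGetValue(key, out var v) ? v : null; }
}

public class CacheGroup
{
    private readonly List<CacheCluster> clusters;

    public CacheGroup(int clusterCount)
    {
        clusters = new List<CacheCluster>();
        for (int i = 0; i < clusterCount; i++) clusters.Add(new CacheCluster());
    }

    // Deterministic key-to-cluster routing, so all clients pick the same cluster.
    private CacheCluster Route(string key) =>
        clusters[(key.GetHashCode() & 0x7fffffff) % clusters.Count];

    public void Put(string key, byte[] value) => Route(key).Put(key, value);
    public byte[] Get(string key) => Route(key).Get(key);
}
```

A real deployment would replace the dictionary with a socket call to the relay tier and add the redundant-write fan-out the slide mentions.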
Relay System
Platform for middle-tier messaging.
Up to 100k request messages per second per server in production.
Purely asynchronous – no thread blocking (Concurrency and Coordination Runtime).
Bulk message processing.
Custom unidirectional connection pooling.
Custom wire format.
Gzip compression for larger messages.
Data center aware.
Configurable components.
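The slides mention a custom wire format with gzip compression applied only to larger messages. A minimal sketch of that trade-off (the 1 KB threshold and one-byte flag are assumptions, not the actual format):

```csharp
// Illustrative framing: a leading flag byte marks whether the payload is
// gzip-compressed; small messages skip compression to save CPU.
using System;
using System.IO;
using System.IO.Compression;

public static class WireCodec
{
    public const int GzipThreshold = 1024; // bytes; assumed cutoff

    public static byte[] Encode(byte[] payload)
    {
        if (payload.Length < GzipThreshold)
        {
            var frame = new byte[payload.Length + 1];
            frame[0] = 0; // flag: uncompressed
            Array.Copy(payload, 0, frame, 1, payload.Length);
            return frame;
        }
        using (var ms = new MemoryStream())
        {
            ms.WriteByte(1); // flag: gzip-compressed
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
                gz.Write(payload, 0, payload.Length);
            return ms.ToArray();
        }
    }

    public static byte[] Decode(byte[] frame)
    {
        if (frame[0] == 0)
        {
            var payload = new byte[frame.Length - 1];
            Array.Copy(frame, 1, payload, 0, payload.Length);
            return payload;
        }
        using (var src = new MemoryStream(frame, 1, frame.Length - 1))
        using (var gz = new GZipStream(src, CompressionMode.Decompress))
        using (var dst = new MemoryStream())
        {
            gz.CopyTo(dst);
            return dst.ToArray();
        }
    }
}
```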
[Diagram: Relay Service and IRelayComponents – Berkeley DB; non-locking memory buckets; fixed-alloc shared interlocked-int storage for hit counters; message forwarding (CCR); message orchestration (CCR); Socket Server; RelayClients]
Code Management: Team Foundation Server, Team System, Team Plain, and Team Test Edition
Code Management
MySpace embraced Team Foundation Server and Team System during Beta 3.
MySpace was also one of the early beta testers of DevBiz's Team Plain (now owned by Microsoft).
Team Foundation initially supported 32 MySpace developers; it now supports 110, on its way to over 230.
MySpace is able to branch and shelve more effectively with TFS and Team System.
Code Management (continued)
MySpace uses Team Foundation Server as a source repository for its .NET, C++, Flash, and ColdFusion codebases.
MySpace uses Team Plain for Product Managers and other non-development roles.
Code Management: Team Test Edition
MySpace is a member of the Strategic Design Review committee for the Team System suite.
MySpace chose Team Test Edition, which reduced cost and kept its Quality Assurance staff on the same suite as the development teams.
Using MSSCCI providers and customization of Team Foundation Server (including the upcoming K2 blackpearl), MySpace was able to extend TFS with better workflow and defect tracking based on our specific needs.
Server Farm Management: CodeSpew
CodeSpew
Maintaining a consistent, always-changing code base and configs across thousands of servers proved very difficult.
Code rolls began to take a very long time.
CodeSpew: code deployment and maintenance utility.
Two-tier application:
Central management server – C#
Light agent on every production server – C#
Tightly integrated with Windows PowerShell.
CodeSpew
UDP out, TCP/IP in.
Massively parallel – able to update hundreds of servers at a time.
File modifications are determined on a per-server basis using CRCs.
Security model for code deployment authorization.
Able to execute remote PowerShell scripts across the server farm.
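The per-server CRC comparison the slide describes can be sketched as follows. This is a minimal illustration, assuming a standard CRC-32 and a simple manifest of path-to-checksum pairs; it is not the actual CodeSpew code:

```csharp
// Hypothetical sketch: compare a CRC-32 per file on each target server
// against the master manifest and push only the files that differ.
using System;
using System.Collections.Generic;

public static class Crc32
{
    // Standard reflected CRC-32 (polynomial 0xEDB88320), as used by zlib/gzip.
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }
}

public static class Deployer
{
    // Files present in the master manifest whose CRC is missing or different
    // on the target server are the ones that need to be redeployed.
    public static List<string> FilesToPush(
        Dictionary<string, uint> master, Dictionary<string, uint> target)
    {
        var stale = new List<string>();
        foreach (var kv in master)
            if (!target.TryGetValue(kv.Key, out uint crc) || crc != kv.Value)
                stale.Add(kv.Key);
        return stale;
    }
}
```

Diffing checksums instead of shipping full trees is what makes updating hundreds of servers in parallel tractable.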
Media Encoding/Delivery
Media Statistics
Videos: 60 TB storage, 15,000 concurrent streams, 60,000 new videos/day
Music: 25 Million songs, 142 TB of space, 250,000 concurrent streams
Images: 1 Billion+ images, 80 TB of space, 150,000 req/s, 8 Gigabits/sec
4th Generation Media Encoding
Millions of MP3, video, and image uploads every day.
Ability to design custom encoding profiles (bitrate, width, height, letterbox, etc.) for a variety of deployment scenarios.
Job broker engine to maximize encoding resources and provide a level of QoS.
Abandonment of database connectivity in favor of a web service layer.
XML-based workflow definition to provide extensibility to the encoding engine.
Coded entirely in C#.
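The slide mentions XML-based workflow definitions and custom encoding profiles (bitrate, width, height, letterbox). A hypothetical profile in that spirit might look like this; element names and values are illustrative, not the actual schema:

```xml
<!-- Hypothetical encoding profile; names and values are illustrative only -->
<encodingProfile name="flashVideo-480">
  <bitrate unit="kbps">512</bitrate>
  <width>640</width>
  <height>480</height>
  <letterbox>true</letterbox>
  <output container="flv"/>
</encodingProfile>
```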
4th Generation Encoding Workflow
[Diagram: encoding workflow – user content uploaded from any application → web service communication layer → Job Broker → Media Processor → DFS 2.0 / CDN / FTP server; outputs include thumbnails for categorization and a filmstrip for image review]
MySpace Distributed File System
MySpace Distributed File System
Provides an object-oriented file store.
Scales linearly to near-infinite capacity on commodity hardware.
High-throughput distribution architecture.
Simple cross-platform storage API.
Designed exclusively for long-tail content.
[Chart: long-tail content demand vs. accesses]
Sledgehammer
Custom high-performance event-driven web server core.
Written in C++ as a shared library.
Integrated content cache engine.
Integrates with the storage layer over HTTP.
Capable of more than 1 Gbit/s throughput on a dual-processor host.
Capable of tens of thousands of concurrent streams.
DFS Interesting Facts
DFS uses a generic "file pointer" data type for identifying files, allowing us to change URL formats and distribution mechanisms without altering data.
Compatible with traditional CDNs like Akamai.
Can be scaled at any granularity, from single nodes to complete clusters.
Provides a uniform method for developers to access any media content on MySpace.
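The "file pointer" idea above can be sketched as an opaque ID that is resolved to a concrete URL only at request time, so stored data never encodes a URL format. Everything here, including the host name, is a hypothetical illustration rather than the real DFS API:

```csharp
// Hypothetical DFS-style file pointer: the database stores only the ID;
// a resolver maps it to whatever distribution mechanism is current.
public readonly struct FilePointer
{
    public readonly long Id;
    public FilePointer(long id) { Id = id; }
}

public interface IUrlResolver
{
    string Resolve(FilePointer ptr);
}

// Example resolver routing through a CDN (host name is made up).
// Swapping in a different resolver changes distribution without data changes.
public class CdnResolver : IUrlResolver
{
    public string Resolve(FilePointer ptr) =>
        $"http://cdn.example.com/dfs/{ptr.Id:x}";
}
```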
Appendix
Operational Wins
[Chart: pages/sec per server, 0–300, baseline: 2005 server]
MySpace Disaster Recovery Overview
Distribute MySpace servers over 3 geographically dispersed co-location sites:
Maintain presence in Los Angeles.
Add a Phoenix site for active/active configuration.
Add a Seattle site for active/active/active with site failover capability.
Distributed File System Architecture
[Diagram: DFS architecture – users, business logic, DFS cache daemon, Sledgehammer (cache engine, server accelerator engine), storage cluster]