Wix 10M Users Event - Prospero Media Storage

Managing 100TB of small files… Prospero Media Storage, IGT July 2011 Event

Transcript of Wix 10M Users Event - Prospero Media Storage

Page 1: Wix 10M Users Event - Prospero Media Storage

Managing 100TB of small files…

Prospero Media Storage

IGT July 2011 Event

Page 2: Wix 10M Users Event - Prospero Media Storage

Numbers

• 70TB used space
• 700 million files
• 200GB and 250,000 files uploaded every day
• 1200Mbps peak bandwidth throughput
• 180TB of data served out monthly
• 3700 hits per second at peak
• 40 storage node servers – 300TB raw space
• $0.13 per GB
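A few averages follow directly from these figures. The sketch below is only back-of-the-envelope arithmetic; it assumes decimal units (1TB = 10^12 bytes) and a 30-day month, neither of which is stated on the slide.

```python
# Back-of-the-envelope averages derived from the slide's figures.
# Assumptions (not from the slide): decimal units, 30-day month.
TB = 10**12
GB = 10**9

# 70TB spread over 700 million files
avg_stored_file_bytes = 70 * TB / 700_000_000

# 200GB spread over 250,000 uploads per day
avg_uploaded_file_bytes = 200 * GB / 250_000

# 180TB served per month, expressed as a sustained bit rate
seconds_per_month = 30 * 24 * 3600
avg_served_mbps = 180 * TB * 8 / seconds_per_month / 10**6

print(f"average stored file:   {avg_stored_file_bytes / 1000:.0f} KB")
print(f"average uploaded file: {avg_uploaded_file_bytes / 1000:.0f} KB")
print(f"average outbound rate: {avg_served_mbps:.0f} Mbps (peak on slide: 1200 Mbps)")
```

Under these assumptions the average stored file is about 100 KB and the sustained outbound rate is a little under half of the quoted peak.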

Page 3: Wix 10M Users Event - Prospero Media Storage

Motivation

• Web 2.0 content serving paradigm shift
  – Too many files
    • 12M users x 1 file = very long tail
  – Too many connections
    • 1M users + keepalive = 1M connections
  – Living with modern content in web 2.0
    • 1 file x (thumbnail + iPhone + Mac) = 3 file copies

Page 4: Wix 10M Users Event - Prospero Media Storage

Traditional Architecture

[Diagram labels: Centralized Storage (NAS, SAN, DAS etc.); HTTP; IO (repeated)]

Page 5: Wix 10M Users Event - Prospero Media Storage

Traditional Architecture

[Diagram labels: Centralized Storage (NAS, SAN, DAS etc.); HTTP – TOO MANY CONNECTIONS; IO (repeated)]

Page 6: Wix 10M Users Event - Prospero Media Storage

Traditional Architecture

[Diagram labels: Centralized Storage (NAS, SAN, DAS etc.); HTTP; IO (repeated, more paths than on the previous slide)]

Page 7: Wix 10M Users Event - Prospero Media Storage

Traditional Architecture

[Diagram labels: Too much IO; HTTP; IO (repeated)]

Page 8: Wix 10M Users Event - Prospero Media Storage

Traditional Architecture

[Diagram labels: Centralized Storage (NAS, SAN, DAS etc.); HTTP; IO (repeated); Cache]

Page 9: Wix 10M Users Event - Prospero Media Storage

“There are only two hard things in Computer Science: cache invalidation and naming things.”

-- Tim Bray quoting Phil Karlton

Page 10: Wix 10M Users Event - Prospero Media Storage

Architecture goals

• Symmetric identical server nodes
  – Simplified management and scaling
  – Linear scaling out

• No functional / role servers
  – No single point of failure
  – No performance bottlenecks

• Multiple datacenters support
  – DRP support
  – Geo load distribution

Page 11: Wix 10M Users Event - Prospero Media Storage

Meet Prospero

• Distributed Web content storage system

• Full-blown HTTP support

• Runs on low cost commodity hardware

• Adjustable file level replication controls redundancy policy for every content type

• Provides dynamic image manipulation

Page 12: Wix 10M Users Event - Prospero Media Storage

How do we do it?

Page 13: Wix 10M Users Event - Prospero Media Storage

Designed to fail

• Fallback for every operation
  – Geographical, machine, storage medium

• Write never fails
  – All files will reach their destination

• Journaling
  – Tracking all uploaded files

• Pending jobs
  – Guaranteed file distribution
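One way to picture how journaling and pending jobs can combine: accept the write locally, record it in a journal, and queue one distribution job per target that is retried until it succeeds. The sketch below is a hypothetical illustration of that idea, not Prospero's implementation; the class name `JournaledStore`, the JSON journal format, and the `send` callback are all assumptions.

```python
import json
import os
import tempfile
from collections import deque


class JournaledStore:
    """Hypothetical sketch: local write + journal entry + retryable jobs."""

    def __init__(self, root):
        self.root = root
        self.journal_path = os.path.join(root, "journal.log")
        self.pending = deque()  # (file name, target) pairs still to distribute

    def write(self, name, data, targets):
        # The write is accepted as soon as the file is stored locally.
        path = os.path.join(self.root, name)
        with open(path, "wb") as f:
            f.write(data)
        # Journal every uploaded file so it can be tracked later.
        with open(self.journal_path, "a") as journal:
            journal.write(json.dumps({"file": name, "targets": targets}) + "\n")
        # Distribution happens afterwards, one pending job per target.
        for target in targets:
            self.pending.append((name, target))
        return path

    def run_pending(self, send):
        """Try each pending job once; failed jobs stay queued for the next run."""
        remaining = deque()
        while self.pending:
            name, target = self.pending.popleft()
            try:
                ok = send(name, target)
            except OSError:
                ok = False
            if not ok:
                remaining.append((name, target))
        self.pending = remaining
        return len(self.pending)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = JournaledStore(root)
        store.write("a.jpg", b"data", ["dc1", "dc2"])
        # Pretend dc2 is unreachable on the first pass.
        print(store.run_pending(lambda name, target: target != "dc2"))
        print(store.run_pending(lambda name, target: True))
```

In this sketch a failed target never fails the upload itself; it only leaves a job in the queue, which is one reading of "write never fails".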

Page 14: Wix 10M Users Event - Prospero Media Storage

How do we achieve this?

• Control the input
  – define the only unified API

• Functional process isolation
  – every function deserves its own process by default
  – watchdogs
  – monitors
  – alerts

Page 15: Wix 10M Users Event - Prospero Media Storage

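[Diagram: storage nodes 0.static through 7.static, each reached over HTTP. Ranges shown: 0.static 00-1f, 2.static 20-3f, 4.static 40-5f, 6.static 60-7f. 1.static, 3.static, 5.static and 7.static appear without a range label.]

• get 37D815B5.jpg
• Go to 37 range servers
• Fallback if not found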
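The lookup on this slide can be sketched as prefix-based routing with a fallback. The range width of 0x20 and the even-numbered host names come from the ranges shown; treating the odd-numbered neighbor as the fallback host is purely an assumption for illustration, and `candidate_hosts`, `fetch` and `fetch_from` are hypothetical names.

```python
RANGE_WIDTH = 0x20   # the ranges on the slide (00-1f, 20-3f, ...) each span 0x20 values
MAX_PREFIX = 0x7F    # the slide only shows ranges up to 7f


def candidate_hosts(filename):
    """Return the hosts to try, in order, for a hex-named file."""
    prefix = int(filename[:2], 16)  # "37D815B5.jpg" -> 0x37
    if prefix > MAX_PREFIX:
        raise ValueError("prefix outside the ranges shown on the slide")
    primary = (prefix // RANGE_WIDTH) * 2  # 00-1f -> 0, 20-3f -> 2, ...
    # Assumption: the odd-numbered neighbor acts as the fallback host.
    return [f"{primary}.static", f"{primary + 1}.static"]


def fetch(filename, fetch_from):
    """Try each candidate host; fetch_from(host, name) returns bytes or None."""
    for host in candidate_hosts(filename):
        data = fetch_from(host, filename)
        if data is not None:
            return host, data
    raise FileNotFoundError(filename)


if __name__ == "__main__":
    # Fake cluster in which only the fallback host holds the file.
    cluster = {"3.static": {"37D815B5.jpg": b"jpeg bytes"}}
    lookup = lambda host, name: cluster.get(host, {}).get(name)
    print(candidate_hosts("37D815B5.jpg"))
    print(fetch("37D815B5.jpg", lookup)[0])
```

Because the host is derived from the file name alone, any node or client can compute where to look without consulting a central index.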

Page 16: Wix 10M Users Event - Prospero Media Storage

Fallback Example

Page 17: Wix 10M Users Event - Prospero Media Storage

Node Architecture

[Diagram: node components, with an "Output – Front" side and an "Input – Supervisor" side]

• HTTP handler (lighttpd fork)
• Van Gogh (dynamic image processor, lighttpd fork)
• Cross Datacenter Distributer (custom)
• Local Datacenter Distributer (custom)
• Supervisor HTTP handler (Tornado web framework application)

Page 18: Wix 10M Users Event - Prospero Media Storage

Real Life

Page 19: Wix 10M Users Event - Prospero Media Storage

It’s all about performance

• Non-blocking IO, readiness notification (epoll)
• Asynchronous file IO (AIO)
• Zero copy (sendfile)
• Memory maps
• Inter-process binary protocols
• UNIX socket
• Minimize dynamic memory allocation
• lighttpd memory footprint: 50MB
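As a small illustration of the zero-copy item, the sketch below sends a file over a UNIX socket pair with Python's `socket.sendfile()`. That method uses `os.sendfile()` where the platform supports it and otherwise falls back to plain `send()`, so the zero-copy path is not guaranteed. The function names and the 4096-byte payload are assumptions made for the demo; Prospero's own code is not shown in the slides.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, conn):
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        # The kernel copies file pages straight to the socket when
        # os.sendfile() is available, avoiding a user-space buffer.
        return conn.sendfile(f)


def demo():
    """Round-trip a small temporary file through a UNIX socket pair."""
    payload = os.urandom(4096)  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(tmp.name, sender)
        sender.close()  # signals end-of-stream to the receiver
        received = b""
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received += chunk
    finally:
        receiver.close()
        os.unlink(tmp.name)
    return sent, received == payload


if __name__ == "__main__":
    print(demo())
```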

Page 20: Wix 10M Users Event - Prospero Media Storage

Lessons learnt

• Be symmetric
• Control the input
• Design for failure
• Performance matters again
• Simple is hard but a must