eBay - Very Large Distributed Systems (a.k.a. Grids) @ Work

Paul Strong, Distinguished Engineer, eBay Research Labs; Acting Chair, Open Grid Forum (OGF). BEinGRID Industry Days, 5th June 2008.

Transcript

Page 1: eBay - Very Large Distributed Systems (a.k.a. Grids) @ Work

Paul Strong
Distinguished Engineer, eBay Research Labs
Acting Chair, Open Grid Forum (OGF)

BEinGRID Industry Days, 5th June 2008

Page 2: Copyright Notice

© 2008 eBay Inc. All rights reserved.

• No part of these materials may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording, or otherwise) without the prior permission of eBay Inc.
• eBay and the eBay logo are registered trademarks of eBay Inc.
• PayPal and the PayPal logo are registered trademarks of PayPal, Inc.
• Other trademarks and brands are the property of their respective owners.
• Please do not take our picture or record the class/session without asking permission.

Page 3: eBay – The 30 Second Introduction!

On an average day on eBay…

A vehicle sells every minute
A motors part or accessory sells every second
Diamond jewelry sells every 2 minutes
eBay users trade about $2,039 worth of goods on the site every second
1.3m people make all or part of their living selling on eBay*

*ACNielsen International Research, June 2006

Page 4: PayPal – The Even Shorter Summary!

• 141 million accounts (57 m active*)
• 17 currencies
• 2008 - $47 billion TPV
• Q4 2008 – $14 billion, $1806 TPV/sec

12% US e-Commerce, 8% global e-Commerce
#2 e-Commerce payment mechanism in US (#1 VISA)
#1 e-Commerce payment mechanism in UK, Australia

* At least 1 transaction in last 12 months

Page 5: Why eBay Is A Useful Example

[Diagram labels: New Challenges; Extreme Engineering; The Bleeding Edge; Technology trickle down/transfer; Everyday use]

Page 6: eBay’s Drivers

• Extreme Scale: 276m Registered Users, 113m+ Items, 6m+ New Items Per Day
• Extreme Growth: Near exponential growth in listings for most of history – 12 years
• Extreme Agility: Roll code to the site every 2 weeks
• Constant, predictable presence: Must be 24x7x365
• Efficiency

Failure To Keep Up Is Not An Option!

Page 7: Challenges

• Scaling The Database
• Scaling Services
• Managing At Scale
• Better Management Through Semantics

Page 8: Grid – eBay’s Perspective?

• Inevitable consequence of trends
– Network of servers → Fabric of resources
– Server centric apps → Network distributed services
• Network distributed services + platform
– Scales (performance, throughput) with network
– Inherent resilience
– Flexibility (if loosely coupled)
• Middle-ware
– Meta-OS maps workload onto resources based on policy
– Incomplete today
• General purpose platform
– Compute intensive
– Data intensive
– Transaction intensive
– Hybrid
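The "Meta-OS maps workload onto resources based on policy" idea can be illustrated with a toy placement function. Everything in this sketch is an assumption made for illustration: the `Resource` and `Workload` types, the `place` function, the node names, and the policy itself (choose the least utilized resource of the matching kind that still has room). The slides do not describe how eBay's middleware actually makes this decision.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str          # hypothetical resource class, e.g. "compute" or "data"
    capacity: int
    used: int = 0


@dataclass
class Workload:
    name: str
    kind: str
    demand: int


def place(workload, resources):
    """Assumed policy: least-utilized resource of the matching kind with enough room."""
    candidates = [
        r for r in resources
        if r.kind == workload.kind and r.capacity - r.used >= workload.demand
    ]
    if not candidates:
        return None  # nothing in the fabric can take this workload
    best = min(candidates, key=lambda r: r.used / r.capacity)
    best.used += workload.demand
    return best.name


fabric = [
    Resource("node-a", "compute", capacity=10, used=6),
    Resource("node-b", "compute", capacity=10, used=2),
    Resource("node-c", "data", capacity=10),
]
placement = place(Workload("build-job", "compute", demand=4), fabric)
print(placement)
```

Swapping the policy (for example, packing workloads tightly instead of spreading them) only changes the `min(...)` line, which is the point of keeping policy separate from the workloads and the resources.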

Page 9: Grids @ eBay

• Build and release – “Traditional” Grid
– 300+ servers
• Search – Scatter/gather transactional
– 3000-4000 servers
• Auction Platform – Transactional
– 8000+ Blades
• Virtualized Database – Data Grid
– 630+ Database Instances
– Extensive caching and distribution

Page 10: High Level Data/Storage Architecture

[Diagram]

Page 11: eBay Example #1: Making The Database Scale

• Second Database for failover
• CGI pools, Listings, Pages, and Search continued to scale horizontally

However … By November 1999, the database servers approached their limits of physical growth.

[Diagram, 1999: four S/W Load Balancers in front of four pools: “Listings” (Web Server, OS), “Pages” (Web Server, OS), “Search” (Apache, COTS Search, UNIX) and “CGIn” (Web Server, C++, OS); two database hosts, each RDBMS on UNIX: bear.ebay.com and bull.ebay.com]

Page 12: eBay Example #1: Making The Database Scale

• Database "split" technology.
• Logically partition database into separate instances.
• Horizontal scalability through 2000, but not beyond.

[Diagram, 2000: four S/W Load Balancers in front of four pools: “Listings” (Web Server, OS), “Pages” (Web Server, OS), “Search” (Apache, COTS Search, UNIX) and “CGIn” (Web Server, C++, OS); four database hosts, each RDBMS on UNIX: bear.ebay.com, bull.ebay.com, chard.ebay.com and cab/bongo.ebay.com]

Page 13: eBay Example #1: Virtualizing the Database

[Diagram: Application Servers; logical databases Attributes, Catalogs, Rules, CATY 1…N, User, Account, Feedback, Misc, API, Scratch; physical databases DB 1, DB 2, DB 3]

• Separate Application notion of a database from physical implementation
• Databases may be combined and separated with no code changes
• Reduce cost of creating multiple environments (Dev, QA, …)
• Application can continue to function without non-critical data (markdown)
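A minimal sketch of the virtualization idea above, assuming a simple lookup table between logical database names and physical hosts. The `DataAccessLayer` class, its method names, and the choice of "Feedback" as non-critical data are hypothetical and are not taken from the slides; only the logical names and the DB 1 to DB 3 labels come from the diagram.

```python
class DataAccessLayer:
    """Applications ask for a logical database; only this layer knows the host."""

    def __init__(self, host_map, non_critical=()):
        self.host_map = dict(host_map)          # logical name -> physical host
        self.non_critical = set(non_critical)   # data the site can run without
        self.marked_down = set()

    def mark_down(self, logical):
        self.marked_down.add(logical)

    def resolve(self, logical):
        """Return the physical host, or None when non-critical data is marked down."""
        if logical in self.marked_down:
            if logical in self.non_critical:
                return None  # caller hides the feature and keeps serving the page
            raise RuntimeError(f"critical database {logical!r} is unavailable")
        return self.host_map[logical]


dal = DataAccessLayer(
    {"User": "DB 1", "Account": "DB 1", "Feedback": "DB 2", "Scratch": "DB 3"},
    non_critical={"Feedback"},  # assumption made for this example
)

# "Separating" Account from User is a mapping change, not an application change.
dal.host_map["Account"] = "DB 2"

# Markdown: the application continues without the non-critical data.
dal.mark_down("Feedback")
print(dal.resolve("User"), dal.resolve("Account"), dal.resolve("Feedback"))
```

The same mapping can point every logical name at a single host in a Dev or QA environment, which is one way to read the "reduce cost of creating multiple environments" bullet.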

Page 14: eBay Example #1: Virtualizing & Scaling the Database

[Diagram dated November, 1999]

Page 15: eBay Example #1: Virtualizing & Scaling the Database

[Diagram dated December, 2002; labelled SAN]

Page 16: eBay Example #1: Virtualizing & Scaling the Database

• Scales Out
– 276 million registered users
– 113 million Items
– 6+ million new items per day
– 34 billion SQL transactions per day
– 600+ production database instances (inc replicas)
– 100+ clusters
• Cheaper
– Smaller, potentially commodity, servers
• Highly Resilient
– 2-4 copies of everything
– Minimized impact of outage to [relatively] small sub-set of data
• Flexible/Agile
– Easy to change – database, schemas, partitioning etc.
– Minimal impact on architecture or code
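As a back-of-envelope check on what these daily totals mean per second, the arithmetic can be worked through as below. The inputs are the figures quoted on the slide, taking "600+" and "6+ million" at their lower bounds; the results are plain daily averages and say nothing about peak load or how evenly work is spread across instances.

```python
SECONDS_PER_DAY = 24 * 60 * 60

sql_per_day = 34_000_000_000     # "34 billion SQL transactions per day"
db_instances = 600               # lower bound of "600+ production database instances"
new_items_per_day = 6_000_000    # lower bound of "6+ million new items per day"

avg_sql_per_second = sql_per_day / SECONDS_PER_DAY
avg_sql_per_instance_per_second = avg_sql_per_second / db_instances
avg_new_items_per_second = new_items_per_day / SECONDS_PER_DAY

print(f"{avg_sql_per_second:,.0f} SQL transactions per second, site-wide average")
print(f"{avg_sql_per_instance_per_second:,.0f} per second per instance, if spread evenly")
print(f"{avg_new_items_per_second:,.0f} new items per second")
```

On these assumptions the site averages a little under 400,000 SQL transactions per second, or several hundred per instance per second.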

Page 17: eBay Example #2: Scaling Services

[Diagram]

Page 18: eBay Example #2: Scaling Services

• Partition code into functional areas
– Application is specific to a single area (Buying, Selling etc.)
– Domain contains common business logic across applications
• Restrict inter-dependencies
– Applications depend on Domains, not on other applications
– No dependencies among shared domains

[Diagram, three layers. Applications: User Application, Selling Application, Buying Application, Billing Application, Search Application. Domains: User Domain, Selling Domain, Buying Domain, Billing Domain, Search Domain. Shared Domains: Personalization Domain, Shared Billing Domain, User Validation Domain, Shared Buying Domain, myEBay Domain, Shared Search Domain, Core Domain, API Domain, Lookup Domain]
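The two dependency rules on this slide (applications do not depend on other applications; shared domains do not depend on each other) are the kind of constraint that can be checked mechanically. The sketch below is one hypothetical way to do so. The `violations` function and the example dependency maps are invented for illustration, and only a few of the component names from the diagram are used.

```python
APPLICATIONS = {"Selling Application", "Buying Application"}
DOMAINS = {"Selling Domain", "Buying Domain"}
SHARED_DOMAINS = {"Core Domain", "Lookup Domain"}


def violations(deps):
    """deps maps each component to the set of components it depends on."""
    found = []
    for source, targets in deps.items():
        for target in sorted(targets):
            if source in APPLICATIONS and target in APPLICATIONS:
                found.append((source, target, "application depends on application"))
            if source in SHARED_DOMAINS and target in SHARED_DOMAINS:
                found.append((source, target, "shared domain depends on shared domain"))
    return found


# Allowed: applications depend only on domains.
good = {
    "Selling Application": {"Selling Domain", "Core Domain"},
    "Buying Application": {"Buying Domain"},
}

# Not allowed: an application-to-application and a shared-to-shared dependency.
bad = {
    "Selling Application": {"Buying Application"},
    "Core Domain": {"Lookup Domain"},
}

print(violations(good))
print(violations(bad))
```

Only the two rules stated on the slide are checked; whether ordinary domains may depend on shared domains is not stated there, so the sketch leaves such edges alone.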

Page 19: eBay Example #2: Scaling The Application

• Segment functions into separate application pools
– Minimizes/isolates DB dependencies
– Allows for parallel development, deployment and monitoring

[Diagram: two application pools, ViewItem Pool (http://cgiX.ebay.com...) and SYI Pool (http://cgiY.ebay.com...); each shows a Load Balancer, Web Servers (Web, Web), a Load Balancer and App Servers (AS, AS, AS); a further Load Balancer sits above the databases User, Acct, Caty1 … Caty20+]

Page 20: eBay Example #2: Scaling The Application

• Everything behaves as loosely coupled services
• Minimize inter-dependencies
• Infrastructure is like a giant FPGA
– Potential to re-program by re-routing traffic
• Scales
– Scale out means scaled throughput and resilience
– 16000+ concurrent instances
– 8000+ servers (mainly blades)
• Efficiency
– Run traffic from different time zones on the same server but different instances

Page 21: Scaling Search – Voyager

• Real-time feeder infrastructure
– Reliable multi-cast from primary database to search nodes
• Real-time indexing
– Search nodes update index in real time from messages
• In memory search index
• Horizontal segmentation (scatter, gather)
– Search index divided into N slices (“columns”)
– Each slice replicated to M instances (“rows”)
– Aggregator parallelizes query over all N slices, load balances over M instances
• Caching
– Cache results for highly expensive and frequently used queries
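The scatter/gather layout described above can be sketched in a few lines. This is an illustrative toy, not Voyager: slicing documents by id modulo N, picking a replica at random as the "load balancing" step, substring matching as the "search", and all function names are assumptions. Real-time feeding, indexing and result caching are not modelled.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_SLICES = 3     # "columns": the index is divided into N slices
M_REPLICAS = 2   # "rows": each slice is replicated to M instances


def build_grid(documents):
    """Assign each document to a slice (id modulo N), then copy each slice M times."""
    slices = [dict() for _ in range(N_SLICES)]
    for doc_id, text in documents.items():
        slices[doc_id % N_SLICES][doc_id] = text
    return [[dict(s) for _ in range(M_REPLICAS)] for s in slices]


def search_replica(replica, term):
    """Search one in-memory slice replica with a naive substring match."""
    return [doc_id for doc_id, text in replica.items() if term in text.lower()]


def aggregate(grid, term):
    """Scatter the query to one replica of every slice, then gather and merge."""
    chosen = [random.choice(replicas) for replicas in grid]
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        partials = list(pool.map(search_replica, chosen, [term] * len(chosen)))
    return sorted(doc_id for partial in partials for doc_id in partial)


docs = {1: "Diamond ring", 2: "Car parts", 3: "diamond necklace", 4: "Motor accessory"}
grid = build_grid(docs)
hits = aggregate(grid, "diamond")
print(hits)
```

Adding slices (N) spreads a growing index over more nodes, while adding replicas (M) raises query throughput and tolerates node loss, which matches the "columns" and "rows" wording on the slide.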

Page 22: Architectural Lessons Learnt

• Scale Out, And Enable Scaling Up Too
– Horizontal scaling at every tier plus multi-threading too
– Enable deployment time choice
– Functional decomposition
• Prefer Asynchronous Integration
– Minimize availability coupling
– Improve scaling options
• Virtualize Components
– Reduce physical dependencies
– Improve deployment flexibility
• Design For Failure
– Automated failure detection and notification
– “Limp mode” operation of business features
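To make "minimize availability coupling" concrete, here is a minimal sketch of asynchronous integration in which the caller hands work to a queue instead of calling the downstream system directly. The in-process `deque` stands in for a durable message queue, and the function names, the `feedback_store` and the sample event are all hypothetical.

```python
from collections import deque

event_queue = deque()   # stand-in for a durable message queue (assumption)
feedback_store = {}     # stand-in for the downstream system's own data


def record_sale(item_id, buyer):
    """Synchronous path: succeeds without touching the downstream system."""
    event_queue.append({"item": item_id, "buyer": buyer})
    return "ok"


def drain(consumer_available):
    """Asynchronous path: the consumer catches up whenever it is available."""
    processed = 0
    while consumer_available and event_queue:
        event = event_queue.popleft()
        feedback_store[event["item"]] = event["buyer"]
        processed += 1
    return processed


status = record_sale(42, "alice")               # succeeds even if the consumer is down
stalled = drain(consumer_available=False)       # nothing processed, nothing lost
caught_up = drain(consumer_available=True)      # backlog processed later
print(status, stalled, caught_up, feedback_store)
```

The producer's availability depends only on the queue, so a downstream outage delays processing rather than failing the user-facing operation.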

Page 23: The Big Problem

[Chart: # Relationships (vertical axis) against # Components (horizontal axis), annotated “Management complexity scales with this”]

Page 24: Understanding Relationships

Service A is composed of
– Persistence Sub-Service B
– Business Logic Sub-Service C
– Presentation Sub-Service D

[Diagram: A above B, C, D]

Page 25: Understanding Relationships

Business Logic Sub-Service C is composed of
– A Load Balancing Service
– Several Application Instances

[Diagram: A above B, C, D; C above App, App, LBS]

Page 26: Understanding Relationships

The Application Instances are hosted on
– Operating System Instances

The Load Balancing Service is hosted on
– A Load Balancer Operating System

[Diagram: A above B, C, D; C above App, App, LBS; these above OS, OS, LB]

Page 27: Understanding Relationships

The Operating System Instances are hosted on
– Servers or Virtual Servers, which are in turn hosted on servers

The Load Balancer OS is hosted on
– A Physical Load Balancer

[Diagram: A above B, C, D; C above App, App, LBS; these above OS, OS, LB; these above Svr, Svr, LB, with a VS between one OS and its Svr]
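The "composed of" and "hosted on" relationships built up over the last four slides form a dependency graph, and one practical use of such a graph is asking what is affected when a component fails. The sketch below encodes the slide's example as a plain dictionary. The instance names (`App1`, `OS2`, `VS1`, `Svr2` and so on) and the `impacted_by` function are made up for illustration, and the sketch does not distinguish "down" from "degraded".

```python
# Each key depends on every component in its list
# ("is composed of" and "is hosted on" are both treated as dependencies).
DEPENDS_ON = {
    "A": ["B", "C", "D"],
    "C": ["App1", "App2", "LBS"],
    "App1": ["OS1"],
    "App2": ["OS2"],
    "LBS": ["LB-OS"],
    "OS1": ["Svr1"],
    "OS2": ["VS1"],     # this OS instance runs on a virtual server...
    "VS1": ["Svr2"],    # ...which is in turn hosted on a physical server
    "LB-OS": ["LB"],
}


def impacted_by(component):
    """Return everything that directly or transitively depends on `component`."""
    impacted, frontier = set(), [component]
    while frontier:
        current = frontier.pop()
        for parent, children in DEPENDS_ON.items():
            if current in children and parent not in impacted:
                impacted.add(parent)
                frontier.append(parent)
    return impacted


affected = impacted_by("Svr2")
print(sorted(affected))
```

Walking the edges in the other direction answers the opposite question, namely which infrastructure a given business service runs on.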

Page 28: Categorizing The Components

[Diagram: the components from the previous slides (A; B, C, D; App, App, LBS; OS, OS, LB; Svr, Svr, LB, VS) placed on a grid. Columns: Data, Business Logic, Presentation. Layers: Biz Process/-Service, Virtualized Platform, Platform Instance, Virtualized OE, OE, Virtualized Physical, Physical]

Page 29: Interaction/Traffic Relationships: Starting To Look Complicated!

[Diagram]

Page 30: Relationships Are Everything!

• Everything is interconnected
• Changing one thing causes ripples
• How you connect things together determines business functionality and business value
• Agility is the ability to change these relationships dynamically (easier with loosely coupled services)
• Virtualization is about standardizing a relationship and interposing/isolating one end from the other
• Understanding these relationships allows you to
– Tie business processes to the infrastructure they run on
– Map value to cost
– Understand and manage traffic flow
– Understand and manage provisioning etc.
• It’s all about managing relationships, not things!

Page 31: Managing Complexity Using Patterns: Example - Storage

• Minimize Set Of Components
– Products - HBAs, Arrays, Switches
– Vendors
• Small Set Of Storage Classes
– eBay has 5 or 6 patterns
– Each reflects an SLO in terms of Performance, Availability, Cost

Page 32: Typical eBay Storage Pattern

[Diagram labels: Disks On Dual Loops; Mirror Whole Disks; Stripe Across Mirrors; Partition Stripes - LUNs; Dual SANs; 4 Paths Per Node; Stripe Volume Across LUNs From Different Loops; Cluster]

Page 33: Patterns Are Successful…

• Scale
– 15 SANs / 245 switches / 7,800 ports
– 78 arrays / 3.5 PB / 56,000 LUNs
– 180 Clusters / 890 servers / 650 databases
• Agility
– 11 TB/Week Provisioned
– 85 New Data Volumes Per Week
– 10 Database Moves/Week

Managed by 11 People

Page 34: But…

• Patterns constrain agility in other dimensions
– Adoption of new vendors
– Adoption of new products
– Adoption of new application or architectural patterns
• Especially if patterns and tools are heavily automated

Page 35: The Future…

• Datacenter is becoming non-deterministic or chaotic
• Emergent behavior of services

Page 36: The Future… Better Management Through Semantics

• Capture relationships in Semantic Query Service (in memory, custom OWL/RDF based Ontology)
– Extensible
– Patterns in data and not in code!
• Feed from CMDB and from Run Time Telemetry
• Query-able by other management services
• Visualized as graphs (DAGs) through UI
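The "patterns in data and not in code" point can be illustrated with subject/predicate/object triples, the data model that RDF is built on. The sketch below is a plain-Python stand-in and is not the Semantic Query Service described on the slide: it uses no OWL or RDF library, and the predicate names, component names and `query` function are hypothetical.

```python
# In-memory set of (subject, predicate, object) triples.
TRIPLES = {
    ("ServiceA", "composedOf", "SubServiceC"),
    ("SubServiceC", "composedOf", "App1"),
    ("App1", "hostedOn", "OS1"),
    ("OS1", "hostedOn", "Svr1"),
}


def query(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if subject in (None, t[0])
        and predicate in (None, t[1])
        and obj in (None, t[2])
    )


# Extensible: a new kind of relationship is added as data, with no change to query().
TRIPLES.add(("Svr1", "locatedIn", "Datacenter1"))

hosted = query(predicate="hostedOn")
about_svr1 = query(subject="Svr1")
print(hosted)
print(about_svr1)
```

In a setup like the one the slide describes, triples such as these would be fed from the CMDB and run-time telemetry rather than written by hand, and other management services would issue the queries.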

Page 37: Of Grids & Clouds

• Simplify building applications
• Eliminate management of unnecessary infrastructure
• Grids can supply services accessed via clouds
• Grids can run in/on clouds

Page 38: Future Platforms & Business Paradigms

• Clouds & Grids ride the wave of application, and thus by inference, business process disaggregation
• Eliminate/out-source non-core business process
• Access via clouds as commodity services?
• Opportunity to mash up new process
– Program via BPEL or something simpler?
• Opportunity for new platforms

Page 39: Conclusions

• Large scale, distributed systems (aka Grids) are the platform of the present and future
• If you can’t or don’t want to manage one, you will use someone else’s in the cloud
• Disaggregation will lead to biz process re-factoring
• Opportunities
– New services via mashed up biz processes
– New platforms to support ubiquitous biz process (extending SaaS models, eBay, Google, Amazon)
– Businesses with no infrastructure, just good ideas, smart people, minimal technical knowledge and access to the cloud!

Page 40: Thank You

Paul Strong
[email protected]
Distinguished Engineer
eBay Research Labs, eBay Inc.