A Recovery-Friendly, Self-Managing Session State Store Benjamin Ling and Armando Fox...


A Recovery-Friendly, Self-Managing Session State Store

Benjamin Ling and Armando Fox
{bling,fox}@cs.stanford.edu


© 2003 Benjamin Ling

Outline

• Motivation: What is Session State?
• Existing solutions
• SSM: Architecture and Algorithm
• SSM: Recovery-friendly
• SSM: Self-Managing
• Related and Future Work
• Conclusion


Example of Session State


Session State and Existing Solutions

• We focus on a subcategory of session state: single-user, serial-access, semi-persistent data
• Examples: temporary application data, application workflow
• Example of usage (e.g. J2EE):

[Figure: Browser and App Server, with steps numbered 1 to 6]


Existing solutions:

• File System and Databases
  • Poor failure behavior: lose data (FS), slow recovery (both)
  • Difficult to administer (DB)
  • Difficult to tune (both)
• In-memory replication using primary/secondary
  • Performance coupling
  • Poor failover (uneven load balancing)


Goal

Build a session state store that is:

• Failure-friendly
  • Does not lose data on crash
  • Degrades gracefully
• Recovery-friendly
  • Recovers fast
• Self-Managing
• High performance
  • Avoids performance coupling


Session State Manager (SSM)

[Figure: two AppServers, each with a STUB, and Bricks 1 to 5; label: RAM, Network Interface]

• Redundant, in-memory hash table distributed across nodes
• Algorithm: redundancy similar to quorums
  • Write to many random nodes, wait for few (avoid performance coupling)
  • Read one


Write example: “Write to Many, Wait for Few”

[Figure: Browser, AppServer with STUB, and Bricks 1 to 5]

• Try to write to W random bricks, W = 4
• Must wait for WQ bricks to reply, WQ = 2
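
The write rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SSM implementation: the `Brick` class, the thread-per-request fan-out, and the function name `write_many_wait_few` are hypothetical, and a brick is simulated as an in-process dictionary with an artificial delay. The only parts taken from the slide are the parameters W (bricks written to) and WQ (replies awaited).

```python
import queue
import random
import threading
import time


class Brick:
    """Hypothetical stand-in for a brick: an in-memory dict plus a fake delay."""

    def __init__(self, name, delay_s=0.0):
        self.name = name
        self.delay_s = delay_s
        self.table = {}

    def put(self, key, value):
        time.sleep(self.delay_s)  # simulate a slow or overloaded brick
        self.table[key] = value
        return self.name


def write_many_wait_few(bricks, key, value, w=4, wq=2, timeout_s=0.06):
    """Send the write to w random bricks and return once wq have replied."""
    acks = queue.Queue()

    def send(brick):
        acks.put(brick.put(key, value))

    for brick in random.sample(bricks, w):
        threading.Thread(target=send, args=(brick,), daemon=True).start()

    deadline = time.monotonic() + timeout_s
    acked = []
    while len(acked) < wq:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("fewer than WQ bricks replied in time")
        try:
            acked.append(acks.get(timeout=remaining))
        except queue.Empty:
            raise TimeoutError("fewer than WQ bricks replied in time") from None
    return acked  # names of bricks known to hold the value
```

Because the caller returns after the first WQ replies, a slow brick among the W targets does not delay the request, which is the "avoid performance coupling" point made on the architecture slide.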


Algorithm Properties

• Client remembers metadata
  • Fate sharing
• Stubs are stateless
• Negative feedback loop
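
One way to read "client remembers metadata" together with "read one" is sketched below. The cookie layout, the function names, and the use of `None` to model a crashed brick are all assumptions made for illustration; the slides do not specify the metadata format. The point of the sketch is that the stub keeps no per-session state: everything it needs to locate the data arrives with the client's request.

```python
import random


def make_cookie(key, acked_bricks):
    """Metadata handed back to the client after a write (hypothetical layout)."""
    return {"key": key, "bricks": list(acked_bricks)}


def read_one(cookie, bricks_by_name):
    """Return the value from any one live brick named in the client's cookie.

    bricks_by_name maps a brick name to its table (a dict), or to None
    when that brick has crashed.
    """
    candidates = list(cookie["bricks"])
    random.shuffle(candidates)  # no preferred replica
    for name in candidates:
        table = bricks_by_name.get(name)
        if table is not None and cookie["key"] in table:
            return table[cookie["key"]]
    raise KeyError("no live brick holds " + repr(cookie["key"]))
```

If the client loses its cookie, the session is unreachable anyway, so the metadata and the session share fate.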


SSM: Recovery-Friendly

• Failure
  • No data is lost, WQ-1 copies of the data remain
  • State is available for R/W during failure
• Recovery
  • Start a new brick; don’t need to recover anything
  • No special case recovery code (restart = recovery)
  • State is available for R/W during brick restart
  • Repair phase does not reduce throughput/performance
• Session state is self-recovering
  • User’s access pattern will cause data to be rewritten


SSM: Self-Managing

• Adaptive (Self-Tuning):
  • Stub maintains count of maximum allowable in-flight requests to each brick
    • Additive increase on successful request
    • Multiplicative decrease on timeout
  • Stubs discover load capacity of each brick
• Admission control (Self-Protecting):
  • Stubs say “no” if insufficient bricks
  • Propagate backpressure from bricks to clients
  • Turn users away under overload
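
The per-brick window and the admission check described above could look roughly like the following. This is a sketch under assumptions: the class and function names, the increment of 1, the halving factor, and the bounds are illustrative choices, since the slide states only "additive increase" and "multiplicative decrease" without constants.

```python
class BrickWindow:
    """Stub-side limit on in-flight requests to one brick (AI/MD)."""

    def __init__(self, initial=8, minimum=1, maximum=1024, decrease=0.5):
        self.limit = float(initial)   # current maximum in-flight requests
        self.minimum = minimum
        self.maximum = maximum
        self.decrease = decrease      # assumed multiplicative factor
        self.in_flight = 0

    def has_room(self):
        return self.in_flight < int(self.limit)

    def try_acquire(self):
        """Reserve a slot for one request, or refuse if the window is full."""
        if not self.has_room():
            return False
        self.in_flight += 1
        return True

    def on_success(self):
        self.in_flight -= 1
        self.limit = min(self.maximum, self.limit + 1)  # additive increase

    def on_timeout(self):
        self.in_flight -= 1
        self.limit = max(self.minimum, self.limit * self.decrease)  # multiplicative decrease


def admit(windows, wq):
    """Admission control: accept a request only if at least wq bricks have room."""
    return sum(1 for w in windows.values() if w.has_room()) >= wq
```

When `admit` returns false the stub refuses the request, so brick overload surfaces as backpressure at the client instead of as a growing queue at the bricks.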


Self-Tuning and Self-Protecting

[Figure: throughput plots, # req/s versus time in s, 250 senders (windowing). Panels: normal load; overload without additive-increase/multiplicative-decrease adaptation; overload with AI/MD adaptation.]


Other implementation details

• Garbage collection
  • Generational hash table
    • Hash table of hash tables
    • Each hash table has an associated time range
    • When time has passed, GC that table
  • No reference counting, scanning, etc.
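
A generational hash table of this kind can be sketched as follows. The details are assumptions for illustration: entries are bucketed by expiry time, the caller keeps the expiry so the entry can be found again, and the class name, method names, and 60-second generation length are hypothetical. What the sketch shares with the slide is the structure (a table of tables, one per time range) and the collection rule (drop a whole table once its range has passed).

```python
import time


class GenerationalTable:
    """Hash table of hash tables, one inner table per time range."""

    def __init__(self, generation_s=60.0, clock=time.monotonic):
        self.generation_s = generation_s
        self.clock = clock
        self.generations = {}  # generation index -> {key: value}

    def _index(self, t):
        return int(t // self.generation_s)

    def put(self, key, value, ttl_s):
        """Store value in the generation covering its expiry time."""
        expiry = self.clock() + ttl_s
        self.generations.setdefault(self._index(expiry), {})[key] = value
        return expiry  # caller keeps this to locate the entry later

    def get(self, key, expiry):
        return self.generations.get(self._index(expiry), {}).get(key)

    def collect(self):
        """Drop every generation whose time range has fully passed."""
        current = self._index(self.clock())
        for index in [i for i in self.generations if i < current]:
            del self.generations[index]
```

Collection touches only the outer table's keys, so its cost depends on the number of generations and not on the number of stored sessions.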


Is it cheap? Is it fast? Is it easy to use?

• How much does replication cost?
  • With 10 bricks, 1G memory, state size 8k, replication factor of 3
  • Serve around 416,000 concurrent users
• Configurable request timeout, currently 60 ms
  • Dwarfed by computation time and client RT time
• Easy to add a brick, kill a brick
  • System continues running
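
The replication-cost figure above can be checked with a short calculation. The sketch assumes that "1G memory" means 1 GB per brick, that all of it holds session state, and that decimal units are intended (1 GB = 10^9 bytes, 8k = 8,000 bytes); the function name is hypothetical.

```python
def concurrent_users(bricks, bytes_per_brick, state_bytes, replication):
    """Number of sessions that fit when each session is stored `replication` times."""
    total_bytes = bricks * bytes_per_brick
    return total_bytes // (state_bytes * replication)


print(concurrent_users(10, 10**9, 8 * 10**3, 3))  # prints 416666
```

Under these assumptions the result is consistent with the slide's "around 416,000". With binary units (2^30 bytes per brick, 8,192-byte state) the same formula gives 436,906.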


Publications

• The Case for a Session State Storage Layer. Ben Ling, Armando Fox. 9th Workshop on Hot Topics in Operating Systems (HotOS IX), Lihue, HI, May 2003
• A Self-Managing Session State Layer. Ben Ling, Armando Fox. Accepted to the 5th Annual Workshop On Active Middleware Services (AMS 2003), Seattle, WA, June 2003

http://swig.stanford.edu/public/publications


Related Work

• Palimpsest (Timothy Roscoe, Intel)
  • Temporal storage
  • Erasure coding
  • No guarantees, just estimates
• DeStor (Andy Huang, Stanford)
  • Persistent, multi-user, non-transactional data
• FAB (HP Labs)
  • Enterprise disk storage
  • Redundancy at disk block level


Future Work

• Do fault analysis and model failure
  • Memory and network failure modes
  • Performance faults?
• How to choose replication factor?
  • 10 bricks, WQ of 3, inter-request rate of 5 minutes -> “5 nines” of availability if MTTF of bricks > 22 minutes
  • Adaptively change replication factor?


SSM: Relaxing ACID

• A: we guarantee
• C: guaranteed by workload (full rewrite of state)
• I: guaranteed by workload (single user, serial access)
• D: relaxed (ephemeral guarantee, RAM enough)

• Fast, simple, clean recovery
  • No data loss on failure
  • Data can be R/W during failure/recovery
• Self-Managing


Summary

We have built a system for:

• Semi-persistent storage for single-user, serial-access data
• Recovery-friendly:
  • Crash-only: crash-safe, fast recovery
  • No special case recovery code
  • Reboot any individual node
  • Continuous data availability
• Self-Managing:
  • Self-Tuning and Self-Protecting
  • Simple management and fault enforcement model

Benjamin Ling
bling@cs.stanford.edu

http://swig.stanford.edu/


SSM: Recovery-Friendly, Self-Managing Store

Questions or Comments?

Benjamin Ling
bling@cs.stanford.edu

http://swig.stanford.edu/