Topics ACID vs BASE Starfish Availability
-
Upload
databaseguys -
Category
Documents
-
view
167 -
download
0
Transcript of Topics ACID vs BASE Starfish Availability
Extensible Cluster Based Network Services
Armando FoxSteven GribbleYatin ChawatheEric BrewerPaul Gauthier
University of CaliforniaBerkeley
Inktomi Corporation
Presenter: Ashish Gupta
Advanced Operating Systems
Motivation Proliferation of network-based services Two critical issues must be addressed by
Internet services: System scalability
Incremental and linear scalability Availability and fault tolerance
24x7 operation
Clusters of workstations meet these requirements
Commodity PCs as unit of scaling
Good Cost/performance
Incremental Scalability
“Embarrassingly parallel” workloads
Map well onto workstations
Redundancy of clusters
Masks transient failures
Contribution of this workIsolate common requirements of cluster-
based Internet apps into a reusable substrate
the Scalable Network Services (SNS) framework
Goal: complete separation of *ility concerns from application logic
●Legacy code encapsulationLegacy code encapsulation
●Insulate programmers from nasty engineeringInsulate programmers from nasty engineering
Contribution of this work
Architecture for SNS, exploiting the strength of cluster computing
Separation of content of network services from implementation
Encapsulation of low level functions in a lower layer
Example of a new service
A Programming Model to go with the architecture
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
User ProfileUser ProfileDatabaseDatabase
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
Workers and Front-endsAll control decisions for satisfying user requests localized in the front-ends:
Which Servers to invoke, access profile database, notify the end-user etc.
Workers simple and stateless
Behaviour of service defined entirely at the front-end
Analogy of processes in a Unix pipeline: ls –l | grep .pl | wc
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
User ProfileUser ProfileDatabaseDatabase
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
Front-endsUser Interface to SNS
Queue requests for service
Can Maintain State for many simultaneous outstanding requests
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
User ProfileUser ProfileDatabaseDatabase
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
User ProfileAllows Mass customization of request processing
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
User ProfileUser ProfileDatabaseDatabase
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
WorkersCaches, Service specific Modules
Multiple Instantiation possible
Themselves just perform a specific task, not responsible for load balancing, fault tolerance
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
User ProfileUser ProfileDatabaseDatabase
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
Administrative InterfaceTracking and Visualization of system’s behaviour
Administrative actions
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
User ProfileUser ProfileDatabaseDatabase
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
ManagerCollects load information from the workers
Balances load across workers
Spawn additional workers on increased load, faults
The SNS architecture
C$
LB/FT
Interconnect
FE
$ $
WWWT
FE
FE
WWWA
GUI
Front EndsFront EndsCachesCaches
WorkersWorkers
Manager:Manager:Load Balancing & Fault ToleranceLoad Balancing & Fault Tolerance
AdministrationAdministrationInterfaceInterface
Workers and Front-endsAll control decisions for satisfying user requests localized in the front-ends:
Which Servers to invoke, access profile database, notify the end-user etc.
Workers simple and stateless
Behaviour of service defined entirely at the front-end
Analogy of processes in a Unix pipeline: ls –l | grep .pl | wc User ProfileUser ProfileDatabaseDatabase
Separating the content from implementation
Layered Software model
Previous Components
SNS
Scalable Network Service Support
TACC
Transformation, Aggregation, Caching, Customization
Service
Service Specific Code
SNS
Provides Scalability
Load Balancing
Fault tolerance
High Availability
The SNS Layer
Scalability Replicate well-encapsulated components Prolonged Bursts: Notion of Overflow Pool
Load Balancing Centralized: Simple to implement and
predicable
The SNS Layer
Soft State for fault-tolerance and availability Process peers watch each other Because of no hard state, “recovery” == “restart”
Load balancing, hot updates, migration are “easy” Shoot down a worker, and it will recover Upgrade == install new software, shoot down old Mostly graceful degradation
“Starfish” Availability: LB DeathFE detects via broken pipe/timeout, restarts LB
C$
Interconnect
FE
$ $
WWWT
FE
FE
LB/FT
WWWA
“Starfish” Availability: LB DeathFE detects via broken pipe/timeout, restarts LB
C$
Interconnect
FE
$ $
WWWT
FE
FE
LB/FT
WWWA
LB/FT
New LB announces itself (multicast), contacted by workers, New LB announces itself (multicast), contacted by workers, gradually rebuilds load tablesgradually rebuilds load tables
If partition heals, extra LB’s commit suicideIf partition heals, extra LB’s commit suicideFE’s operate using cached LB info during failureFE’s operate using cached LB info during failure
Question: How do we build the services in the higher layers?
Transformation
Operation on a single data object that changes its content
TC
Aggregation
Collecting data from several sources and collating it
Caching
Storing/re-computing easier than moving across internet
Can also store post-transformation (or post-aggregation) content
Customization
Per user: for content generation
Per device: data delivery, content “packaging”
The TACC Modela model for structuring services
The TACC Modela model for structuring services
Programming model based on composable building blocks
Many existing services fit well within the TACC model
A Meta-Search Engine In TACC
Uses existing services to create a new service 2.5 hours to write using TACC franework
MetasearchWeb UI
Internet
Datatype-Specific Distillation Lossy compression that preserves semantic content
Tailor content for each client Reduce end-to-end latency when link is slow Meaningful presentation for range of clients
1.2 The Remote Queue Model
We introduce Remote Queues (RQ), ….
65x65x
6.8x6.8x
TranSend SNS Components
Workers = Distillers here Simple restart mechanism for fault-tolerance Each distiller took 5-6 hrs to write SNS Fault tolerance removes worries about
occasional bugs/crashes
Measurements Request Generation:
High performance HTTP request playback engine
Burstiness Handled by the overflow pool
Load Balancing
Metric: Queue Length at distillers
Load reaches threshold:
Manager spawns a new distiller
Scalability
Strategy:
Begin with minimal instance
Increase offered load until saturation
Add more resources to eliminate saturation
Observations:
Nearly perfect linear growth
1 Distiller ~ 23 requests/sec
Front end ~ 70 requests/sec
Ultimate bottleneck:
Shared components of the system (Manager and the SAN)
SAN could be bottleneck for communication-intensive workloads (Example of 10Mb/s eth)
Topic for future research
Conclusion
A layered architecture for cluster-based scalable network services
Authors shielded from software complexity of automatic scaling, high availability, and failure management
New services as composition of stateless workers
A useful paradigm for deploying new Internet services
ACID vs BASE semanticsAn approximate answer delivered quickly is more useful than the exact answer slowly
Simpler, Faster (?)
Aggressive (optimistic)Conservative (pessimistic)
Easy EvolutionDifficult evolution
Approximate Answers OKGuarantees accurate answers
Best EffortFocus on “commit”
Availability FirstAvailability??? (not always)
Weak consistency
- stale data OK
Strong consistency
- data precise or NOT OK
BASEACID
ACID vs BASE semantics
Search Engine as a database1 Big table
Unknown but large growth
Must be truly highly available
An approximate answer delivered quickly is more useful than the exact answer slowly
A DBMS would be too slow
Choose availability over consistency
Graceful degradation: OK to temporarily lose small random subsets of data due to faults
Consistency
Atomicity
Isolation
Durability
Replace with
Availablity
Graceful degradation
Performance
BASE
Basically Available
Soft-State
Eventual ConsistencyDatabase research is about ACID
Why BASE ?
Idea: focus on looser semantics rather than ACID semantics ACID => data unavailable rather than available but inconsistent BASE => data available, but could be stale, inconsistent or approximate Real systems use BOTH semantics Claim: BASE can lead to simpler systems and better performance
Performance: caching and avoidance of communication and some locks (e.g. ACID requires strict locking and communication with replicas for every write and any reads without locks)
Simpler: soft-state leads to easy recovery and interchangable components
BASE fits clusters well due to partial failure
More BASE… Reduces complexity of service implementation ,
consistency for simplicity Fault Tolerance Availability
Opportunities for better performance optimizations in the SNS framework ACID : durable and consistent state across partial failures
This Is relaxed in the BASE model
Example of HotBot
answer The requirements are highly
parallel( many indepent simultaneous users)
The grain size typically corresponds to at most a few CPU seconds on a commodity PC
Answer: BASE semantics allow us to
handle partial failure in clusters with less complexity and cost.
Question 3 When the overflow machines are
being recruited unusually often, what should be done at this time?
Question 4 Does the Front-end crash not lost
any information? If does, what kind information will be lost?
Clustering and Internet Workloads
Internet vs. “traditional” workloads e.g. Database workloads (TPC benchmarks) e.g. traditional scientific codes (matrix multiply, simulated annealing and
related simulations, etc.)
Some characteristic differences Read mostly Quality of service (best-effort vs. guarantees) Task granularity
“Embarrasingly parallel”…why? HTTP is stateless with short-lived requests Web’s architecture has already forced app designers to work around this!
(not obvious in 1990)
Meeting the Cluster Challenges
Software & programming models Partial failure and application semantics System administration Two case studies to contrast programming models
GLUnix goal: support “all” traditional Unix apps, providing a single system image
SNS/TACC goal: simple programming model for Internet services (caching, transformation, etc.), with good robustness and easy administration