Linux High Availability Cluster Selection


Transcript of Linux High Availability Cluster Selection

  • 8/9/2019 Linux High Availability Cluster Selection


    Linux High Availability Cluster Selection

    Tim Burke

    [email protected]


    Which cluster product is right for me?

    There is no one-size-fits-all winner

    Rapidly evolving marketplace

    The good news: There is a lot to choose from

    The bad news: There is a lot to choose from

    Strategy - be an informed consumer


    Selection Process / Presentation Outline

    Identify target applications - usage model

    Identify required cluster feature set

    Open source vs proprietary, product vs project

    Cost factors

    Vendor evaluation

    OEM & ISV endorsements


    Identify Target Applications

    Clustering Categories

    High Availability Clusters

    Database, fileservers

    Off the shelf applications

    Load Balancing Clusters

    Dispatching web traffic

    High Performance Computing

    Large computational problems


    High Performance Computing

    HPC, HPTC cluster attributes

    1. Large # of systems working together to solve a common problem - scalability

    2. Performance, not reliability, is of utmost importance

    3. Requires custom parallelized applications

    4. Tends to be bleeding edge, early adopters

    5. Example deployments: genetics, pharmaceutical, weather, seismic analysis, modeling


    Load Balancing Clusters

    Front end dispatching node (or 2 for redundancy)

    Pool of inexpensive back end servers

    Redirect transactions so no 1 system is overloaded

    Balancing algorithms: round robin, weighted, load based

    Typically used for web server traffic (Apache front end)

    Useful for static content

    Not applicable for dynamic content
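The three balancing algorithms named on this slide can be sketched as follows. This is a minimal illustration, not how any particular product dispatches traffic; the server names, weights, and connection counts are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Round robin: cycle through the back end pool in a fixed order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Weighted: repeat each server in proportion to its integer weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_loaded(active_connections: Dict[str, int]) -> str:
    """Load based: pick the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical back end pool.
rr = round_robin(["web1", "web2", "web3"])
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"web1": 2, "web2": 1})
first_six = [next(wrr) for _ in range(6)]

pick = least_loaded({"web1": 12, "web2": 3, "web3": 7})

print(first_four)
print(first_six)
print(pick)
```

The weighted variant here simply repeats servers in a fixed pattern; real dispatchers often interleave weighted choices more smoothly.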


    High Availability Clusters

    The need for high availability (HA)

    Overview of high availability features


    Reliability, Availability, Serviceability (RAS)

    Users & businesses have high expectations

    1. Reliability - high degree of protection for corporate data. Information is a crucial business asset.

    2. Availability - near continuous data access

    3. Serviceability - procedures to correct problems with minimal business impact
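To make "near continuous data access" concrete, an availability percentage can be converted into allowed downtime per year. The sketch below is plain arithmetic; the percentages are common illustrative targets, not figures from the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100.0)


# Illustrative availability targets (assumptions, not from the slides).
for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% available -> {minutes:.1f} minutes of downtime per year")
```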


    Sources of Downtime

    The Standish Group - 2001

    [Pie chart. Categories: application bug or error; main-system hardware failure; database error; main-server system bug; network; operator error; other server's hardware failure; other server's system bug; environmental conditions; planned outage; other]


    Downtime Costs - The Standish Group

    [Bar chart. Y axis: cost per minute of downtime (dollars), 0 to 13,000. Categories: electronic resource planning, supply chain management, e-commerce, Internet banking, customer service center, messaging]


    No Single Point of Failure (NSPF)

    Hardware Redundancy - increased overall reliability and availability

    1. Multiple paths between systems

    2. Storage - mirrored, RAID5

    3. Multiple power sources

    4. Multiple external networks


    High Availability Clusters

    Redundancy for fault tolerance

    Failover - if 1 node shuts down or fails, another node takes over application load

    Facilitates planned maintenance


    Failover

    Involves selecting a target node & moving resources - failover policies

    Example resource types

    1. Physical disk ownership

    2. Filesystems

    3. Applications

    4. Databases

    5. IP addresses
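A failover policy of the kind described above (pick a target node, then move resources to it) might look like the following sketch. The `Node` class, the preferred-order policy, and the resource names are all hypothetical; real cluster managers track far more state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    alive: bool = True
    resources: List[str] = field(default_factory=list)


def select_target(nodes: Dict[str, Node], failed: Node,
                  preferred_order: List[str]) -> Optional[Node]:
    """Policy: return the first live node in the preferred order, if any."""
    for name in preferred_order:
        node = nodes.get(name)
        if node is not None and node.alive and node is not failed:
            return node
    return None


def fail_over(nodes: Dict[str, Node], failed_name: str,
              preferred_order: List[str]) -> Optional[Node]:
    """Mark a node failed and move its resources to the selected target."""
    failed = nodes[failed_name]
    failed.alive = False
    target = select_target(nodes, failed, preferred_order)
    if target is None:
        return None  # no survivor available; resources stay offline
    target.resources.extend(failed.resources)  # keep the original start order
    failed.resources = []
    return target


# Hypothetical three-node cluster; node1 owns one resource of each type.
cluster = {
    "node1": Node("node1", resources=["disk:/dev/sdb", "fs:/export",
                                      "ip:10.0.0.50", "app:nfs"]),
    "node2": Node("node2"),
    "node3": Node("node3"),
}
target = fail_over(cluster, "node1", preferred_order=["node2", "node3"])
print(target.name, target.resources)
```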


    Failover Configurations

    Active / Passive

    1 node runs application(s)

    Other node on standby for takeover

    Idle node can take over with no performance degradation

    Active / Active

    All nodes actively running application(s)

    Workload moves to survivor on failure

    Effectively utilizes capacity (TCO)


    Data Integrity Provisions

    Crucial for safe failover of data centric services (filesystem / database)

    In failure scenarios (e.g. hung node), ensure failed node cannot access storage - I/O Barriers, I/O Fencing

    Lack of I/O Fencing can result in

    Loss of data (backups?)

    System crashes

    Common mechanisms

    Power switches

    SCSI reservations

    Watchdog timers
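The key ordering rule behind I/O fencing is that the failed node must be cut off from storage before the survivor touches any shared data. The sketch below shows only that ordering; the `fence` callback stands in for a power switch, SCSI reservation, or watchdog mechanism, and every name in it is hypothetical.

```python
from typing import Callable, List


class FencingError(Exception):
    """Raised when the failed node could not be confirmed fenced."""


def take_over(failed_node: str,
              fence: Callable[[str], bool],
              start_steps: List[Callable[[], None]]) -> None:
    """Fence the failed node first; only then start services on the survivor."""
    if not fence(failed_node):
        # Without confirmed fencing, a hung node may still write to shared
        # storage, so refuse to mount or start anything.
        raise FencingError(f"could not fence {failed_node}; takeover aborted")
    for step in start_steps:
        step()


# Hypothetical stand-ins that just record what would happen, in order.
log: List[str] = []


def power_switch_off(node: str) -> bool:
    log.append(f"fence {node}")
    return True


take_over(
    "node1",
    power_switch_off,
    [lambda: log.append("mount /export"), lambda: log.append("start database")],
)
print(log)
```

If `fence` returns `False`, `take_over` raises before any start step runs, which is the safe outcome: an unavailable service is preferable to two nodes writing to the same filesystem.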


    Application Monitoring

    All HA clusters monitor node state

    Most monitor key cluster resources - network, disk

    Many monitor application health

    Process existence

    Application check scripts

    HTTP get on web server

    Record retrieval on database

    Filesystem directory listing
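Three of the checks listed above (process existence, HTTP get, directory listing) can be sketched with the Python standard library. This is an illustration only, assuming a POSIX system; the URL and path in the usage lines are placeholders, and a real check script would also need retry and timeout policy.

```python
import http.client
import os
import urllib.error
import urllib.request


def process_exists(pid: int) -> bool:
    """Process existence check (POSIX): signal 0 tests for a pid without killing it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def directory_lists(path: str) -> bool:
    """Filesystem check: the directory can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def http_get_ok(url: str, timeout: float = 5.0) -> bool:
    """Web server check: an HTTP GET completes with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


# Placeholder targets; substitute the service's real pid, mount point, and URL.
print(process_exists(os.getpid()))
print(directory_lists("."))
print(http_get_ok("http://127.0.0.1:1/", timeout=1.0))
```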


    Failover Times

    Don't get too hung up on this

    Remember that data integrity is paramount

    Quoted failover times only include cluster overhead; they don't include application recovery

    Application startup time

    Filesystem consistency checks

    Database recovery - transaction replay

    Example: product literature cites 5 second failover time

    Can be several minutes for database recovery (size & activity dependent)
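A quick sum shows why the quoted figure understates what users experience. Only the 5 second cluster overhead comes from the slide; the other durations below are assumed values chosen for illustration.

```python
# Seconds for each phase of an end-to-end failover.
cluster_overhead_s = 5        # the figure product literature quotes (from the slide)
application_startup_s = 20    # assumed
filesystem_check_s = 45       # assumed
database_replay_s = 180       # assumed; depends on database size and activity

total_s = (cluster_overhead_s + application_startup_s
           + filesystem_check_s + database_replay_s)

print(f"Quoted failover time: {cluster_overhead_s} s")
print(f"End-to-end recovery:  {total_s} s ({total_s / 60:.1f} min)")
```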


    Open Source vs Proprietary, Project vs Product

    Open source facilitates self-support & customization

    Support is a key determinant

    Products are generally well tested

    Some products are also open source

    If you care enough about high availability & solution stacks, you're likely to go the product route


    Heterogeneous HA Products

    Proprietary offerings that run on Linux, W2K, UNIX

    Unifies user training

    May compromise flexibility, adaptability or data integrity (ouch!)

    Some are Linux products with GUIs that run on other platforms

    Virtually none allow heterogeneous platforms within the same cluster


    Cost Factors

    Beware of hidden charges

    Product base fee

    Application specific charges (Oracle, DB2, NFS, etc.)

    Support

    Some only come with bundled service offerings

    Hardware requirements

    Proprietary UNIX offerings typically cost several times more


    Vendor Evaluation

    Company vision - do their cluster offerings complement or distract? Futures roadmap.

    Financial Stability

    Ability to impact the marketplace

    Responsiveness - ability to provide ongoing feature enhancements

    Proprietary vs open source

    Product integration - fit with distribution, kernel patches, compatibility & support implications

    New Linux technology vs large monolithic legacy ports

    How long it's been on the market


    Open Source Projects

    FailSafe - from SGI & SuSE

    Optional data integrity provisions (power switch)

    Supports 16 nodes

    Good set of application kits

    Red Hat Cluster Manager

    Also offered as a product

    Described later in presentation


    HA Cluster Product Comparisons

    The ground rules

    Trying to remain objective

    Highlight product strengths

    Listed in alphabetical order

    Based on web site content as of 10/2002


    HP - MC/Serviceguard

    Proprietary - Ported from HP-UX

    Only supported on HP hardware

    Dynamic online addition/removal of members

    Worldwide support services

    Quorum voting membership

    Up to 8 nodes using Fibre Channel storage, 2 nodes using SCSI

    Compaq Alpha line targeted at HPC clusters
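"Quorum voting membership" generally means a partition of the cluster may keep running only if it holds enough votes. The sketch below uses a simple strict-majority rule with one vote per node, which is an assumption for illustration; the slide does not describe the product's actual algorithm.

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority: more than half of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """A partition may continue only if it can see a majority of votes."""
    return reachable_votes >= votes_needed(total_votes)


# One vote per node (assumed). An 8-node cluster split 4/4 leaves neither
# half with quorum; a 5/3 split lets only the larger half continue.
print(votes_needed(8))
print(has_quorum(8, 4))
print(has_quorum(8, 5))
print(has_quorum(2, 1))
```

Under this rule a 2-node cluster cannot survive the loss of either node on votes alone, which is why such configurations typically rely on an additional tie-breaker.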


    Legato - Availability Manager

    Proprietary

    Heterogeneous (Linux, W2K, Solaris, HP-UX)

    Strong data centric services

    Well integrated with SAN environments

    Replication

    Storage management, volume management, backup

    Application monitoring

    Extensive set of application specific modules


    PolyServe - Application Manager

    Proprietary

    Application monitoring

    Up to 16 nodes

    Multiple platforms - Linux, W2K, Solaris

    Doesn't require shared storage

    Dynamic member addition/removal

    Centralized management


    PolyServe - Matrix Server

    Tailored for Oracle 9i Real Application Clusters

    Concurrent read + write access to data on shared storage

    SAN Cluster filesystem with lock manager + distributed cache

    Allows incremental growth by adding servers + storage

    Proprietary


    Red Hat - Cluster Manager

    Bundled with RHL Advanced Server 2.1

    Both open source & product

    Data integrity provisions

    Power switches (optional)

    Watchdog timer software

    Application monitoring

    Heterogeneous fileserving via NFS + Samba

    Web monitoring GUI

    Also integrated Piranha load balancing cluster


    SteelEye - LifeKeeper

    Proprietary - UNIX port

    Multi-platform - Linux, W2K

    Wide set of application kits (separately purchased)

    Established OEM relationships

    Data integrity provisions - via SCSI reservations, requiring kernel patches

    Application monitoring


    IBM

    Focusing on HPC

    Rackmounted Intel servers

    Custom solutions

    (older) xCAT software for management, parallel operations, and installation

    (newer) Cluster Systems Mgt (CSM) for Linux

    Remote monitoring, resets, BIOS console

    Parallel shell

    Requires IBM hardware for embedded service processor

    High Availability via partnering


    Veritas Cluster Server

    Recent Linux port

    16 nodes, wide range of supported apps

    Also runs on Windows, AIX, UNIX, Solaris

    Integrates with their storage offerings (volume management, backup, data replication)

    Proprietary


    Other Vendors

    Dell

    Strategic partnering for HA software

    Penguin Computing

    HPC offering via partnership with Scyld Beowulf


    Consolidated Solutions

    Egenera

    BladeFrame hardware, backplane eliminates cabling

    Management software, HA, provisioning

    Linux NetworX

    Turnkey solution, preintegrated hardware + management tools

    Custom hardware, dense racks


    Summary

    Know what category of cluster is right for you

    Be knowledgeable of required cluster features

    Weigh your cost criteria

    Choose a vendor you can trust to safeguard your corporate assets

    Be wary of marketing collateral