Scalable Group Communication In Heterogeneous Cluster

Filip Hanik, Apache Software Foundation, June 30th, 2006


Page 1: Scalable Group Communication In Heterogeneous Cluster

Scalable Group Communication In Heterogeneous Cluster
Filip Hanik, Apache Software Foundation
June 30th, 2006

Page 2: Scalable Group Communication In Heterogeneous Cluster

Who am I
• [email protected]
• Tomcat Committer / ASF member
• Responsible for session replication and clustering
• Been involved with ASF since 2001

Page 3: Scalable Group Communication In Heterogeneous Cluster

What we will cover
• Introduction to group communication
• Challenges in group/cluster communication
• Today’s Solutions
• Detailed Tribes overview
• Tribes – design/configuration/usage
• Problems and their solutions
• Q & A

Page 4: Scalable Group Communication In Heterogeneous Cluster

What is Group Communication
• 1-to-n communication between software/hardware nodes
• Designed to reduce packets compared to 1-to-1 (point-to-point) communication
• Also referred to as broadcasting and/or multicasting
  • broadcast != multicast
  • broadcast – all nodes receive
  • multicast – interested (subscribed) nodes receive
• Popular academic research topic! Lots of information available

Page 5: Scalable Group Communication In Heterogeneous Cluster

Challenges in Group Communication
• Multicast is most commonly used
• Group consistency and leadership
• Delivery guarantee
• Group delivery guarantee
• Ordering and total ordering
• Flow control
• Multiple networks

Page 6: Scalable Group Communication In Heterogeneous Cluster

Today’s Solutions
• Dozens, if not hundreds, of academic products
  • Not maintained, not supported, proprietary
• Many open source projects
  • Appia, Spread, Erlang, JGroups… the list goes on
• Most are multicast based, to solve the 1-to-n packet reduction problem

Page 7: Scalable Group Communication In Heterogeneous Cluster

What is the uniform group model?
• Nodes are identical
• All nodes process, send and receive messages in the same way
• All nodes have the same applications
• Total ordering is based on the complete group
• Note: this is not the official definition of uniformity in a group setting

Page 8: Scalable Group Communication In Heterogeneous Cluster

When isn’t uniformity enough?
• When processes on each node are dynamic: activate, passivate, short and long lived
  • Example: Tomcat webapps
  • Example: heterogeneous hardware environments
• Application management vs. application data replication
• Messages with different priorities
  • Example: a session attribute being replicated vs. a 25 MB WAR file being transferred
• Need different guarantee levels
• When most messages are 1-to-m, where m < n

Page 9: Scalable Group Communication In Heterogeneous Cluster

Challenges in heterogeneous clusters
• Same challenges as in homogeneous environments
• Node attributes change at runtime
• Nodes carry different responsibilities
• Total ordering of messages that are sent 1-to-m, where m < n

Page 10: Scalable Group Communication In Heterogeneous Cluster

What is Tribes?
• Tribes is a messaging framework with group communication capabilities
• 100% Java, Apache Licensed (2.0)
• Born out of the cluster/session replication code from Tomcat 5.0-5.5 in early 2006
• Currently alpha; will become the communication framework for Tomcat’s next cluster implementation
• Ideas from 2001

Page 11: Scalable Group Communication In Heterogeneous Cluster

Why Tribes?
• Many frameworks are not flexible enough
• Not enough features
• Messages were guaranteed, without delivery feedback
• Static configurations for message delivery
• Based on 1-to-m delivery, where m < n
• License, license, license…

Page 12: Scalable Group Communication In Heterogeneous Cluster

Why Tribes?
• Research gap: platforms are proprietary and often suggest protocols that are not standard
• Opportunities for httpd & Tomcat and other ASF software integration for more advanced and intelligent clusters
• Separation of communication layer
• Did I say Apache License?

Page 13: Scalable Group Communication In Heterogeneous Cluster

Why not Tribes
• TCP is connection based
• When you always want to send 1-to-n
• Unique scenario where a highly customized solution might be the best fit
• It’s not the one-size-fits-all solution, if such a thing exists

Page 14: Scalable Group Communication In Heterogeneous Cluster

Goals
• Simplify peer-to-peer and peer-to-group communication for distributed applications
• Flexible enough to support a wide range of applications under one runtime configuration
• Provide instant feedback on message delivery
• Concurrent message delivery, even between two nodes
• Parallel delivery to multiple nodes
• Clean, intuitive and easy to use, even for complex tasks
• All this with low overhead

Page 15: Scalable Group Communication In Heterogeneous Cluster

Feature Overview
• Pluggable Modules
• Guaranteed Messaging
• Different Guarantee Levels
• Per message delivery semantics(!)
• Pluggable Interceptors (runtime)
• Delivery feedback – even for async
• Concurrent and parallel delivery
• Fixed node hierarchy

Page 16: Scalable Group Communication In Heterogeneous Cluster

Feature: Pluggable Modules
• All major components can be swapped out; simple interfaces defined
• Needed when customization is required for lower level IO operations
• Example
  • Multicast not available
  • Proprietary network protocols
  • SSL
• Goal: default implementation to be enough for 80% of applications that require messaging

Page 17: Scalable Group Communication In Heterogeneous Cluster

Feature: Guaranteed Msg Delivery
• Assume 1-to-m delivery (m < n)
• Default implementation is TCP based
  • java.io & java.nio
• In most cases, TCP (Java) will outperform UDP once flow control and ack/nack for guaranteed delivery are implemented
• java.io support for platforms with poor NIO implementations
• java.nio preferred

Page 18: Scalable Group Communication In Heterogeneous Cluster

Feature: Guarantee Levels
• By default supports 3 levels
• NO_ACK – message was sent
  • Relies on TCP to deliver without node feedback
• ACK – message was received
  • Remote node replies with an ACK
• SYNC_ACK – message was processed
  • Remote node replies with ACK/FAIL_ACK when message has been processed
  • Allows for message process feedback
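
One way to read the three levels is as "how far must the message have progressed before the sender treats the send as complete". The sketch below models only that idea and is not Tribes code: the `Stage` and `GuaranteeLevel` enums and the `isSatisfied` method are hypothetical names made up for illustration.

```java
// Hypothetical model of the three guarantee levels; not taken from the Tribes source.
class GuaranteeDemo {
    public static void main(String[] args) {
        // Ask each level whether a message that the remote node has received (but not yet processed) counts as done.
        for (GuaranteeLevel level : GuaranteeLevel.values()) {
            System.out.println(level + " satisfied at RECEIVED: " + level.isSatisfied(Stage.RECEIVED));
        }
    }
}

/** How far a message has progressed, in increasing order. */
enum Stage { SENT, RECEIVED, PROCESSED }

enum GuaranteeLevel {
    NO_ACK(Stage.SENT),         // complete once handed to TCP, no node feedback
    ACK(Stage.RECEIVED),        // complete once the remote node acknowledges receipt
    SYNC_ACK(Stage.PROCESSED);  // complete once the remote node reports processing

    private final Stage required;

    GuaranteeLevel(Stage required) {
        this.required = required;
    }

    /** True when the message has progressed at least as far as this level requires. */
    boolean isSatisfied(Stage reached) {
        return reached.compareTo(required) >= 0;
    }
}
```

Modelling the levels as an ordered progression makes the relationship explicit: anything that satisfies SYNC_ACK also satisfies ACK and NO_ACK.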

Page 19: Scalable Group Communication In Heterogeneous Cluster

Feature: Per message delivery semantics
• Most unique feature, what makes Tribes really stand out
• Allows for each message to be delivered differently
  • Per message guarantee level
  • Sync vs. async
  • Not ordered, ordered, totally ordered
• 27 flags - 2ⁿ (n=27) combinations
  • Based on interceptors configured
• Each message with its own unique delivery guarantee
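
Per-message flags of this kind are commonly carried as bits in a single integer that travels with the message. The sketch below shows that general technique and the arithmetic behind the 2ⁿ figure. The flag names and bit positions are assumptions chosen for illustration, not the constants defined by Tribes.

```java
// Illustrative bit-flag handling; flag names and values are hypothetical.
class FlagDemo {
    static final int FLAG_ACK     = 1 << 0;
    static final int FLAG_ASYNC   = 1 << 1;
    static final int FLAG_ORDERED = 1 << 2;

    /** True when every bit of flag is set in options. */
    static boolean hasFlag(int options, int flag) {
        return (options & flag) == flag;
    }

    /** Number of distinct on/off combinations of n independent flags (2 to the power n). */
    static long combinations(int n) {
        return 1L << n;
    }

    public static void main(String[] args) {
        // Each message carries its own combination of flags.
        int options = FLAG_ACK | FLAG_ORDERED;
        System.out.println("ordered requested: " + hasFlag(options, FLAG_ORDERED));
        System.out.println("async requested:   " + hasFlag(options, FLAG_ASYNC));
        System.out.println("combinations for 27 flags: " + combinations(27));
    }
}
```

With 27 independent flags, `combinations(27)` evaluates to 134,217,728, which is the 2ⁿ count quoted on the slide.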

Page 20: Scalable Group Communication In Heterogeneous Cluster

Feature: Pluggable Interceptors
• React on message attributes (flags)
• If not modifying message bytes, can be inserted at run time
• Intercept any events through defined methods
• ChannelInterceptorBase available to minimize redundant code for non intercepted methods

Page 21: Scalable Group Communication In Heterogeneous Cluster

Feature: Delivery Feedback
• Tribes aims to deliver feedback for each message and each delivery semantic
  • NO_ACK, ACK, SYNC_ACK
• Synchronous and asynchronous delivery
  • Asynchronous gets feedback through callback
• Example: recoverable transactions can now be implemented since we always know if the remote node received the message

Page 22: Scalable Group Communication In Heterogeneous Cluster

Feature: Concurrent & Parallel Delivery
• Concurrent
  • More than one message sent or received at any point in time
  • No “message blocking”, i.e. a 10 MB message with SYNC_ACK will not stop a 10 KB NO_ACK message
• Parallel
  • Able to send a message to multiple destinations in parallel using one thread (NIO)
• Prioritized
  • Future feature

Page 23: Scalable Group Communication In Heterogeneous Cluster

Feature: Fixed Node Hierarchy
• Absolute Order Algorithm
  • Always be able to determine leadership
  • No message exchanges (chat free)
  • Non coordinated
• Also provides “Coordination” algorithm
  • Chatty, but efficient
  • Auto merge groups
  • Enhance node discovery where multicast might glitch
  • Can connect different subnets when used together with the StaticMembershipInterceptor
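
The "chat free" property can be illustrated with a deterministic ordering: if every node sorts the same membership view with the same comparator, all nodes pick the same leader without exchanging any messages. The sketch below is an assumption about how such an ordering could look, not the Tribes algorithm itself. The `Node` type and the host-then-port comparison are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative leader selection by a fixed total order; not the Tribes implementation.
class LeaderDemo {
    /** Hypothetical member identity consisting of host and port only. */
    static final class Node {
        final String host;
        final int port;

        Node(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Plain string comparison of the host, then the port. Any total order works,
    // as long as every node applies exactly the same one.
    static final Comparator<Node> ABSOLUTE_ORDER =
            Comparator.comparing((Node n) -> n.host).thenComparingInt(n -> n.port);

    /** Returns the first node in the fixed order; assumes a non-empty view. */
    static Node leader(List<Node> view) {
        List<Node> sorted = new ArrayList<>(view); // leave the caller's list untouched
        sorted.sort(ABSOLUTE_ORDER);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<Node> view = new ArrayList<>();
        view.add(new Node("10.0.0.2", 4000));
        view.add(new Node("10.0.0.1", 4001));
        view.add(new Node("10.0.0.1", 4000));
        Node l = leader(view);
        System.out.println("leader: " + l.host + ":" + l.port);
    }
}
```

The result depends only on the set of members, not on the order in which a node learned about them, which is why no coordination round is needed as long as all nodes see the same view.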

Page 24: Scalable Group Communication In Heterogeneous Cluster

Feature: Absolute Failure Detection
• Simple interceptor: TcpFailureDetector
• Instant feedback on member down
  • No need to wait for timeout
  • No risk of node pings getting stuck on a busy network
• Verifies timeouts against “false positives”
• 3 levels
  • Connect
  • Send
  • Read

Page 25: Scalable Group Communication In Heterogeneous Cluster

Feature: RPC messaging
• Ability to collect responses to a message
• NO_REPLY, FIRST_REPLY, MAJORITY_REPLY & ALL_REPLY
• Absence reply(!) – rather than timeout
• Callback left over delivery
• Support for multiple RPC channels on top of one Tribes channel
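
The four reply modes differ in how many responses the caller waits for. The sketch below expresses that as a small calculation. It is an illustration under stated assumptions rather than the RpcChannel implementation: the `ReplyMode` enum, `required` and `isComplete` are hypothetical, and "majority" is assumed to mean strictly more than half of the destinations.

```java
// Illustrative reply counting for the four modes; names and rules are assumptions.
class RpcDemo {
    /** True when enough replies have arrived for the given mode. */
    static boolean isComplete(ReplyMode mode, int destinations, int replies) {
        return replies >= mode.required(destinations);
    }

    public static void main(String[] args) {
        int destinations = 5;
        for (ReplyMode mode : ReplyMode.values()) {
            System.out.println(mode + " waits for " + mode.required(destinations)
                    + " of " + destinations + " replies");
        }
    }
}

enum ReplyMode {
    NO_REPLY, FIRST_REPLY, MAJORITY_REPLY, ALL_REPLY;

    /** Number of replies to wait for when the message went to the given number of destinations. */
    int required(int destinations) {
        switch (this) {
            case NO_REPLY:
                return 0;
            case FIRST_REPLY:
                return Math.min(1, destinations);
            case MAJORITY_REPLY:
                // Assumption: strictly more than half; nothing to wait for with no destinations.
                return destinations == 0 ? 0 : destinations / 2 + 1;
            default: // ALL_REPLY
                return destinations;
        }
    }
}
```

A real collector would also need to handle the "absence reply" case from the slide, where a member that leaves the group is counted instead of waiting for a timeout. That part is not modelled here.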

Page 26: Scalable Group Communication In Heterogeneous Cluster

Feature – JNDI Channel
• Ability to bind a channel into a JNDI tree
• Share the channel between objects
• Ideal for J2EE messaging
• Coming soon:
  • Ability to download client stub
  • Out of process invocation
• Not yet implemented…

Page 27: Scalable Group Communication In Heterogeneous Cluster

Architecture - Overview
[Diagram; labels: Application (×4), RpcChannel (×2), Tipi (×2), Channel, Interceptor (×2), Coordinator, Membership, Sender (TX), Receiver (RX)]

Page 28: Scalable Group Communication In Heterogeneous Cluster

Architecture - Channel
• 1 instance per Tribes runtime setup
• Is the first interceptor
• Holds a list of one or more ChannelListeners & MembershipListeners
• Serializes and deserializes messages
• Supports ByteMessage for transfer of pure byte[] data
• RpcChannel instanceof ChannelListener

Page 29: Scalable Group Communication In Heterogeneous Cluster

Architecture - Interceptors
• Linked list invocation
• Strongly typed – one method per event
• No events need to travel through the stack to coordinate interceptors
• Examples
  • Failure detection
  • Static membership
  • Total order or per member order
  • Throughput measurements and statistics
  • Leadership election
  • Message data encryption
  • Message dispatch – asynchronous messaging
  • All or none delivery guarantee

Page 30: Scalable Group Communication In Heterogeneous Cluster

Architecture - Interceptors
• Trigger on ChannelData.getOptions()
• Pass through a ChannelData object
• Using XByteBuffer – optimized byte[] handling
• Membership & Message interceptions
• Threadless
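
The combination of "linked list invocation" and "trigger on options" can be sketched as a chain in which each link inspects the message's option bits, acts only if its own flag is set, and then hands the message to the next link. The classes below are a self-contained illustration and are not the Tribes `ChannelInterceptor` API. The `Interceptor` class, the flag values and the logging are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative interceptor chain; class names and flags are hypothetical.
class ChainDemo {
    static final int OPT_ENCRYPT = 1 << 0;
    static final int OPT_ORDER   = 1 << 1;

    /** Sends a message with the given options down the chain encrypt -> order and returns who acted. */
    static List<String> send(int options) {
        List<String> log = new ArrayList<>();
        Interceptor order = new Interceptor("order", OPT_ORDER, null, log);
        Interceptor encrypt = new Interceptor("encrypt", OPT_ENCRYPT, order, log);
        encrypt.sendMessage(options);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(send(OPT_ORDER));
        System.out.println(send(OPT_ENCRYPT | OPT_ORDER));
    }
}

/** One link in the chain: acts only when its flag is set, then always delegates to the next link. */
class Interceptor {
    private final String name;
    private final int flag;
    private final Interceptor next; // null marks the end of the chain
    private final List<String> log;

    Interceptor(String name, int flag, Interceptor next, List<String> log) {
        this.name = name;
        this.flag = flag;
        this.next = next;
        this.log = log;
    }

    void sendMessage(int options) {
        if ((options & flag) == flag) {
            log.add(name); // stand-in for the interceptor's real work
        }
        if (next != null) {
            next.sendMessage(options);
        }
    }
}
```

Because each link runs on the caller's thread and simply calls the next one, the chain itself needs no threads of its own, which matches the "threadless" bullet.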

Page 31: Scalable Group Communication In Heterogeneous Cluster

Architecture - Coordinator
• Last interceptor
• Coordinates IO components
  • Sender
  • Receiver
  • Membership
• Receiver uses thread pool
• Sender piggybacks on application thread

Page 32: Scalable Group Communication In Heterogeneous Cluster

Code Structure
• org.apache.catalina.tribes
  • Application and Component interfaces
  • group – default implementation
  • transport – RX/TX components
  • membership – membership service
  • group.interceptors – supplied interceptors
  • io – protocol utilities and optimizations
  • tipis – utilities on top of Tribes core

Page 33: Scalable Group Communication In Heterogeneous Cluster

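Quick Start

    // Types come from org.apache.catalina.tribes (GroupChannel from the group subpackage)
    Channel myChannel = new GroupChannel();

    // Application-supplied listeners for messages and membership changes
    ChannelListener msgListener = new MyMessageListener();
    MembershipListener mbrListener = new MyMemberListener();

    myChannel.addMembershipListener(mbrListener);
    myChannel.addChannelListener(msgListener);

    myChannel.start(Channel.DEFAULT); // start the channel

    Serializable myMsg = new MyMessage();

    Member[] group = myChannel.getMembers();

    // Send to every current member with the default send options
    myChannel.send(group, myMsg, Channel.SEND_OPTIONS_DEFAULT);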

Page 34: Scalable Group Communication In Heterogeneous Cluster

Data Replication
• ReplicatedMap – one to all replication
• LazyReplicatedMap – primary/backup replication
• Cookie based replication map
  • Ideal for HTTP session replication
  • Backup location stored in cookies
• Versioned delta replication
• Example: org.apache.catalina.ha

Page 35: Scalable Group Communication In Heterogeneous Cluster

Tribes Demos
• Demo
• Code Example
• Discussion around common problems and how Tribes could solve them

Page 36: Scalable Group Communication In Heterogeneous Cluster

Future Work
• Security - SSL support and node authentication
• Many processes – one channel
• Language independent
• WAN membership discovery
• TCP based multicaster for large clusters
  • 2*n packet reduction for the sender, not total
• Intelligent membership broadcasting
• httpd as a load balancer

Page 37: Scalable Group Communication In Heterogeneous Cluster

Q & A
• [email protected]
• http://people.apache.org/~fhanik/tribes
• Tomcat SVN repository
• Interested in using it?
• Interested in helping?
