Yoshihiro Oyama, Kenjiro Taura, Toshio Endo, Akinori Yonezawa
Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken...
-
Upload
scarlett-howard -
Category
Documents
-
view
259 -
download
0
Transcript of Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken...
![Page 1: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/1.jpg)
gluepy:A Simple Distributed Python
Programming Framework for Complex Grid Environments
8/1/08Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro TauraThe University of Tokyo
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 2: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/2.jpg)
Barriers of Grid Environments
• Grid = Multiple Clusters (LAN/WAN)• Complex environment– Dynamic node joins– Resource removal/failure• Network and nodes
– Connectivity• NAT/firewall
leave
join
Fire Wall
Grid enabled frameworks are crucial to facilitate computing in these environments
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 3: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/3.jpg)
What type of applications?
• Typical Usage– Standalone jobs– No interaction among nodes
• Parallel and distributed Applications– Orchestrate nodes for a single application
• Map an existing application on the Grid– Requires complex interaction⇒ frameworks must make it
simple and manageable
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 4: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/4.jpg)
Common Approaches(1)
• Programming-less– Batch Scheduler• Task placement (inter-cluster)• Transparent retries on failure
– Enables minimal interaction• Pass data via files/raw sockets• Embarrassingly parallel tasks• Very limited for application
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
execute
redo
SUBMIT
![Page 5: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/5.jpg)
Common Approaches(2)• Incorporate some user programming– e.g. : Master-Worker framework
• Program the master/worker(s)– Job distribution– Handling worker join/leave– Error handling
• Enables simple interaction– Still limited in application
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
error()
join()
doJob()
For more complex interaction (larger problem set)must allow more flexible/general programming
![Page 6: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/6.jpg)
The most flexible approach
• Parallel Programming Languages– Extend existing languages: retains flexibility– Countless past examples
• (MultiLisp[Halstead ‘85], JavaRMI, ProActive[Huet et al. ‘04], …)
– Problem : not in context of the Grid• Node joins/leaves?• Resolve connectivity with NAT/firewall?
– Coding becomes complex/overwhelming
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Can we not complement this?
![Page 7: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/7.jpg)
Our Contribution
• Grid-enabled distributed object-oriented framework– a focus on coping with complex environment• Joins, failures, connectivity
– simple Programming & minimal Configuration• Simple tool to act as a glue for the Grid
– Implemented parallel applications on Grid environment with 900 cores (9 clusters)
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 8: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/8.jpg)
Agenda
• Introduction• Related Work• Proposal• Evaluation• Conclusion
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 9: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/9.jpg)
Programming-less frameworks• Condor/DAGMan [Thain et al. ‘05]– Batch scheduler– Transparent retires/ handle multiple clusters– Extremely limited interaction among nodes
• Tasks with DAG dependencies• Pass on data using intermediate/scratch files
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Central Manager
Busy Nodes
Assign
Cluster
Interaction using files
Task
![Page 10: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/10.jpg)
“Restricted” Programming frameworks• Master-Worker Model: Jojo2 [Aoki et al. ‘06], OmniRPC [Sato et al. ‘01],
Ninf-C [Nakata et al. ‘04], NetSolve [Casanova et al. ‘96]– Event driven master code: handle join/leave
• Map-Reduce [Dean et al. ‘05]– define 2 functions: map(), reduce()– Partial retires when nodes fail
• Ibis – Satin [Wrzesinska et al. ‘06]– Distributed divide-and-conquer– Random work stealing: accommodate join/leave
• Effective for specialized problem sets– Specialize on a problem/model, made mapping/programming easy– For “unexpected models”, users have to resort to out-of-band/Ad-hoc means
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Map()
Map()
Map()
Reduce()
Reduce()Input Data
Join
Failure Handler
Join Handler
divide
fib(n)
fib(n-1)
![Page 11: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/11.jpg)
Distributed Object Oriented frameworks
• ABCL [Yonezawa ‘90]JavaRMI, Manta [Maassen et al. ‘99]ProActive [Huet et al. ‘04]
• Distributed Object oriented– Disperse objects among resources
• Load delegation/distribution– Method invocations– RMI (Remote Method Invocation)– Async. RMIs for parallelism
• RMI: – good abstraction
• Extension of general language:– Allow flexible coding
foo.doJob(args)
RMI
compute
foo
Async. RMI
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 12: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/12.jpg)
Hurdles for DOO on the Grid• Race conditions
– Simultaneous RMIs on 1 object– Active Objects
• 1 object = 1 thread• Deadlocks:
e.g.: recursive calls
• Handling asynchronous events– e.g., handling node joins– Why not event driven?
• The flow of the program is segmented, and hard to flow
• Handling joins/failures– Difficult to handle them
transparently in a reasonable manner
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
b.f()
a.g()a
bdeadlock
event
if join: addif done: give more…
failure
Checkpoint?Automatic retry?…
![Page 13: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/13.jpg)
Hurdles for Implementation• Connecivity with NAT/firewall
– Solution: Build an overlay
• Existing implementations– ProActive [Huet et al. ‘04]
• Tree topology overlay• User must hand write connectable
points
– Jojo2 [Aoki et al. ‘06]• 2-level Hierarchical topology
– SSH / UDP broadcast• assumes network topology/setting
– out of user control
• Requirements– Minimal user burden
NAT
Firewall
Configure each link
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
ConnectionConfiguration File
![Page 14: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/14.jpg)
Summarization of the Problems
• Distributed Object-Oriented on the Grid– Thread race conditions– Event handling– Node join/leave– underlying Connectivity
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 15: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/15.jpg)
Proposal : gluepy• Grid enabled distributed object oriented framework
– As a Python Library– glue together Grid resources via simple and flexible coding
• Resolve the issues in an object-oriented paradigm– SerialObjects
• define “ownership” for objects• blocking operations unblock on events
– Constructs for handling Node join/leave• Resolve the “first reference” problem• Failures are abstracted as exceptions
– Connectivity (NAT/firewall)• Peers automatically construct an overlay
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 16: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/16.jpg)
The Basic Programming Model• RemoteObjects
– Created/mapped to a process– Accessible from other processes
(RMI)– Passive Objects
• Threads are not bound to objects
• Thread– Simply to gain parallelism– RMIs / async. invocations (RMIs)
implicitly spawn a thread
• Future– Returned for async. invocation– placeholder for result– Uncaught exception is stored
and re-raised at collection
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
a
f()
Proc: A Proc: B
a.f() Spawn for
RMI
a
f()
Proc
Spawn for async
store in F
F = a.f() async
![Page 17: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/17.jpg)
Programming in gluepy• Basics: RemoteObject
– Inherit Base class– Externally referenceable
• Async. invocation with futures– No explicit threads– Easier to maintain
sequential flow
• mutual exclusion? events? ⇒ SerialObjects
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class Peer(RemoteObject): def run(self, arg): # work here… return result
futures = [] for p in peers: f = p.run.future(arg) futures.append(f) waitall(futures)
for f in futures: print f.get()
async. RMI run() on all
wait for all results
read for all results
inherit Remote Object
![Page 18: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/18.jpg)
“ownership” with SerialObjects• SerialObjects
– Objects with mutual exclusion– RemoteObject sub-class
• No explicit locks• Ownership for each object
– call ⇒ acquire– return ⇒ release– Method execution by only 1 thread
• The “owner thread”
• Owner releases ownership onblocking operations– e.g: waitall(), RMI to other SerialObject– Pending threads contest for ownership– Arbitrary thread is scheduled– Eliminate deadlocks for recursive calls
Th
Th Th Th
object
newownerthread
Give-upOwner
ship
block
Th Th Th Th
object
unblock
re-contestfor ownership
ThTh Th Th
object ownerthread
waiting threads
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 19: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/19.jpg)
Signals to SerialObjects• We don’t want event-driven loops!
• Events → “signals”– Blocking op. unblock on signal
• Signals to objects– Unblock a thread blocking
in object’s context• If none, unblock a next blocking thread
– Unblocked thread can handlethe signal(event)
Th
object
unblock
SIGNAL
Th
object handle
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 20: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/20.jpg)
SerialObjects in gluepy• e.g. : A Queue
– pop() • blocks on empty Queue
– add()• call signal() to unblock
waiter
• Atomic Section:– Between blocking ops
in a method– Can update obj. attr.s
and do invocation on Non-Serial Objects
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class DistQueue(SerialObject): def __init__(self): self.queue = []
def add(self, x): self.queue.append(x) if len(self.queue) == 1: self.signal()
def pop(self): while len(self.queue) == 0:
wait([])
x = self.queue.pop(0) return x
Atomic Section
Signal & wake
Block until signal
![Page 21: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/21.jpg)
Managing dynamic resources• Node Join:
– Python process starts• Node leave:
– Process termination
• Constructs for node joins/leaves– Node Join
⇒ “first reference” problemObject lookup
• obtain ref. to existing objects in computation
– Node Leave ⇒ RMI exception
• Catch to handle failure
Exception!
Objects in computation
joining node
lookup
Object on failed node
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 22: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/22.jpg)
e.g.:Master-worker in gluepy (1/3)
• Handles join/leave
• code for join :– join will invoke signal– signal will unblock main
master thread
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class Master(SerialObject): ...
def nodeJoin(self, node): self.nodes.append(node) self.signal()
def run (self): assigned = {} while True: while len(self.nodes)>0 and
len(self.jobs)>0: ASYNC. RMIS TO IDLE WORKERS
readys = wait(futures) if readys == None: continue
for f in readys: HANDLE RESULTS
Signal for join
Block &Handle join
![Page 23: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/23.jpg)
e.g. : Master-worker in gluepy (2/3)
• Failure handling– Exception on collection– Handle exception to
resubmit task
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
for f in readys: node, job = assigned.pop(f) try: print ”done:”, f.get() self.nodes.append(node) except RemoteException, e: self.jobs.append(job)
Failurehandling
![Page 24: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/24.jpg)
e.g.: Master-worker in gluepy (3/3)
• Deployment– Master exports object– Workers get reference
and do RMI to join
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
worker = Worker()
master = RemoteRef(“master”)
master.nodeJoin(worker)
while True: sleep(1)
master = Master()
master.register(“master”)
master.run()
lookup on join
Worker init
Master init
![Page 25: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/25.jpg)
Automatic Overlay Construction(1)• Solution for Connectivity
– Automatically construct an overlay
• TCP overlay– On boot, acquire other
peer info.– Each node connects to a
small number of peers– Establish a connected
connection graph
NAT
Firewall
Global IP
Attempt connection
established connections
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 26: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/26.jpg)
Automatic Overlay Construction(2)
• Firewalled clusters– Automatic
port-forwarding– User configure SSH info
• Transparent routing– P-P communication is
routed– (AODV [Perkins ‘97])
SSH
Firewalltraversal
P-to-Pcommunication
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
#config fileuse src_pat dst_pat, prot=ssh, user=kenny
![Page 27: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/27.jpg)
RMI failure detection on Overlay• Problem with overlay
– A route consists of a number of connections
• RMI failure ⇒ failure of any intermediateconnection
• Path Pointers– Recorded on each forwarding
node– RMI reply returns the path it
came• Failure of intermediate
connection– The preceding forwarding node
back-propagates the failure
Path pointer
RMI handler
failure
Backpropagate
RMI invoker
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 28: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/28.jpg)
Agenda
• Introduction• Related Work• Proposal• Evaluation• Conclusion
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 29: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/29.jpg)
Experimental Environment
hongo:98
chiba:186
okubo:28
suzuk:72
imade:60kototoi:88
kyoto:70
istbs:316
tsubame:64
FirewallPrivate IPs
All packets droppedhiro:88
mirai:48
Global IPs
Max. scale : 9 clusters, over 900 cores
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
InTrigger Grid
Platform in Japan
InTrigger
requires SSH forwarding
![Page 30: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/30.jpg)
Necessary Configuration
• Configuration necessary for Overlay– 2 clusters( tsubame, istbs) require SSH-
portforwarding to other clusters ⇒ 2 lines of configuration
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
# istbs cluster uses SSH for inter-cluster conn.use 133\.11\.23\. (?!133\.11\.23\.), prot=ssh, user=kenny
# tsubame cluster gateway uses SSH for inter-cluster conn.use 131.112.3.1 (?!172\.17\.), prot=ssh, user=kenny
add connection instruction by regular expression
![Page 31: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/31.jpg)
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections
per peer– 1000 trials per each cluster/attempted connection configuration
28 Global/ 238 Private Peers Case: 95 %
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 32: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/32.jpg)
Dynamic Master-Worker• Master object distributes work to Worker objects
– 10,000 tasks as RMI
• Workers repeat join/leave– Tasks for failed nodes are redistributed– No tasks were lost during the experiment
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 33: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/33.jpg)
A Real-life Application
• A combination optimization problem– Permutation Flow Shop Problem– parallel branch-and-bound• Master-Worker like• Requires periodic exchange of bounds
– Code• 250 lines of Python code as glue code• Worker node starts up sequential C++ code
– Communicate with local Python through pipes
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 34: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/34.jpg)
Master-Worker interaction
• Master does RMI to worker– Worker: periodical RMI to master– Not your typical master-worker– requires a flexible framework like ours
Master
Worker
doJob() exchange_bound()
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 35: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/35.jpg)
Performance
• Work Rate– ci : total comp. time per core– N: num. of cores– T: completion time
• Slight drop with 950 cores– due to master node
becoming overloaded
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 36: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/36.jpg)
Troubleshoot Search Engine• Ever stuck debugging, or troubleshooting?
• Re-rank query results obtained from google– Use results from machine learning web-forums– Perform natural language processing on page contents
at query time• Use a Grid backend
– Computationally intensive– Require good response time
• in 10s of seconds
backend
Search Engine
Query:“vmware kernel panic”
Compute!!
Compute!!
Compute!!
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 37: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/37.jpg)
Troubleshoot Search Engine Overview
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
PythonCGI
async.doQuery()
doSearch()
async.doWork()
parsing
Graph extractio
n
rescoring
Leveraged sync/async RMIs to seamlessly integrate parallelism into a sequential program.
Merged CGIs with Grid backend
![Page 38: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/38.jpg)
Agenda
• Introduction• Related Work• Proposal• Evaluation• Conclusion
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 39: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/39.jpg)
Conclusion• gluepy: Grid enabled distributed object oriented framework
– Supports simple and flexible coding for complex Grid• SerialObjects• Signal semantics• Object lookup / exception on RMI failure• Automatic overlay construction
– as a tool to glue together Grid resources simply and flexibly
• Implemented and evaluated applications on the Grid– Max. scale: 900 core (9 cluster)
• NAT/Firewall, with runtime joins/leaves– Parallelized real-life applications
• Take full advantage of gluepy constructs for seamless programming
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 40: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/40.jpg)
Questions?
• gluepy is available from its homepagewww.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 41: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/41.jpg)
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 42: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/42.jpg)
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections
per peer– 1000 trials per each cluster/attempted connection configuration
28 Global/ 238 Private Peers Case: 95 %
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 43: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/43.jpg)
Troubleshoot Search Engine Overview
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
RMIfromCGI
extraction
parsing
Graph extractio
n
rescoring
asynchronously return to CGI
Leverage async. RMI from CGI script to work distribution on the Grid
All coding done in Python seamlessly, using gluepy
async.RMI forquery
async.RMI to
workers
![Page 44: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/44.jpg)
Utilization of the Grid• Grid
= Multiple Clusters (LAN/WAN)
• Typical Usage– Many stand-alone jobs in parallel– Little or No interaction among nodes
• Parallel and distributed Computing– Utilize nodes for a single application
• Parallelize an existing application– Requires complex interaction
⇒ Utilize Grid enabled frameworkswww.logos.ic.i.u-tokyo.ac.jp/~kenny/
gluepy2008/8/1
![Page 45: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/45.jpg)
The demands on the Grid
• A framework that realizes flexible/complex interaction on the Grid
• Can we learn anything from parallel languages?
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
leave
join
Fire Wall
apply
App.
![Page 46: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/46.jpg)
Speedup
900 cores でスケールしなくなる
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 47: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/47.jpg)
累積実行時間累積計算時間の拡大が再実行による
無駄が出ていることが分かる
累積計算時間を考慮すると、スピードアップは 169 cores ⇒ 948 cores (5.64 倍 ) で 4.942008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/
gluepy
![Page 48: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/48.jpg)
オーバレイに関するMicro-Benchmarks
• 1 ノードから RMI を発行–ほぼすべてのノードに対して 3hop 以内で到
達• Latency– Overlay 上で no-op の RMI : ping()
• Bandwidth– Overlay 上で大きい引数の RMI : send_data()
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
![Page 49: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/49.jpg)
オーバレイ上の遅延
• 1 ノードから 5 クラスタのノード上のobject へ RMI : ping()– ping で測定した RTT と比較
• overlay, 1 hop = ~1.5 [ms]
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
![Page 50: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/50.jpg)
オーバレイ上のバンド幅
• 引数 (de)serialization overhead 大– Full 1Gbit Ethernet で理想最大値: 40[MB/sec]– iperf 測定値から算出される最大値と比較– overlay で hop をするたびにバンド幅が減少• store-and-forward
Overlay hop 数でクラスタ内でもバンド幅
が変化
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
![Page 51: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/51.jpg)
Master-worker in gluepy
• Full Master-Worker– 並列 RMI
• New RMI for idle workers
– ノードの参加・脱退• Non-event driven
2008/8/1www.logos.ic.i.u-tokyo.ac.jp/~kenny/
gluepy
class Master(SerialObject): def __init__(self, jobs): self.nodes = [] self.jobs = jobs
def nodeJoin(self , node): self.nodes.append(node) self.signal()
def run (self): assigned = {} while True: while len(self.nodes)>0 and len(self.jobs)>0: node = self.nodes.pop() job = self.jobs.pop() f = node.doJob.future(job) assigned[f] = (node, job)
readys = wait(assigned.keys()) if readys == None: continue
for f in readys: node, job = assigned.pop(f) try: print ”done:”, f.get() self.nodes.append(node) except RemoteException, e: self.jobs.append(job)
Failurehandling
Signal for join
Block &Handle join
worker = Worker()
master = RemoteRef(“master”)
master.join(worker)
while True: sleep(1)
master = Master()
master.register(“master”)
master.run()
lookup on join
Worker init
Master init
![Page 52: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/52.jpg)
Grid 上で並列分散計算の需要
• 例: Web アプリケーション• CGI とのインタラクション• タスク分割・負荷分散• 結果集約・ CGI へ結果加工
backend
Application
Publicly accessible
複雑な協調を必要とする
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 53: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/53.jpg)
提案するモデルでのプログラミング
• future で非同期 RMI– 呼び出しに対する
placeholder– いずれ結果が格納される
• 明示的なスレッドは不要– 別スレッドによる
callback などはない
• RMI を処理する側は?2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/
gluepy
# 非同期 RMI 実行f1 = foo1.doJob.future(args)f2 = foo2.doJob.future(args)# 他の計算……# 返り値を取得。 Waitvalue1 = f1.get()value2 = f2.get()
![Page 54: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/54.jpg)
プログラミングモデルが提供するもの
• 柔軟性:既存のオブジェクト指向言語をライブラリ拡張
• 並列分散計算に携わる諸問題を解決– ロックを意識させない
• オブジェクトの所有権を使った implicit な排他制御• block 時に所有権を implicit に委託する:デッドロック予防
– event-driven に陥らないイベント処理• 同期モデルと協調した signal セマンティクス
– 参加・脱退への対応• object lookup : 「最初の参照」問題• 故障に対する例外セマンティクス
• 分散した動的な環境 (Grid) を簡潔に扱えるモデル
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 55: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/55.jpg)
資源間通信と協調の実現
• Condor/DAGMan [Thain et al. ‘05]– バッチスケジューラ– 「タスク」はスクリプトで表現– タスク間の依存関係は DAG
• Ibis / Satin [Wrzesinska et al. ‘06]– 分散環境で分割統治問題– 子タスクに分割– タスクには親子関係
DAG DependencyRelationship
Central Manager
Busy Nodes
Assign
Cluster
Task
- タスク間の協調は中間ファイル媒体- 依存関係のあるタスク間での通信に限定
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
![Page 56: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/56.jpg)
参加・脱退の処理• JoJo2 [Aoki et al. ‘06]
– Master – Worker– Event driven– イベント毎にハンドラ定
義• タスクの終了• 参加• 脱退・故障
Join
Failure Handler
Join Handler
- 同期の問題- イベント・ドリブンは分かりにくい
-提案する同期セマンティクスに非同期イベント通知機構を導入-Event-driven でない記述が可能
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 57: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/57.jpg)
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections
per peer– 1000 trials per each cluster/attempted connection configuration
28 Global/ 238 Private Peers Case: 95 %
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 58: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/58.jpg)
Future Work
• Reliable WAN communication for the Grid overlays– Node failure– Connection failure
• Weakness of WAN connections– Router Policies• close connections after given period
– Obscure kernel bugs with NAT• Connection resets
Faulty link
WAN links are more vulnerable, and failures will occur2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/
gluepy
![Page 59: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/59.jpg)
Some Related Work• Robust Tree Topologies
for Sensor Networks [ England ‘06]– Create spanning tree for
data reduction– Flat tree for high reliability
• Fewest Hops– Tree with short distance for
low power consumption• Shortest Path
⇒ Spanning Tree that merges the two metrics for the best of two worlds
Fewest Hop:High ReliabilityHigh Power Usage
Shortest Path:Low ReliabilityLow Power Usage
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 60: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/60.jpg)
Possible Future Direction
• Our context: Grid computing– communication latency = metric for link reliability
• Fewest Hops– Reliability for node
failure
• Shortest Distance– Reliability for link failure
Short reliable links
Long faulty links
Can we construct an overlay connection topology that take the best of two worlds?
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 61: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/61.jpg)
Publications• 1. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming
Framework for Complex Grid Environments. In 8th IEEE International Symposium on Cluster Computing and the Grid, May 2008 (Poster Paper. To Appear).
• 2. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In IPSJ Transactions on Advanced Computing Systems. (Conditional Accept)
• 3. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In Proceedings of 2008 Symposium on Advanced Computing Systems and Infrastructures. (To Appear)
• 4. Ken Hironaka, Shogo Sawai, Kenjiro Taura. A Distributed Object-Oriented Library for Computation Across Volatile Resources. In Summer United Workshops on Parallel, Distributed and Cooperative Processing. August 2007
• 5. Ken Hironaka, Kenjiro Taura, Takashi Chikayama. A Low-Stretch Object Migration Scheme for Wide-Area Environments. In IPSJ Transactions on Programming. Vol 48 No.SIG 12 (PRO 34), pp.28-40, August 2007
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 62: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/62.jpg)
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 63: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/63.jpg)
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 64: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/64.jpg)
Problems with Grid Computing (2)• Complexity of Programming on the Grid– Low Level Computing (sockets)
• Communication• Multi-threaded Computing (Synchronization)• Heavy Burden on Non-experts
– Flexibility and Integration• Grid Frameworks for task distribution• Independent parallel programming languages• Computing is not execution of many independent tasks
– Need finer grained communication• Bad interface with user application
– Java, Ruby, Python, PHP
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 65: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/65.jpg)
Related Work
• Discussed with respect to criteria necessary for modern Grid computing– Workflow Coordination• Flexibility without putting the burden on the user
– Joining Nodes / Failure of resources• Handling these events should not dominate the
programming overhead– Connectivity in Wide-Area Networks• Adaptation to networks with NAT/firewall with little
manual settings
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 66: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/66.jpg)
Workflow Coordination (1)
• Condor / DAGMan [Thain ‘05]– “Tasks” are expressed as script
files and distributed on idle nodes– Dependency between tasks can be
expressed in DAG (Directed Acyclic Graph)
• Ibis / Satin [Wrzesinska ‘06]– framework for divide-and-conquer
problems– Tasks can be broken into smaller
sub-tasks, on which it depends
DAG Dependency
Relationship
Central Manager
Busy Nodes
Assign
Cluster
Task
- Many computation cannot be expressed as “Tasks” with dependencies- A task’s communication is limited to others to which it has dependencies
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 67: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/67.jpg)
Object Synchronization Example
class A: def __init__(self, x): self.x = x
def f(self, b): self.x += 1
#blocking RMI b.g() self.x -= 1
return
Atomic section
Atomic section
a b
b.g()
Value x stays
consistent
In method f(), instance a invokes blocking method g() on object b
only 1 thread at a
time
give-upownershipduring RMI
block
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 68: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/68.jpg)
Adaptation to Dynamic Resources• Signal delivery to objects
– Unblocks any thread that is blocking in the object’s context• Can be used to notify
asynchronous events– A joining node
• Node Failure ⇒ RMI Failure– Failure returned as exception
to method invocation– The user can catch the
exception, and perform rollback procedures if necessary
exception
Th
object
blockunblock
signal
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 69: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/69.jpg)
Preliminary Experiments
• Overlay Construction Simulation• A Simple Master-Worker Application with
dynamically joining/leaving nodes• A Real-life Parallel Application on the Grid• A Troubleshoot-Search Engine
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 70: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/70.jpg)
hongo(98)
chiba(186)
okubo(28)
suzuk(72)
imade(60)kototoi (88)
kyoto(70)
istbs(316)
tsubame(64)
Global IPs
Firewall
Private IPs
All packets dropped
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 71: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/71.jpg)
hongo(98)
chiba(186)
okubo(28)
suzuk(72)
imade(60)kototoi (88)
kyoto(70)
istbs(316)
tsubame(64)
Global IPs
Firewall
Private IPs
All packets dropped
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
![Page 72: Gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura.](https://reader035.fdocuments.net/reader035/viewer/2022081501/56649ec05503460f94bcc314/html5/thumbnails/72.jpg)
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy