Program Systems Institute RAS
TDB: THE INTERACTIVE DISTRIBUTED DEBUGGING TOOL FOR PARALLEL MPI PROGRAMS

Authors
RCMS PSI RAS, Pereslavl-Zalessky, Russia
• A. Adamovich
• M. Kovalenko

History of the Development
• T-system: RCMS PSI RAS, since the early 90s
• The SKIF project of the Russia-Belarus Union State, 2000-2004
• T-system and its environment:
  • T-system (industrial version);
  • the TGCC compiler;
  • the TDB interactive debugging system;
  • and others.

Objectives of the Development
• Support of software design and development using computing systems of the SKIF family
  • an element of the integrated toolkit;
  • directed towards T-system support.
• Cost-effectiveness
  • reduced expenses for purchasing and maintaining the SKIF computing system
• Information independence

Predecessors and Analogues
• P2D2 (Portable Debugger for Parallel and Distributed Programs, NASA, 1994, Doreen Cheng, Robert Hood)
• TotalView (Etnus)
• DDT (Distributed Debugging Tool, Streamline Computing)

Basic Architecture Principles
The TDB architecture:
• distributed and multi-component
• open and portable
• flexible
• multi-user

The TDB Architecture: Distributed and Multi-component
1) The primary daemon
2) The secondary daemon
3) The central server
4) The client component
5) The debugging server

The TDB Architecture (2/2)
Flexible
• uses free software:
  • ACE, libxml++, libpcre, libgtk2.x, scintilla, gnome-debug-tdb (based on gnome-debug)
• commercial products can also be used, for example system debuggers

TDB Features
• Debug C, C++, and Fortran programs
• Linux for 32-bit or 64-bit processors
• Debug parallel MPI programs
• Supported MPI implementations: LAM, MPICH, SCAMPI, MP-MPICH, DMPI
• Advanced job launch methods
• Monitoring of states of target nodes
• Multi-user support

TDB Features
• One-touch breakpoint setting/manipulating
• Step into, over or out of functions
• Watchpoints
• One-touch symbolic display
• Controls processes individually or collectively
• Color-coded process/node states
• Log files

TDB Features
Groups
• Group processes using a flexible definition language
• Two types of groups supported: static groups and dynamic groups
• Control grouped processes as lone processes (step, next, stop...) with real-time visual feedback
• Special group commands: group breakpoint, group display

TDB Features
• Two process control modes:
  • active process control mode
  • group control mode
• Two GTDB operational modes:
  • active process / active group debugging mode
  • per process debugging mode

TDB Features
• Special support for parallelizing systems:
  • T-system support: special commands t-break, t-print…

GTDB (TDB GUI client) windows and components features
Main window:
• Active Process window
• Source Code display with breakpoints
• Command buttons
• Command component
• Active process / Active group selection component

GTDB windows and components features
• GUI component for per process debugging:
  • with GUI features for easy reading of process and MPI-node status
  • with the ability to pick one of the processes
  • full-featured subcomponent for process debugging, similar to the main subcomponent for debugging the active process
• MPI-nodes/processes states window, also used for selecting processes to inspect

GTDB windows and components features
• Breakpoints manipulation component window
• Configuration / Properties component window
• Various pop-up menus used for:
  • selected expression data inspection and manipulation: print, display, watchpoints, value set...
  • execution control (breakpoints set, disable, delete...)

GTDB – TDB Client Component
• intuitive interface and ergonomic design
• the presentation of information is handy and convenient

GTDB Node Selection Component
The user can select the exact set of computational nodes that are available for debugging MPI tasks.
The list of all nodes available for MPI task debugging can be obtained through a request to the TDB daemons.
The primary TDB daemon runs on the front-end, and secondary TDB daemons run on the computational nodes of the cluster. TDB daemons are monitor processes.
Secondary daemons collect useful information about the status of computational nodes, and the primary daemon accumulates it.
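
The division of labor between the daemons can be sketched roughly as follows. This is only an illustration in Python, not TDB's actual protocol or data model: the `NodeReport` record, its fields (including the `load` metric), and the `PrimaryDaemon` class are hypothetical names introduced here, and a real deployment would exchange reports over the network rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class NodeReport:
    """Hypothetical status record produced by a secondary daemon."""
    node: str         # host name of the computational node
    load: float       # illustrative metric, not taken from TDB
    available: bool   # whether the node can accept MPI debug tasks


@dataclass
class PrimaryDaemon:
    """Hypothetical front-end daemon that accumulates node reports."""
    reports: dict = field(default_factory=dict)

    def accept(self, report: NodeReport) -> None:
        # Keep only the most recent report for each node.
        self.reports[report.node] = report

    def available_nodes(self) -> list:
        # Answer a client request for nodes usable for MPI debugging.
        return sorted(n for n, r in self.reports.items() if r.available)


primary = PrimaryDaemon()
primary.accept(NodeReport("node1", 0.2, True))
primary.accept(NodeReport("node2", 0.9, False))
primary.accept(NodeReport("node2", 0.1, True))   # newer report replaces the old one
primary.accept(NodeReport("node3", 0.5, False))
print(primary.available_nodes())
```

In this sketch the client would ask only the primary daemon, which matches the description above: secondary daemons report, the primary daemon holds the accumulated view.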

GTDB Properties Component
Used to configure various TDB, GTDB, and MPI implementation settings.

GTDB Nodes Status Component
Describes the statuses of MPI-node processes.
• Green marks running processes
• Yellow marks stopped processes
• Red marks processes that have been stopped or terminated by a signal
Upper bar: common MPI-node status
• Green: all processes of the node are running
• Yellow: at least one of the processes is stopped
• Red: at least one process caught a signal
The common status bar lets the user read the state of the debugged processes more simply and clearly.
All status subcomponents are implemented as button widgets: when clicked, they open the corresponding process (or processes) for individual exploration in the PROCS GTDB mode.
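
The color rules above amount to a small reduction over the process states of a node. The following Python sketch shows one way to express it. The `ProcState` enum and the `node_color` function are hypothetical names, and the sketch assumes that red takes precedence over yellow when a node has both a stopped and a signaled process, a case the rules above do not settle.

```python
from enum import Enum


class ProcState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    SIGNALED = "signaled"   # stopped or terminated by a signal


# Per-process colors, as listed above.
PROCESS_COLOR = {
    ProcState.RUNNING: "green",
    ProcState.STOPPED: "yellow",
    ProcState.SIGNALED: "red",
}


def node_color(states):
    """Common status color of one MPI node, derived from its processes.

    Assumption: red wins over yellow when both conditions hold.
    """
    states = list(states)
    if any(s is ProcState.SIGNALED for s in states):
        return "red"
    if any(s is ProcState.STOPPED for s in states):
        return "yellow"
    return "green"


print(node_color([ProcState.RUNNING, ProcState.RUNNING]))
print(node_color([ProcState.RUNNING, ProcState.STOPPED]))
print(node_color([ProcState.STOPPED, ProcState.SIGNALED]))
```

Note that a node with no processes is reported as green here, which is a choice of this sketch rather than documented behavior.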

GTDB Breakpoints Component
The component is used to work with the various types of breakpoints supported in TDB:
• source line breakpoints,
• function breakpoints, and
• watchpoints;
all of them may have conditions.
A special type of breakpoint, the so-called “group breakpoint”, is also implemented in TDB. A group breakpoint allows the user to set a number of uniform breakpoints in a group of parallel processes. The user can set, delete, disable or enable a group breakpoint with one command or click.
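
The idea of a group breakpoint can be illustrated as one logical breakpoint fanned out to every process in a group. The Python sketch below is a hypothetical model only, not TDB's implementation or command syntax: the `Breakpoint`, `Process`, and `GroupBreakpoint` classes, the location string `main.c:42`, and the process identifiers are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Breakpoint:
    location: str         # e.g. "main.c:42" (illustrative)
    enabled: bool = True


@dataclass
class Process:
    pid: str              # illustrative identifier such as "2:0"
    breakpoints: dict = field(default_factory=dict)


@dataclass
class GroupBreakpoint:
    """Hypothetical model: one logical breakpoint applied to a whole group."""
    location: str
    members: list

    def apply(self):
        # Set the same breakpoint in every member process.
        for p in self.members:
            p.breakpoints[self.location] = Breakpoint(self.location)

    def enable(self, on=True):
        # Enable or disable the breakpoint wherever it is currently set.
        for p in self.members:
            bp = p.breakpoints.get(self.location)
            if bp is not None:
                bp.enabled = on

    def delete(self):
        # Remove the breakpoint from every member process.
        for p in self.members:
            p.breakpoints.pop(self.location, None)


masters = [Process("0:0"), Process("1:0"), Process("2:0")]
gb = GroupBreakpoint("main.c:42", masters)
gb.apply()
gb.enable(False)
print([p.breakpoints["main.c:42"].enabled for p in masters])
gb.delete()
print([len(p.breakpoints) for p in masters])
```

Each operation touches all members in a single call, which is the point of the feature: the user issues one command or click instead of repeating it per process.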

The Main GTDB Window. Sample Debug Session
GTDB in the MAIN -> PROC mode. Process 2:0 is the active (selected, explored) process...

Example Debug Session: Debugging a Simple MPI Program
Example of dynamic groups definition using the "dgroup" command

Example Debug Session: Debugging a Simple MPI Program
We continue the execution of processes from the masters dynamic group and then stop on previously set breakpoints in the loop.

Example Debug Session: Debugging a Simple MPI Program
As we can see, the ‘i’ variable equals zero on all processes in the masters group (the "print" command on group masters was used). To get out of the loop we set the ‘i’ variable on all masters to 1.

We continue execution of the masters group processes, but after the loop execution is stopped by the SIGSEGV signal.

Per Procs GTDB Debugging Mode
In the Main mode the user can work with one selected (active) process or group.
In the Procs mode he/she can examine any process individually.
The component was implemented as two “notebooks” inserted one into the other.
The first (outer, placed vertically) notebook is the MPI-nodes notebook. Its tabs contain info about the corresponding processes and common MPI-node statuses, colored as in the nodes status component.
The second (inner, placed horizontally) notebook is a notebook of processes...

Contacts
• Max Kovalenko [email protected]@botik.ru
• Alexei Adamovich [email protected]@botik.ru
• Sergei Abramov [email protected]@botik.ru