Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf ·...

13
Taupe: Towards Understanding Program Comprehension Yann-Ga¨ el Gu´ eh´ eneuc Ptidej Team – LaiGLE epartement d’informatique et de recherche op´ erationnelle Universit´ e de Montr´ eal – CP 6128 succ. Centre Ville Montr´ eal, Qu´ ebec, H3C 3J7 – Canada [email protected] Abstract Program comprehension is a very important ac- tivity during the development and the mainte- nance of programs. This activity has been ac- tively studied in the past decades to present software engineers with the most accurate and—hopefully—most useful pieces of infor- mation on the organisation, algorithms, ex- ecutions, evolution, and documentation of a program. Yet, only few work tried to under- stand concretely how software engineers obtain and use this information. Software engineers mainly use sight to obtain information about a program, usually from source code or class diagrams. Therefore, we use eye-tracking to collect data about the use of class diagrams by software engineers during program comprehen- sion. We introduce a new visualisation tech- nique to aggregate and to present the collected data. We also report the results and surprising insights gained from two case studies. 1 Introduction Program comprehension is one of the most im- portant activity during the development and maintenance of programs. This activity is “necessary to facilitate reuse, inspection, main- tenance, reverse engineering, re-engineering, migration, and extension of existing software systems” [11]. It consists, for a software engi- neer, in acquiring information about the organ- Copyright c 2006 Yann-Ga¨ el Gu´ eh´ eneuc. Permis- sion to copy is hereby granted provided the original copyright notice is reproduced in copies made. isation, algorithms, executions, evolution, and documentation of a program to form an accu- rate mental model of this program. This men- tal model allows software engineers to perform appropriate changes without introducing bugs [23] or degrading the program design [18]. Previous work analysed the activity of pro- gram comprehension and suggested techniques to ease this activity. This work includes un- derstanding the creation of mental models [23], identifying interesting information [13], propos- ing new techniques to present information [22]. However, to the best of our knowledge, no pre- vious work studied how software engineers con- cretely acquire information about a program. The acquisition of information is mainly per- formed through sight, from a computer screen. Thus, we propose to study the use of sight by software engineers during program comprehen- sion to better understand this activity. A bet- ter understanding of the acquisition of informa- tion will help in combining existing techniques and in developing new techniques to ease pro- gram comprehension. UML [15] has become de facto the standard notation to describe the organisation, struc- ture, and behaviour of object-oriented pro- grams. In particular, class diagrams describing the organisation of classes, their relationships and their services, are often used by software engineers to obtain information on the design of objet-oriented programs [6, 19, 24]. We use eye-tracking to collect data on the acquisition of information from class diagrams by software engineers through sight to under- stand concretely how software engineers obtain information and use this source of information.

Transcript of Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf ·...

Page 1: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

Taupe: Towards Understanding Program Comprehension

Yann-Gael GueheneucPtidej Team – LaiGLE

Departement d’informatique et de recherche operationnelleUniversite de Montreal – CP 6128 succ. Centre Ville

Montreal, Quebec, H3C 3J7 – [email protected]

Abstract

Program comprehension is a very important ac-tivity during the development and the mainte-nance of programs. This activity has been ac-tively studied in the past decades to presentsoftware engineers with the most accurateand—hopefully—most useful pieces of infor-mation on the organisation, algorithms, ex-ecutions, evolution, and documentation of aprogram. Yet, only few work tried to under-stand concretely how software engineers obtainand use this information. Software engineersmainly use sight to obtain information abouta program, usually from source code or classdiagrams. Therefore, we use eye-tracking tocollect data about the use of class diagrams bysoftware engineers during program comprehen-sion. We introduce a new visualisation tech-nique to aggregate and to present the collecteddata. We also report the results and surprisinginsights gained from two case studies.

1 Introduction

Program comprehension is one of the most im-portant activity during the development andmaintenance of programs. This activity is“necessary to facilitate reuse, inspection, main-tenance, reverse engineering, re-engineering,migration, and extension of existing softwaresystems” [11]. It consists, for a software engi-neer, in acquiring information about the organ-

Copyright c© 2006 Yann-Gael Gueheneuc. Permis-sion to copy is hereby granted provided the originalcopyright notice is reproduced in copies made.

isation, algorithms, executions, evolution, anddocumentation of a program to form an accu-rate mental model of this program. This men-tal model allows software engineers to performappropriate changes without introducing bugs[23] or degrading the program design [18].

Previous work analysed the activity of pro-gram comprehension and suggested techniquesto ease this activity. This work includes un-derstanding the creation of mental models [23],identifying interesting information [13], propos-ing new techniques to present information [22].However, to the best of our knowledge, no pre-vious work studied how software engineers con-cretely acquire information about a program.The acquisition of information is mainly per-formed through sight, from a computer screen.Thus, we propose to study the use of sight bysoftware engineers during program comprehen-sion to better understand this activity. A bet-ter understanding of the acquisition of informa-tion will help in combining existing techniquesand in developing new techniques to ease pro-gram comprehension.

UML [15] has become de facto the standardnotation to describe the organisation, struc-ture, and behaviour of object-oriented pro-grams. In particular, class diagrams describingthe organisation of classes, their relationshipsand their services, are often used by softwareengineers to obtain information on the designof objet-oriented programs [6, 19, 24].

We use eye-tracking to collect data on theacquisition of information from class diagramsby software engineers through sight to under-stand concretely how software engineers obtaininformation and use this source of information.

Page 2: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

(a) Headband and its different parts. (b) Use of the headband.

Figure 1: SR Research EyeLink II eye-tracking system.

The data collected using eye-tracking consistsof two types of particularly useful information:fixations and saccades. Fixations record theposition of the eye during a gaze while saccadesrecord the movements of the eyes between twofixations. The numbers of fixations and sac-cades can vary from a dozen to several hun-dreds depending on the size and complexity ofthe class diagram, on the time spent looking atthe diagram, on the strategy used to acquireinformation from a diagram, and on the pur-pose of the activity of program comprehensionat hand.

No two software engineers analyse a samediagram identically. Therefore, we develop anew visualisation technique to aggregate andto present the fixations and saccades collectedfrom several software engineers. Our techniqueintroduces the concept of areas of interest andsuperimposes aggregations of the fixations andsaccades and a class diagram with alpha com-position to highlight the most visited areas ofthe diagram. We implement our tool as theTaupe data viewer. We perform two case stud-ies with two different class diagrams and pro-gram comprehension activities with softwareengineers and report results using our visuali-sation technique. These case studies show that,surprisingly, binary class relationships (inheri-tance, use, association, aggregation, and com-position [7]) seem to be scarcely used. Thus,our contributions are:

• The first use of eye-trackers to collect dataon the concrete use of class diagrams dur-ing program comprehension.

• A new visualisation technique to aggregateand to present collected data and its im-plementation, the Taupe data viewer.

• Two case studies of the analysis of pro-gram comprehension with class diagramsperformed with software engineers.

Section 2 introduce the idea of using eye-tracking to study the comprehension of classdiagrams. Section 3 present our visualisationtechnique and its implementation, the Taupe

data viewer. Section 4 report the results of twofirst case studies with class diagrams. Finally,Section 5 concludes and presents future work.

2 Class Diagrams and Eye-Tracking

Class diagrams represent the structure andglobal behaviour [10] of programs, showingclasses, interfaces, and their relationships.They are often used by software engineersduring development and maintenance to ab-stract implementation details and to present aneasier-to-grasp clustered view of the programsource code [6, 19, 24].

2.1 Previous Work

Class diagrams have been extensively studiedin the literature on program comprehension.We only summarise here some of the main linesof research on program comprehension usingclass diagrams.

2

Page 3: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

Purchase et al. [1] reported the results ofexperiments on the effect of aesthetics criteriaon the preferences of users for UML class di-agrams. They performed several experimentswith subjects to assess the subjects’ preferencesover several pairs of class diagrams, each dia-gram in a pair conforming to different (but re-lated) aesthetics criteria. They collected quan-titative data in the form of percentages of sub-jects preferring one diagram over another ineach pair and qualitative assessments of eachdiagram. They concluded on the most im-portant aesthetic criteria for UML class dia-grams, including joined inheritance arcs anddirectional indicators.

Eichelberger [3] studied the relation betweenthe UML notation for class diagrams, princi-ples of human–computer interactions, and prin-ciples of object-oriented design and program-ming. The author then suggested changes tothe UML notation and aesthetics criteria tolay out class diagrams. These changes and aes-thetic criteria are implemented in a tool, Sugi-

bib, to lay out UML-like class diagrams. Theauthor claimed that laying out class diagramswhile conforming to the aesthetics criteria im-prove the readability of the diagrams but heonly provides qualitative arguments.

Hadar and Hazzan [8] presented results froma study on the startegies applied by softwareengineers in the process of comprehending vi-sual models of programs. They use visual mod-els that were described using the UML notationand included use case, activity, class, sequence,collaboration, state chart, object, package, anddeployment diagrams. The subjects were se-nior students majoring in computer sciencefrom several universities. The subjects were di-vided in two groups to collect both qualitativeand quantitative data. The authors concludedon the usefulness of multifaceted descriptionsof programs provided by the UML and that noone type of diagram was more important thanthe other. However, further studies should beperformed to confirm these findings.

Sun and Wong [24] evaluated the layout al-gorithms for class diagrams of two industrialtools, Rational Rose and Borland Together, ac-cording to criteria from previous work, includ-ing the cited works by Purchase [1] and Eichel-berger [3]. Using laws from the Gestalt the-

ory of visual perception [16, p. 50–53], theyretained and justified 14 criteria to assess thevisual quality of the layouts of class diagrams.They applied these criteria on a Thermometerprogram and on JUnit [4]. They concluded onthe good quality of both industrial tools, on therelevance of their criteria, and on the difficultyof satisfying all criteria.

Previous work developed and studied criteriaand techniques to ease program comprehensionin general and using class diagrams in partic-ular. Yet, to the best of our knowledge, noprevious work studied the concrete acquisitionof information from class diagrams.

2.2 General Setting

Progress in non-intrusive monitoring of humanbehaviour allows studying the external behav-iour of software engineers involved in programcomprehension activities while disturbing theiractivities as little as possible. In particular,the use of video-based eye tracking systems al-lows recording software engineers’ eye move-ments when they look at a class diagram tounderstand the modelled program.

A video-based eye tracking system collectsdata on the relative coordinates of a sub-ject’s eye movements over a computer screen,through a special headband, without interfer-ing with the subject’s activity, if the subjectkeeps a relatively steady position in front ofthe screen. The system converts the collectedraw coordinates in fixations and saccades usingthresholds based on human physiology.

We use the eye-tracking systems provided bySR Research to study software engineers dur-ing program comprehension. SR Research isan international manufacturer of high qualityeye-tracking systems. It provides the EyeLinkII system, which decomposes in a display com-puter displaying the data to be analysed by asoftware engineer, a host computer storing thedata on an engineer’s eye movements, and aheadband supporting cameras to track the eyemovements.

Figure 1(a) depicts the headband and its dif-ferent parts. The headband mainly consists ofa set of cameras recording the position of thehead and the eye movements with respect tothe image displayed on the computer screen:

3

Page 4: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

Figure 2: Use of a EyeLink II system.

active components are the eye cameras andhead camera, which capture the eye movementsand relate the position of the head with this ofthe computer screen. The other componentsare used to set and hold the cameras in place.Figure 1(b) shows an anonymous subject withsuch a headband.

Figure 2 illustrates the data acquisitionprocess. The headband is connected to thehost computer, which is connected through anEthernet link to the display computer. Syn-chronisation with the image displayed on thescreen of the display computer (being looked atby the software engineer) is perform via a pro-vided API, which allows the display computerto control the host computer to synchronise thecameras and the display screen and to gener-ate the data. The generated data is stored onthe host computer and transferred to the dis-play computer for analysis at the end of eachexperiment. The host computer allows experi-menters to calibrate the cameras and to controlthe experiments.

We use such EyeLink II systems to studysoftware engineers’ eye movements over theconstituents of class diagrams and the dwelltime of fixations on individual constituentssuch as classes, interfaces, and relationships.

2.3 Collected Data

An experiment divides in a set of trials. Atrial contains the data collected while a soft-ware engineer looks at one image on the displaycomputer, an experiment groups many trialsperformed successively with (possibly) differ-ent images.

The data collected during a trial containsraw coordinates and three types of events: fixa-tions, blinks, and saccades. Each type of eventis characterised by the position of the eyes rel-ative to the display screen (two positions forsaccades) and by a timestamp in millisecondsrelative to the beginning of each trial.

The data collected during one of our typi-cal trials ranges between 200 and 5,000 kilo-bytes of data, depending on the duration of theexperiment. It contains between 10,000 and200,000 lines of raw data representing 50 andup to 500 fixations, saccades, and blinks. Fixa-tions, saccades, and blinks are intertwined withthe raw data on the fly, during data collection.

This data is stored in a binary format knownas Eyelink Data Files (EDF). The EDF formatis convenient for quick addition of new data andtreatment, in particular conversion into ASCII.Table 1 describes the skeleton of an EDF file:An EDF file divides in a header and a body;The header contains information about the ex-periments, the trial, calibration, drift correc-tion, and the type of recorded events; The bodystores both the raw data and events such as be-ginnings and ends of fixations or saccades.

3 Visualisation Technique

We must aggregate and present the data col-lected for a same trial across as many exper-iments as possible to build sound general hy-potheses on the acquisition of information bysoftware engineers from class diagrams. Wedevelop a new visualisation technique to aggre-gate data from several trials and to highlightthe parts of a class diagram more or less used bysoftware engineers to acquire information. Themain hypothesis of our visualisation techniqueis that we can equate fixation with attention assuggested for example by Rock [20].

3.1 Previous Work

Previous work attempted to quantify traces ofeye-movements rather than parameters such asmean fixation duration, to identify clusters ofmeaningful fixations, or to manipulate images.

DeCarlo and Santella [2] used eye-trackersto identify the point of attention of a sub-ject on an image and, using this knowledge,

4

Page 5: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

HeaderGeneral information,

including the name of theclass diagram

MSG 82177 TRIALID PIX1 NormalMSG 82178 !V TRIAL VAR DATA NormalMSG 82210 !V IMGLOAD FILL images/UMLDiagram1.jpg

Calibration

MSG 208698 !CAL Calibration points:MSG 208698 !CAL 176.2, 74.5 -808, 3467. . .MSG 230854 !CAL VALIDATION HV9 R RIGHT GOOD ERROR

0.55 avg. 0.97 max OFFSET 0.45 deg. -13.2,5.2 pix.

Drift correction MSG 253038 DRIFTCORRECT R RIGHTat 640,512 OFFSET 0.83 deg. -20.9,14.5 pix.

Information on therecorded data

MSG 253043 !MODE RECORD P 500 2 1START 253044 RIGHT SAMPLES EVENTSPRESCALER 1VPRESCALER 1PUPIL AREAEVENTS GAZE RIGHT RATE 500.00 TRACKING P FILTER 2SAMPLES GAZE RIGHT RATE 500.00 TRACKING P FILTER 2

Body

Eye positions 253048 641.7 512.0 499.0 .. . .

Beginning and end of afixation with in-betweensamples of eye positions

SFIX R 253052. . .253310 643.7 508.0 496.0 .. . .EFIX R 253052 253310 260 641.5 509.1 504

Beginning and end of asaccade with in-betweensamples of eye positions

SSACC R 253312. . .253328 691.9 435.4 522.0 .. . .ESACC R 253312 253346 36 644.0 505.0 706.3 394.3 4.43 260

Following beginnings andends of fixations and

saccades with in-betweensamples of eye positions

SFIX R 253348. . .EFIX R 253348 253610 264 714.8 404.2 452SSACC R 253612. . .

EndEnd of the recording END 267755 SAMPLES EVENTS RES 32.69 29.12

Table 1: Details of the collected data.

5

Page 6: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

to compute a stylised and abstract version ofthe image. Their work falls in the field ofnon-photorealistic rendering. They developeda perceptual model to translate the data gath-ered from an eye-tracker into predictions aboutwhich elements of the image carry informationinteresting to the subject. Then, they proposedan algorithm to stylise and abstract the image,thus reducing the perceptual and cognitive ef-fort required to understand the image by re-moving extraneous details. Their work is espe-cially interesting with photographs.

Santella and DeCarlo [21] proposed a ro-bust clustering technique to group sets of fix-ations either spatially or spatially and tempo-rally. Their technique is robust because its re-sults are not affected by noise and outliers. Itcan be used to locate areas of interest for a sub-ject or subjects and to quantify the interest ineach area. It uses a mean shift procedure whichproceeds deterministically without the need tochoose a number of clusters in advance. Our vi-sualisation technique performs in the oppositeof this technique: We assume the knowledge ofthe areas of interest and we want to allocate fix-ations in these areas (to compute percentages)rather than to cluster the fixations.

Wooding [26] introduced the concept of fix-ation maps. A fixation map is a three-dimensional map in which the two first di-mensions are the coordinates of the fixationsunder study (either from one subject or fromseveral subjects) and the third dimension is avalue describing the discrimination, detection,or perception achievable from some consideredfixations. The third dimension is deliberatelyleft vague by the author. A fixation map isbuilt by computing for each fixation locationthe third dimension, for example using a 3-dimensional gaussian or a cylinder. If morethan one gaussian overlap at the location ofone fixation, their value is added to the thirddimension. The width of the gaussian or cylin-der depends on the size of the area over whicha fixation exists. Using fixation maps, it is thuspossible to aggregate fixations and to highlightthe areas which received most attention.

Previous work proposed analyses and studiedthe data collected using eye-tracking systems.Yet, to the best of our knowledge, no previous

fA fB

d = 1/2 d = 1/2w = 1

A BC

Figure 3: Problem of juxtaposition of fixations(or groups thereof) in fixation maps, where dis a distance and w the weight of a point.

work proposed a technique to display this dataand the amount of attention in areas of interest.

3.2 Areas of Interest

Our technique builds on previous work on clus-tering and fixation maps and introduces theconcept of area of interest. Fixation maps areuseful to show the areas of an image that at-tracted most of the fixations by highlightingthese areas through colour gradation. Yet, itcombines the data from several trial linearlyand, if used to combine several trials, meetsthe problem of juxtaposition among fixations.

The problem of juxtaposition arises whentwo fixations (or group thereof) belong to twodistinct yet close areas of interest. Figure 3shows a typical case of problem of juxtaposi-tion: Although fixation fA pertain to area Aand fixation fB belongs the distinct area B andthe two areas are separated by area C, a fixa-tion map would show a continuum of interest inthe empty area C because each point in area C,such as the black point, is weighted relativelyto the fixations fA and fB , with a weight of 1in the example.

We avoid the problem of juxtaposition whenaggregating and presenting data from morethan one trial through the use of areas of inter-est. We define an area of interest as an area in aclass diagram that the experimenter considersas relevant to program comprehension activ-ity, excluding pieces of the image not belongingto an area of interest. The experimenter maychoose areas of interest before trials or identifythese areas after visual analyses of the clusteredcollected data.

Typically with class diagrams, areas of in-terest are rectangles circumscribing the graphi-cal representations of classes and interfaces andpolygons surrounding relationships (such as in-

6

Page 7: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

Figure 4: Some areas of interest (black rectan-gles) in a subset of a typical class diagram.

heritance, implementation, or association rela-tionships). Figure 4 shows some areas of inter-est in a subset of a class diagram used in ourcase studies (black rectangles around classes).

We use areas of interest to aggregate fixa-tions and saccades from many trial while high-lighting the areas with more or less fixationswith respect to one another. For each areaof interest, we count the number of fixationsand–or saccades in the area across trials andcompute percentages with respect to the over-all number of fixations and–or saccades. Thus,we associate a percentage with each area of in-terest and can show differences among areasand thus highlight which areas received moreof less attention than others without the jux-taposition problem.

The concept of areas of interest is simple andcan be applied to any type of image where ex-perimenters can identify such areas.

3.3 The Taupe Data Viewer

We have implemented our visualisation tech-nique in the Taupe data viewer. The Taupe

data viewer (Thoroughly Analysing the Under-standing of Programs through Eyesight1) is im-plemented 100% in Java. It uses the API pro-vided by SR Research to handle EDF files di-rectly in Java. It is a simple program consistingof 20 classes and 130 methods for 1,500 lines ofcommented codes (excluding comments and li-braries).

Figure 5 shows the user interface of theTaupe data viewer. Part A displays the classdiagram shown to software engineers duringsome trials. Part B is used to load the dataviewer data from experiments, choose the tri-

1Taupe means mole in French.

Figure 5: User interface of the Taupe dataviewer and its three parts.

als of interest in these experiments, show/hidefixations and–or saccades, and include/excludefixations and–or saccades from the computa-tion of the percentages of the areas of interest.Part C is used to load sets of areas of inter-est into the data viewer, to display these areas,and to shift the position of the collected datato compensate any drift during calibration ofthe eye-trackers.

The Taupe data viewer uses alpha compo-sition to display fixations, saccades, and ar-eas of interest over the class diagram withoutcluttering the diagram while differentiating ar-eas which received much attention from thosewhich did not.

Figure 6(b) shows the fixations from severalsoftware engineers and the computed areas ofinterest superimposed over the typical class di-agram used in our first case study and shownin Figure 6(a). Dots represent fixations, someparts of the class diagram are almost invisibleto indicate that they did not receive attentionfrom the software engineers while others (suchas F) received much attention. Figures 6(a)and 6(b) show that, using our data viewer, wecan accurately highlight the parts of a class di-agram that received much attention from soft-ware engineers during program comprehension.

4 Case Studies

We perform two case studies to highlight theuse of eye-tracking systems to understand theprogram comprehension activity better. In

7

Page 8: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

these case studies, we show two typical class di-agram to software engineers and ask, for each, atypical question that could arise during the de-velopment or the maintenance of the describedprograms. We obtain surprising results on theuse of relationships.

4.1 Experimental Setting

We want to understand how software engineersuse concretely class diagrams during programcomprehension activities. Thus, the subjects ofour experiments are software engineers and theobjects are class diagrams representing someprograms. In the perspective of generalisingthe results of our experiments, we must choosethe subjects and objects carefully.

Subjects. Subjects should be representativeof software engineers in general and be suffi-cient in number to account for incidental vari-ations. We use subjects from a convenientsample consisting of graduate students in soft-ware engineering at the Department of Infor-matics and Operations Research at Universityof Montreal to perform the first experimentsreported in the following. Graduate studentshave an reasonable knowledge of UML in gen-eral and class diagrams in particular, with re-spect to software engineers in general. Al-though most do not have industrial experiencein development and maintenance, they all havedesigned and developed small-scale programsusing UML during their studies. We performour experiments with a dozen subjects to lever-age further variations.

Objects. The objects of our experiments areclass diagrams modelling object-oriented pro-grams. We use two different class diagramsfrom two different programs to avoid rein-forcement learning [19, 25] after the first setof trials. The first class diagram, CD1 inFigure 6(a), models a subset of the Ptidej

program [6]. We render this class diagramanonymous to avoid bias as some subjectsparticipated in its development. The sec-ond class diagram, CD2 in Figure 7(a), mod-els a simulation of an ATM machine writ-ten by Russell C. Bjork and freely avail-able at http://www.math-cs.gordon.edu/

courses/cs211/ATMExample/. We use this ex-ample because it provides a complete yet man-ageable class diagram.

Questions. We ask the subjects one questionper class diagram to trigger the program com-prehension activity. The question associated toCD1 concerns the design of the modelled pro-gram and the addition of a new class in theclass hierarchy):

Q1: Where to add a class Other suchthat its instances belong to a tree ofobjects with instances of class F ascontainers?2

The second question related to CD2 also con-cerns design and class hierarchy but involvedchanging the existing class hierarchy:

Q2: How to add a new type ofTransaction that is not associatedwith a Session?2

Recording. We ask subjects to come for theexperiments in sequential order. We monitoredin real-time and continuously the recordings toprevent drift and errors in the measures. Weasked subjects not to communicate on the ex-periments with future subjects. We record thesubjects’ eye movements, including events re-lated to fixations and saccades. We do notrecord events related to blinks because we donot believe they are relevant to understand theprogram comprehension activity. Results ofthe experiments are kept strictly confidentialthrough the automatic generation of the namesof the EDF files and the automatic renaming ofall the files in random order to prevent identi-fication through time stamps. We ensure thatsubjects understood the question and performthe expected program comprehension activityby analysing their answers to the questions. Inour experiments, no subject has been rejectedbecause of misunderstanding.

Results. Figures 6(b) and 7(b) show screen-shots of the Taupe data viewer showing theresults of the experiments for CD1 and CD2,respectively. As expected, the collected data

2The questions are translated from French.

8

Page 9: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

(a) A typical class diagram (CD1).

(b) Results of the analysis of the class diagram with fixations and areas of interest.

Figure 6: Analysis of the typical class diagram CD19

Page 10: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

(a) Class diagram of an ATM machine (CD2).

(b) Fixations on the class diagram without areas of interest.

Figure 7: Analysis of the typical class diagram CD210

Page 11: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

supports the idea that software engineers firstbrowse the class diagrams seemingly randomlyto identify most useful parts and then focuson these parts for the program comprehensionactivity at hand (to answer the questions).

Thus, it is not surprising to observe morefixations towards the classes most useful:

• CD1: Software engineers concentratedtheir attention on class F and relatedclasses, in particular B, D, and E3.

• CD2: Software engineers fixed their atten-tion on class Transaction, its subclasses(such as Withdrawal), and class Session4.

Surprisingly, however, software engineers donot seem to follow binary class relationships,such as inheritance and composition. In none ofthe experiments, either using the Taupe dataviewer to display fixations and saccades or SRResearch default viewer to replay the eye move-ments could we notice software engineers fol-lowing the relationships among classes.

This is a very surprising result, in particularin face of the several techniques existing in theliterature to choose correct relationships amongclasses, such as [9, 14], and to recover binaryclass relationships from source code, such as[7, 10]. We hypothesise that we obtain theseresults because the class diagrams are quitesimple and the questions do not require per sethe use of the binary class relationships. Yet,we expected software engineers to confirm theirfindings using the relationships among classes.

Threats to Validity. The previous resultsare subject to some threats to validity:

• Time thresholds, peripheral vision, andoffsets: it is unlikely that we missed fixa-tions, in particular on the binary class re-lationships, due to the thresholds used toidentify fixations from raw data, to the useof peripheral vision by software engineerswhile focusing on classes, or to measure-ment problems. Yet, further (different) ex-periments are required to confirm our find-ings. In particular, we will study other dia-

3Adding Other as a subclass of B answers Q1.4The hierarchy of Transaction must be modified to

add an extra ManagedTransaction class to answer Q2.

grams with different layouts where entitiesare further apart.

• Subjects: we used a convenient sample ofgraduate students to perform the exper-iments. We must perform again our ex-periments with other subjects to level anyparticularity of our convenient sample.

• Questions: the question used to trigger theprogram comprehension activity are quitesimple and should be further refined tobetter frame the software engineers’ be-haviour.

Despite these threats to validity, the re-ported results are important because they openinteresting research avenues to better under-stand the use of class diagrams during programcomprehension and provide a base on which tobuild further experiments.

5 Conclusion

We applied eye-tracking to a first study of theuse of class diagrams by software engineers dur-ing program comprehension. We report inter-esting results from two case studies on the ap-parent lack of use of the relationships amongclasses. Our first experiments—although theirvalidity should be further confirmed—are onlya beginning in promising research to better un-derstand program comprehension.

This work is part of the on-going Taupe

project being carried out at the Department ofInformatics and Operations Research of Uni-versity of Montreal. It is but a first step to-wards a better understanding of the use of classdiagrams and other software artifacts duringprogram comprehension. Future work includes:

• Performing more case studies with differ-ent and more complex class diagrams andother types of diagram [17], different activ-ities of program comprehension, and withother participants, including software en-gineers from the industry.

• Evaluating the different strategies used bysoftware engineers to acquire informationfrom class diagrams with respect to the ac-tivity of program comprehension at hand.

11

Page 12: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

• Studying other sources of information suchas UML sequence diagrams, source code,adjacency matrices [5], and other visuali-sation techniques [12], such as in [8, 13].

• Formulating hypotheses, based on the casestudies and their results, on the concreteuse of information by software engineersand developing a theory of program com-prehension to help in developing and inevaluating new visualisation techniques.

Acknowledgements

The author thanks gratefully the CanadianFoundation for Innovation for the financial sup-port, SR Research for the their invaluable help,Pierre Poulin for the many interesting discus-sions, and the several participants to the casestudies.

References

[1] Helen C.Purchase, Jo-Anne Allder, andDavid Carrington. Graph layout aesthet-ics in UML diagrams: User preferences.journal of Graph Algorithms and Applica-tions, 6(3):255–279, June 2002.

[2] Doug DeCarlo and Anthony Santella. Styl-ization and abstraction of photographs.In Tom Appolloni, editor, proceedings ofthe 29th conference on Computer graphicsand interactive techniques, pages 769–776.ACM Press, July 2002.

[3] Holger Eichelberger. Nice class diagramsadmit good design? In John T. Stasko,editor, proceedings of the 1st symposiumon Software Visualization, pages 159–168.ACM Press, June 2003.

[4] Erich Gamma and Kent Beck. Test in-fected: Programmers love writing tests.Java Report, 3(7):37–50, July 1998.

[5] Mohammad Ghoniem, Jean-DanielFekete, and Philippe Castagliola. Acomparison of the readability of graphsusing node-link and matrix-based repre-sentations. In Matt Ward and TamaraMunzner, editors, proceedings of the 10th

symposium on Information Visualisation,pages 17–24. IEEE Computer SocietyPress, October 2004.

[6] Yann-Gael Gueheneuc. A reverse engi-neering tool for precise class diagrams. InJanice Singer and Hanan Lutfiyya, editors,Proceedings of the 14th IBM Centers forAdvanced Studies Conference, pages 28–41. ACM Press, October 2004.

[7] Yann-Gael Gueheneuc and Herve Albin-Amiot. Recovering binary class relation-ships: Putting icing on the UML cake.In Doug C. Schmidt, editor, Proceedingsof the 19th conference on Object-OrientedProgramming, Systems, Languages, andApplications, pages 301–314. ACM Press,October 2004.

[8] Irit Hadar and Orit Hazzan. On the con-tribution of UML diagrams to softwaresystem comprehension. journal of Ob-ject Technology, 3(1):143–156, January–February 2004.

[9] Brian Henderson-Sellers and Franck Bar-bier. A survey of the UML’s aggregationand composition relationships. L’objet: Logiciel, Base de donnees, Reseaux,5(3/4):339–366, December 1999.

[10] Daniel Jackson and Allison Waingold.Lightweight extraction of object modelsfrom bytecode. In David Garlan andJeff Kramer, editors, proceedings of the21st International Conference on SoftwareEngineering, pages 194–202. ACM Press,May 1999.

[11] Kostas Kontogiannis. ICPC web-site,April 2006. http://www.icpc2006.uwaterloo.ca/.

[12] Guillaume Langelier, Houari A. Sahraoui,and Pierre Poulin. Visualization-basedanalysis of quality for large-scale softwaresystems. In Tom Ellman and AndreaZisma, editors, proceedings of the 20th in-ternatinal conference on Automated Soft-ware Engineering. ACM Press, November2005.

12

Page 13: Taupe : Towards Understanding Program Comprehensionjmaletic/Prog-Comp/Papers/gueheneuc06.doc.pdf · and to present collected data and its im-plementation, the Taupe data viewer. •

[13] Gail C. Murphy, Mik Kersten, Martin P.Robillard, and Davor Cubranis. Theemergent structure of development tasks.In Andrew P. Black, editor, proceed-ings of the 19th European Conference onObject-Oriented Programming, pages 33–48. Springer-Verlag, July 2005.

[14] James Noble and John Grundy. Explicitrelationships in object-oriented develop-ment. In Bertrand Meyer, editor, proceed-ings of the 18th conference on the Technol-ogy of Object-Oriented Languages and Sys-tems, pages 211–226. Prentice-Hall, No-vember 1995.

[15] Object Management Group. UML v1.5Specification, March 2003.

[16] Stephen E. Palmer. Vision Science: Pho-tons to Phenomenology. The MIT Press,1st edition, May 1999.

[17] Ivan P. Paltor and Johan Lilius. Digitalsound recorder: A case study on design-ing embedded systems using the UML no-tation. Technical Report TR-234, TurkuCentre for Computer Science, January1999.

[18] Dewayne E. Perry and Alexander L. Wolf.Foundations for the study of software ar-chitecture. Software Engineering Notes,17(4):40–52, October 1992.

[19] Vaclav Rajlich. Program comprehensionas a learning process. In Yingxu Wang,editor, proceedings of the 1st Interna-tional Conference on Cognitive Informat-ics, pages 343–347. IEEE Computer Soci-ety Press, August 2002.

[20] Irvin Rock and Daniel Gutman. The effectof inattention on form perception. Jour-nal of Experimental Psychology: HumanPerception and Performance, 7:275–285,1981.

[21] Anthony Santella and Doug DeCarlo. Ro-bust clustering of eye movement record-ings for quantification of visual interest.In Andrew Duchowski and Roel Vertegaal,editors, proceedings of the 3rd symposium

on Eye Tracking Research and Applica-tions, pages 27–34. ACM Press, March2004.

[22] Jochen Seemann. Extending the Sugiyamaalgorithm for drawing UML class di-agrams: Towards automatic layout ofobject-oriented software diagrams. InGiuseppe Di Battista, editor, proceedingsof the 5th international symposium onGraph Drawing, pages 415–424. Springer-Verlag, September 1997.

[23] Elliot Soloway. Learning to program =Learning to construct mechanisms and ex-planations. Communications of the ACM,29(9):850–858, September 1986.

[24] Dabo Sun and Kenny Wong. On evalu-ating the layout of UML class diagramsfor program comprehension. In James R.Cordy and Harald Gall, editors, proceed-ings of the 13th International Workshopon Program Comprehension, pages 317–326. IEEE Computer Society Press, May2005.

[25] Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction.MIT Press, 1st edition, March 1998.

[26] David S. Wooding. Fixation maps: Quan-tifying eye-movement traces. In RoelVertegaal and John W. Senders, editors,proceedings of the 2nd symposium on EyeTracking Research and Applications, pages31–36. ACM Press, March 2002.

13