Tree Visualization with Tree-Maps: 2-d Space-Filling...

I THE INTERACTION TECHNIQUE NOTEBOOK I

Tree Visualization with Tree-Maps:2-d Space-Filling Approach

Ben Shneiderman

University of Maryland

Introduction. The traditional approach to representing tree structures is asa rooted, directed graph with the root node at the top of the page and childrennodes below the parent node with lines connecting them (Figure 1). Knuth [21has a long discussion about this standard representation, especially why theroot is at the top, and he offers several alternatives including brief mention ofa space-filling approach. However, the remainder of his presentation andmost other discussions of trees focus on various node and edge representa-

tions. By contrast, this paper deals with a two-dimensional (2-d) space-fillingapproach in which each node is a rectangle whose area is proportional tosome attribute such as node size.

Research on relationships between 2-d images and their representation intree structures has focussed on node and link representations of 2-d images.This work includes quad-trees [51 and their variants which are important inimage processing. The goal of quad trees is to provide a tree representationfor storage compression and efilcient operations on bit-mapped images. XY-trees [41 are a traditional tree representation of two-dimensional layoutsfound in newspaper, magazine, or book pages. Related concepts include k-dtrees [11, which are often explained with the help of a 2-d rectangulardrawing, and hB-trees [31 which are a more advanced multiattribute indexingmethod that has a useful 2-d representation. None of these projects sought toprovide human visualization aids for viewing large tree structures.

Tree-maps are a representation designed for human visualization of com-plex traditional tree structures: arbitrary trees are shown with a 2-d space-filling representation. The original motivation for this work was to gain abetter representation of the utilization of storage space on a hard disk as

Author’s address: Department of Computer %ience, Human-Computer Interaction Laboratory,University of Maryland, College Park, MD 20742.Permission to copy without fee all or part of this material is granted provided that the copies arenot made or distributed for direct commercial advantage, the ACM copyright notice and the titleof the publl%ation and its date appear, and notice is given that copying is by permission of theAssociation for Computing Machinery. To copy otherwiee, or to republish, requires a fee and/orspecific permission.@ 1992 ACM 0730-0301/92/0100-0092 $01.50

ACM Transactions on Graphics, Vol. 11, No. 1, January 1992, Pages 92-99.

Tree Visualization with Tree-Maps: A 2-D Space-Filling Approach . 93

Fig. 1. Typical 3-level tree structure with numbers indicating size of each leaf node

viewed from the perspective of a multiple level directory of subdirectories andfiles, as in Unix, Macintosh Finder, or MS-DOS. In this application the filesare leaf nodes and the subdirectories are interior nodes, Most operating

systems display the contents of one node at a time with names of the files and

subdirectories or icons to represent them. The user can traverse the tree with

a mouse click on a directory folder icon or by issuing a command (e. g., CD forchange directory). A few systems attempt to show more than one node at atime, but very quickly the limited screen space is exceeded. Even cleverattempts to show the full tree structure and allow rapid page turning soonexceed practical limits. For example, Norton Utilities from Symantec andWindows 3.0 from Microsoft show trees on their side with the root at the leftof the display and indentation plus some lines to show the tree structure. Insummary, even elegant tree-like layouts of the hierarchical structures, suchas in Sun Microsystems Open Look, soon overwhelm the available displayspace and users cannot grasp the entire picture.

This is an old problem. Even designers of family trees, animal speciestrees, or organization charts found that a large wall was necessary to give thewhole picture. But even then, only the structural relationship is shown;additional information, such as the size or importance of each node, wasignored or written in text form.

In the computer directory representation application, the goal was to show

the entire set of files in a space-filling visualization that would allow users torecognize rapidly the larger files and consider them as candidates for deletionwhen the hard disk was filled. These large files might be at any level of thetree structure and the range in file sizes might be five of six orders ofmagnitude (from a few bytes to a few million bytes). There might be two toeight levels with many thousands of files.

ACM Transactions on Graphics, Vol. 11, No. 1, January 1992

94

PI

. B. Shneiderman

~l,yl) P2(x3,Y1)

. ., , ,.,.,:,~. }“’’”1,, ///..... ,. ~. . . . . . . . . . ..:. :.:, :,:O:O:O:.:.: 1

. . . . .

. . . . .

. . . . .

. . . . .

fQ2/xl,y2) Ql(x2,y2)

Onelength

Fig.2. Tree-map of Figurel.

approach to this problem is to choose a l-d representation with theof each file coded as the length of a portion of a multicolored line [2].

This is impractical because the line ‘would be too long to view. A 3-d or higherdimensional approach might be hard to draw and view. A 2-d space-fillingapproach has the potential to be drawable and comprehensible. If each filewere represented as a small rectangle, the method would work, but it would

be necessary to avoid a computationally intensive bin-packing algorithm.The following tree visualization approach, called tree-maps (Figures 1 and 2),appears to solve the practical problem and provide interesting opportunitiesfor other applications.

Tree-Map Algorithm. The algorithm takes a tree root (Figure 1) and arectangular area defined by the upper left and lower right coordinates Pl(xl,yl), Q1(x2, y2). The number of outgoing edges from the root node determinesthe number of partitions of the region [xl, x2]. Since the leftmost subtreecontains a fraction (Size(child[l]) /Size(root)) of the total number of bytes inthe root, then the first vertical partitioning line is drawn at

x3 = xl + (Size (child [l] ) /Size (root)) *(x2 – xl).

The algorithm then recurs down the left tree using the 90-degree rotatedrectangle P2(x3, yl), Q2(x1, y2) and splits on the y-axis direction, while theloop continues along the remaining subtrees making partitions on the re-maining rectangle P2(x3, yl), Q1(x2, y2). Therefore nodes are partitionedvertically at even levels and horizontally at odd levels (Figure 2).

ACMTransactions on Graphics, Vol. 11, No, 1, January 1992.


For visual clarity, different colors (or gray shading) must be used withineach region. The effect of seeing thousands of small rectangles is like acheckerboard with varying sized spots. Color coding could represent differenttypes of files (e.g., text, programs, binary, graphics, spreadsheets), owners ofprograms (each owner has a different color), frequency of use (brighter colorsfor more frequent use), or the age of a file (older files might be more yellow orgrayer). If adjacent areas have the same color, then a boundary line will benecessary. Since users’ needs will vary extensively, no specific solution cansatisfy all situations, and the users should have a control panel for severalparameters and also to indicate which colors are assigned to attribute values.

Since the files that take storage space on the hard disk are all leaf nodes(we ignore, for the moment, the usually modest storage overhead consumedby a directory), each interior node must have the total size of its subtree (ifnecessary, then propagate the sums of storage consumed for each file andsubdirectory up through the levels of the tree to the root). If this data is notmaintained by the system, then there must be a preliminary pass throughthe tree to collect this data and place it at each interior node. If the userdesires to see the display as a percentage of total disk space utilization, thenthe root node must have one additional child which is a dummy record whosesize is the entire unused portion of the disk.

The tree-map algorithm assumes a tree structure in which each nodecontains a record with its directory or file name ( name), the number ofchildren ( num. children), and an array of pointers to the next level(child[l . num.childrenl). The arguments to the tree-map algorithm are:

root: a pointer to the root of the tree or subtree

P, Q: arrays of length 2 with (x, y) coordinate pairs of opposite corners ofthe current rectangle (assume that Q contains the higher coordi-nates and P the lower coordinates, but this does not affect thecorrectness of the algorithm, only the order in which rectangles aredrawn)

axis: varies between O and 1 to indicate cuts to be made vertically andhorizontally

color: indicates the color to be used for the current rectangle.

In addition we need the following:

Paint_ rectangle: A procedure that paints within the rectangle using a givencolor, and resets the color variable.

Size: a function that returns the number of bytes in the node pointed to bythe argument. Alternatively, the size could be pre-computed and stored ineach node.

The initial call is

Treemap(root, P, Q, O, color)

where P and Q are the upper right and lower left corners of the display. Bysetting the axis argument to zero the initial partitions are made vertically. It


96 . B. Shneiderman

is assumed that arguments P and Q are passed by value (since, P, Q aremodified within):

Treemap(root, PIO II , QIO. . 1] , axis, color)Paint_ rectangle (P, Q, color) —paint full areawidth := Q[axisl -p[axis] —compute location of next slicefor i := to num_children do

Q[axis] := P[axis]+(Size(child[i])/Size(root) )*widthTreemap(child [i], recur on each slice,

p, Q, l-axis, color) flipping axesP [axis] := Q[axis]

end for

Execution speed andpattern. This algorithm runs linearly with the num-berofnodes inthe tree structure. This version will paint the rectangles fromleft to right and top to bottom, with deeper levels covering colored sections aspreviously drawn during the depth first traversal. Breadth first traversalsare also possible ,as are algorithms that do not cover already colored sections.We have also tried sequencing leaf nodes before subtrees with orderingbydate or size (ascending or descending each have their advantages).

Obtaining filenames. When used for directory displaying, the users needto examine the filename, extension, date, etc. This can be accomplishedinmany ways. For example, the users could move a cursor onto a candidateregion, and then click to obtain the relevant information at the bottom line ofthe screen or just above the cursor itself. Allowing other operations (deletion,copying, marking) by way of pop-up menus is a natural next step. If directorynames are desired then nested rectangles that show a containing frame couldbe used, although this would reduce the effective display space.

Display resolution. On a standard VGA display the resolution is 640 x 480pixels giving 307,200 total pixels, which is quite adequate for displaying oneto two thousand files (each file gets an average of 200 pixels). Of course,small files or zero byte files become too small to represent and are currentlyeliminated. With larger displays, still larger trees could be displayed. Withsmaller displays or larger trees, it might be necessary to select among thesubdirectories to get adequate detail or to add zooming to reveal small files.

Applications. The applications seem quite broad, but here are a few. In anorganization chart, the number of employees or budget in each division ordepartment might be represented to gain an idea of the relative size of eachand color might indicate closeness to planned levels. For a library that usesthe Dewey decimal system, the number of books within each topic could berepresented to see the relative strengths of the library’s holdings. In a stockportfolio, the dollar values of each purchase might be represented by the sizeand the profit /loss might be color coded. In a traditional tree structured datastructure, the probability or cost of access might be coded as the size to helpfind poorly balanced structures. If the tree structure represents a computerprogram, then the size could represent that amount of time spent in thatsegment of code, thus guiding an attempt to optimize performance.

ACM Transactions on Graphics, Vol. 11, No, 1, January 1992.


ACM Transactions on Graphics, Vol. 11, No. 1, January 1 9 9 2

98 - 6. Shneiderman

ACM Transactions on Graphics, Vol. 11, No. 1, January 1992.


Summary. This paper presents a novel approach to representing trees thathave weights or sizes on the leaf nodes. The 2-d visualization is space fillingand the recursive algorithm for generation runs rapidly. It depends on colorcoding (or shading) of regions and easily provides users with a quick overviewthat clearly indicates relative sizes of the leaf nodes. Figures 3 and 4 showexamples of tree-maps with size coding, as implemented by Brian Johnson onan Apple Macintosh II computer with a high resolution color display. Figure3 shows fifteen files in four directories at three levels, with nested boxes toshow the levels. Figure 4 represents actual disk directories encompassing 850files at four levels with color coding by file type (text, graphics, applications,etc. ). We continue to explore refinements of tree-maps such as alternatelayouts, better methods for coping with large ranges of file size, color codingschemes and operations applied to files.

ACKNOWLEDGMENTS

I gratefully acknowledge the thoughtful comments of the reviewers, editorDan Olsen’s constructive suggestions, and David Mount’s help in revising thetree-map algorithm. We appreciate the continuing financial support for theHuman-Computer Interaction Laboratory’s research from Apple, NCR Corpo-ration, and Sun Microsystems.

REFERENCES

1. B~NTt.~Y, J. L., AND FRIEDMAN, J. H. Data structures for range searching. ACM Comput.Suru. 11, 4 (Dec. 1979), 397-409.

2. KNUTH, D. l%. The Art of Computer Programming: Volume 1 ~Fundamental Algorithms.Addision-Wesley Publishing Co,, Reading, MA, 1968.

3. LOMET,D. B., AND SALZBERG,B. The hB-tree: A multiattribute indexing method with goodguaranteed performance. ACM Trans. Database Syst. 15, 4 (Dec. 1990), 625-658.

4, NAGY, G., AND SETH, S. Hierarchical representation of optically scanned documents. InProceedings of the IEEE 7th [nternatwnal C’onference on Pattern Recognition (Montreal,Canada, 19S4), pp. 347-349.

5. SAMRT, H. Design and Analysis of Spatial Data Structures. Addison-Wesley Publishing Co.,Reading, MA, 1989.

Received October 1990; revised March and July 1991; accepted July 1991


Tree Visualization with Tree-Maps: 2-d Space-Filling...

Documents

Transcript of Tree Visualization with Tree-Maps: 2-d Space-Filling...