Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and...

Post on 17-Dec-2015

221 views 0 download

Tags:

Transcript of Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and...

Spreadsheet Visualisation to Improve End-user Understanding

Daniel Ballinger, Robert Biddle and James Noble

School of Mathematical and Computing SciencesEmail: {db, robert, kjx}@mcs.vuw.ac.nz

http://www.mcs.vuw.ac.nz/~db/honours.html

Motivation

Spreadsheets are a common form of end-user programming.

Unfamiliar spreadsheets can contain daunting amounts of information in the layout and inter-cell dependencies.

Visualisation can be used to aid in end-user understanding of spreadsheets.

Working outside the spreadsheet application allows for greater flexibility in visualisation.

We focused on Microsoft Excel due to its large market share.

Excel’s Current Visualisation Support – Range Finder

Invoked by clicking in a cell and then in the formula bar.Components are coloured in the bar and sheet.Allows for visual manipulation.Mainly only useful for spatially close cells.

Range Finder

Excel’s Current Visualisation Support – Formula Auditing Tools

Invoked using Formula Auditing Toolbar.Trace dependents or precedents.Arrows always point to referenced cell.Users may navigate spatially disjoint cells. (semantic navigation)Complicated spreadsheets can create a tangle of arrows.

Auditing Tools

Related Work

Takeo Igarashi– Spreadsheets augment “a visible tabular layout with invisible formulas”.– Created visualisations to help reveal the hidden dataflow graphs and

superficial tabular layouts of spreadsheets. Markus Clermont

– Most end-users are not trained programmers.– Many spreadsheets exist beyond being simple scratch pads.

Raymond Panko– Studies of empirical data into spreadsheet errors.– Found error rates can be disturbingly high.– Errors attributed to over confidence and lack of formal checking.

Margaret Burnett– The importance of scalability in visualisations.– Forms/3 and an embedded testing methodology.

Spreadsheet Application Toolkit

Find and store spreadsheets from the Internet.Extract low level structures. E.g. Cell values and formulas.Analyse spreadsheet structures. Either individual or corpus.Conveying the findings through visualisation.

Query

URLs

XLS files

Algorithms

Image

Gobbler Google

Fetcher

Analyser

Extractor

Visualisation Tools

Web Servers

Toolkit Files

BIFF Reader

Processed Data

Metrics

Visualisations

Spreadsheet layout Clustering Data Dependency Flow Data Dependency Direction Graph Structure Fisheye view Formula Inspection Corpus Analysis

Spreadsheet layout – Real-estate Utilisation 2D

Understanding layout is an important first step in learning about a new spreadsheet.Actual values and formulas are only shown as occupied cells.The visualisation layout mimics that of Excel, with columns along the top of the x-axis and rows running down the y-axis.Cells with a higher occupancy level are coloured towards the red end of the colour spectrum.

Spreadsheet layout – Real-estate Utilisation 3D

Occupancy data is projected into 3D to create a surface map.Discrete to continuous data transformation helps smooth the effects of spikes.Coloured to give a Topographical terrain effect.Full benefit is seen with user interaction.

BIRCH Clustering

BIRCH clustering partitions records into clusters that are similar according to two or more attributes.Current visualisations use the Euclidean distance between cells as the similarity metric.

Data Dependency Flow

Visualising just average unit vectors for each cell can reduce the visual clutter.3D can be used to separate vectors that occur at different sheet levels.Note the curvature back towards the origin for this workbook.

Data Dependency Direction

Concentrate purely on the directions of data flow relative to cells.Angles are sorted into 36 buckets then feed to Excel to create the graph.After the four main axis the next significant measure occurs between 300 and 360º.

Radar Graph for Outgoing Dependencies

Graph Structure

Source DataSpring view

Disregarding spatial bounds allows some structures to become clearer.

Fisheye view - Focus + Context

Addresses formula dependencies that span large distances or are many cells deep.Trees are warped over a hyperbolic lens to achieve both focus in the centre and context.An artificial red root node is introduced to connect disjoint trees.

Formula Inspection – Data Flow

Visualising formula components and flow direction.Fully trace worksheets in one view.

Basic Referencing Components

Formula Inspection - Dependency Types

Excel allows for combinations of relative and absolute positioning.Understanding the referencing type is important when replicating formula and identifying regular patterns.

Relative

Row Absolute

Fully Absolute

Column Absolute

Corpus Analysis of 259 Workbooks

Demonstrations of visualisations created from a corpus.With this sample corpus the average worksheet centre is more column centric.Function utilisation suggests that the logical functions, such as IF, actually outnumber simpler math functions like SUM.

Spatial Centre

Function Utilisation

Number of non-empty Worksheets: 227Number of empty Worksheets: 195Average Row: 1.348Average Column 18.098Max Row: 1384Max Col: 82Total Occupied cells: 55491Orphans : 51570Root Cells : 2105Leafs: 1031Nodes in Cyclic References: 29Local Formula: 108Family Trees: 509Max Tree Depth: 22Max Tree Breadth: 150

Summary

Spreadsheets are significant examples of end-user programming

Visualisation can assist end-users in better understanding the structure of spreadsheets– In particular, the “hidden structures” created by

formula

Reviewed literature to investigate the implications of the hidden structures.

Developed a toolkit to externally access the spreadsheet structure and generate visualisations.

Created several sample visualisations to help improve end-user understanding.

Current and Future Work

Detailed user studies, including usability evaluations

Domain specific visualisations Spreadsheet corpus analysis to find large

patterns Visualisation scalability to larger more

complex spreadsheets

http://www.mcs.vuw.ac.nz/~db/honours.html