Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and...

19
Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences Email: {db, robert, kjx}@mcs.vuw.ac.nz http://www.mcs.vuw.ac.nz/~db/honours.html

Transcript of Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and...

Page 1: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Spreadsheet Visualisation to Improve End-user Understanding

Daniel Ballinger, Robert Biddle and James Noble

School of Mathematical and Computing SciencesEmail: {db, robert, kjx}@mcs.vuw.ac.nz

http://www.mcs.vuw.ac.nz/~db/honours.html

Page 2: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Motivation

Spreadsheets are a common form of end-user programming.

Unfamiliar spreadsheets can contain daunting amounts of information in the layout and inter-cell dependencies.

Visualisation can be used to aid in end-user understanding of spreadsheets.

Working outside the spreadsheet application allows for greater flexibility in visualisation.

We focused on Microsoft Excel due to its large market share.

Page 3: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Excel’s Current Visualisation Support – Range Finder

Invoked by clicking in a cell and then in the formula bar.Components are coloured in the bar and sheet.Allows for visual manipulation.Mainly only useful for spatially close cells.

Range Finder

Page 4: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Excel’s Current Visualisation Support – Formula Auditing Tools

Invoked using Formula Auditing Toolbar.Trace dependents or precedents.Arrows always point to referenced cell.Users may navigate spatially disjoint cells. (semantic navigation)Complicated spreadsheets can create a tangle of arrows.

Auditing Tools

Page 5: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Related Work

Takeo Igarashi– Spreadsheets augment “a visible tabular layout with invisible formulas”.– Created visualisations to help reveal the hidden dataflow graphs and

superficial tabular layouts of spreadsheets. Markus Clermont

– Most end-users are not trained programmers.– Many spreadsheets exist beyond being simple scratch pads.

Raymond Panko– Studies of empirical data into spreadsheet errors.– Found error rates can be disturbingly high.– Errors attributed to over confidence and lack of formal checking.

Margaret Burnett– The importance of scalability in visualisations.– Forms/3 and an embedded testing methodology.

Page 6: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Spreadsheet Application Toolkit

Find and store spreadsheets from the Internet.Extract low level structures. E.g. Cell values and formulas.Analyse spreadsheet structures. Either individual or corpus.Conveying the findings through visualisation.

Query

URLs

XLS files

Algorithms

Image

Gobbler Google

Fetcher

Analyser

Extractor

Visualisation Tools

Web Servers

Toolkit Files

BIFF Reader

Processed Data

Metrics

Page 7: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Visualisations

Spreadsheet layout Clustering Data Dependency Flow Data Dependency Direction Graph Structure Fisheye view Formula Inspection Corpus Analysis

Page 8: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Spreadsheet layout – Real-estate Utilisation 2D

Understanding layout is an important first step in learning about a new spreadsheet.Actual values and formulas are only shown as occupied cells.The visualisation layout mimics that of Excel, with columns along the top of the x-axis and rows running down the y-axis.Cells with a higher occupancy level are coloured towards the red end of the colour spectrum.

Page 9: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Spreadsheet layout – Real-estate Utilisation 3D

Occupancy data is projected into 3D to create a surface map.Discrete to continuous data transformation helps smooth the effects of spikes.Coloured to give a Topographical terrain effect.Full benefit is seen with user interaction.

Page 10: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

BIRCH Clustering

BIRCH clustering partitions records into clusters that are similar according to two or more attributes.Current visualisations use the Euclidean distance between cells as the similarity metric.

Page 11: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Data Dependency Flow

Visualising just average unit vectors for each cell can reduce the visual clutter.3D can be used to separate vectors that occur at different sheet levels.Note the curvature back towards the origin for this workbook.

Page 12: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Data Dependency Direction

Concentrate purely on the directions of data flow relative to cells.Angles are sorted into 36 buckets then feed to Excel to create the graph.After the four main axis the next significant measure occurs between 300 and 360º.

Radar Graph for Outgoing Dependencies

Page 13: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Graph Structure

Source DataSpring view

Disregarding spatial bounds allows some structures to become clearer.

Page 14: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Fisheye view - Focus + Context

Addresses formula dependencies that span large distances or are many cells deep.Trees are warped over a hyperbolic lens to achieve both focus in the centre and context.An artificial red root node is introduced to connect disjoint trees.

Page 15: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Formula Inspection – Data Flow

Visualising formula components and flow direction.Fully trace worksheets in one view.

Basic Referencing Components

Page 16: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Formula Inspection - Dependency Types

Excel allows for combinations of relative and absolute positioning.Understanding the referencing type is important when replicating formula and identifying regular patterns.

Relative

Row Absolute

Fully Absolute

Column Absolute

Page 17: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Corpus Analysis of 259 Workbooks

Demonstrations of visualisations created from a corpus.With this sample corpus the average worksheet centre is more column centric.Function utilisation suggests that the logical functions, such as IF, actually outnumber simpler math functions like SUM.

Spatial Centre

Function Utilisation

Number of non-empty Worksheets: 227Number of empty Worksheets: 195Average Row: 1.348Average Column 18.098Max Row: 1384Max Col: 82Total Occupied cells: 55491Orphans : 51570Root Cells : 2105Leafs: 1031Nodes in Cyclic References: 29Local Formula: 108Family Trees: 509Max Tree Depth: 22Max Tree Breadth: 150

Page 18: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Summary

Spreadsheets are significant examples of end-user programming

Visualisation can assist end-users in better understanding the structure of spreadsheets– In particular, the “hidden structures” created by

formula

Reviewed literature to investigate the implications of the hidden structures.

Developed a toolkit to externally access the spreadsheet structure and generate visualisations.

Created several sample visualisations to help improve end-user understanding.

Page 19: Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences.

Current and Future Work

Detailed user studies, including usability evaluations

Domain specific visualisations Spreadsheet corpus analysis to find large

patterns Visualisation scalability to larger more

complex spreadsheets

http://www.mcs.vuw.ac.nz/~db/honours.html