Defined the goal of information visualization and discussed the visualization tasks for BI. ...


Page 1: Defined the goal of information visualization and discussed the visualization tasks for BI.  Identified methods of enhancing understanding and amplifying.

Previously in class about information visualization...

Defined the goal of information visualization and discussed the visualization tasks for BI.

Identified methods of enhancing understanding and amplifying cognition:
◦ Reduce search time and enhance recognition of patterns (using pre-attentive processing);
◦ provide focus/emphasis through affordances.

Reviewed heuristics from Tufte and Nielsen.

Saw an example of a multivariate visualization for the task of communication.

Page 2:

Previously in the readings...

Understand quantitative relationships (optional review)
◦ Nominal vs. ordinal vs. interval vs. hierarchical relationships
◦ Ranking vs. ratio vs. correlation
◦ Measures of average and distribution

Concepts of tables and graphs
◦ Tables are used to see individual values; graphs are used to reveal relationships among multiple values
◦ Tables and graphs should be sorted to highlight the key message.
◦ Relative use of pie charts, bar charts, line charts, sparklines, small multiples, box plots...
◦ Showing relationships vs. deviation vs. correlation vs. ranking vs. time-series vs. part-to-whole vs. distribution
◦ Importance of sorting tables and graphs.

Page 3:

What’s up for today?

Finish evaluating a few sample individual visualizations.

Explain how visualizations fit within the overall BI architecture.

Discuss the differences between OLAP and data mining.

Present dashboards as the most common OLAP visualization tool.

Begin discussion of data mining.

Page 4:

The purpose is to compare one product’s sales to other products. Good or bad visualization?

Page 6:

The purpose is to display sales revenue in the state of Kansas associated with 12 products across the four quarters of a year. How would you improve this visualization?

Page 7:

BI – what is it again???

“Business Intelligence” is making purposeful use of data in decision making.

The goals of BI are:
◦ To support human decision making by providing as much understandable, complete, relevant, well-organized information as necessary and helpful.
◦ To automate some decisions to relieve humans of routine decision-making tasks.
◦ To discover new issues/relationships/correlations that humans may not readily conceive.

Page 8:

BI Architecture

[Diagram: BI architecture. Labels recovered from the slide:]
◦ Data Sources: ERP; Legacy; POS; Other OLTP/Web; External data
◦ ETL Process: Select; Extract; Transform; Integrate; Load
◦ Enterprise Data warehouse
◦ Data marts: Engineering; Marketing; Finance; ...
◦ API / Middleware
◦ Applications (Visualization): Routine Business Reporting; Data/text mining; OLAP, Dashboard, Web; Custom built applications
◦ Other labels: Metadata; Replication; Access; No data marts option

Page 9:

Overall Components of BI Architecture

Data Sources available for input.

ETL tools to bring input data into an integrated data source.

Integrated Data Source (usually a data warehouse).
◦ Structured and unstructured data.
◦ Internal and external data.

Metadata repository.
◦ Data definitions and meanings.
◦ Business rules and process decisions.

Analytical tools.
◦ OLAP: Online Analytical Processing
◦ Statistical analysis.
◦ Data Mining.

Data Visualization.
◦ Graphical, tables, pictures.

Page 10:

Online analytical processing tools

The vast majority of output from BI is OLAP-related.

Provide information to support both ad-hoc and consistent queries for managerial decision making.

Provide multi-dimensional data analysis techniques.

Work primarily with data aggregation.

Data mart/derived data model.

Provide advanced statistical analysis.

Support access to very large databases through additional data structures such as SQL Server Analysis Services (cubes).

Contain enhanced query optimization algorithms to facilitate query processing speed (SQL Server Analysis Services).

Page 11:

OLAP Results

Generates relatively standardized reports to ad-hoc queries.

Answers questions such as:

Which products sold the most quantity - by type of product and geographic region?

Which stores are currently most profitable? Which are least profitable?

Used frequently to support short and long term managerial decision making.

OLAP Visualization

Presented in standard displays that are accessed frequently

Dashboard format used to provide quick and comprehensive overview of business status.

Presented in Excel or other spreadsheet format.

Display the output using either a standard report generator (Crystal Reports, Access, etc.)

Display the output graphically.
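The first OLAP question above (which products sold the most quantity, by type of product and geographic region) amounts to summing one measure over two dimensions. A minimal sketch in plain Python, assuming a hypothetical list of sales records; the records, the field order, and the `roll_up` helper are invented for illustration and do not come from the lecture or from any OLAP product:

```python
from collections import defaultdict

# Hypothetical fact rows: (product_type, region, quantity). Illustrative only.
SALES = [
    ("Beverage", "East", 120),
    ("Beverage", "West", 80),
    ("Snack", "East", 60),
    ("Snack", "West", 140),
    ("Beverage", "East", 30),
]


def roll_up(rows, dims):
    """Sum quantity grouped by the chosen dimension indexes (0=type, 1=region)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[2]
    return dict(totals)


# Two-dimensional view: one cell per (product type, region) pair.
by_type_and_region = roll_up(SALES, dims=(0, 1))

# Coarser view: aggregate the region dimension away.
by_type = roll_up(SALES, dims=(0,))

# The cell with the largest total quantity answers "sold the most".
top_cell = max(by_type_and_region, key=by_type_and_region.get)
```

Calling `roll_up` with a different `dims` tuple gives a coarser or finer view of the same facts, which is roughly what a cube precomputes so that such queries return quickly.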

Page 12:

Data mining tools

Data mining is the set of activities used to find new, hidden or unexpected patterns in data.

Data mining tools:
◦ use large sets of data;
◦ uncover patterns based on statistical and artificial intelligence algorithms;
◦ form computer models based on the findings; and
◦ use the models to predict business behavior.

Common synonyms for data mining include knowledge discovery, information harvesting, & pattern analysis.

Proactive tools, used for discovery and prediction.

Page 13:

Data Mining Results

Generates information about patterns in data.

Data mining provides answers to previously ambiguous questions; but a question area must be defined.

May produce information such as:

Which products should be promoted to a pre-defined type/category of customer?

Which patients have the greatest likelihood of being hospitalized within the next year?

Which securities are the most profitable to buy/sell in a particular environment?

Data Mining Visualization

Focus is on discovery and analysis, rather than reporting, monitoring or communicating a message.

Uses primarily graphical output to display the patterns.

Included as part of the data mining tool.

Can also incorporate the results in standardized reporting tools and/or dashboards, but information is already “discovered” by that time.

Page 14:

Is it OLAP or Data Mining??

How many people between the ages of 15-30 are diagnosed with type 2 diabetes?

What is the quantity breakdown by county in the U.S. for people diagnosed with type 2 diabetes?

What is the relationship between weight, exercise, age, smoking, and the prevalence of type 2 diabetes?

What demographic factors are related to type 2 diabetes?

Page 15:

Is it OLAP or data mining?? (TEC)

How many different customers did we serve?

How many applicants did we place?

Which customer was our most profitable?

Which customers have the greatest likelihood of increasing their number of temporary employees next year?

Which geographic region was our most profitable last quarter?

Which geographic region has the fastest growth rate measured by number of employees placed over the last 3 years?

Page 16:

Dashboards

Most common visualization method for OLAP.

Visual display – not printed.

Must have metrics. What is a metric, again??

Key Information
◦ Most important information to monitor one or more objectives
◦ Usually related directly to key performance indicators
◦ Consolidated

Fits on one screen (no scrolling!)

Designed to be monitored at a glance

Page 19:

Dashboards are not new...

Derived from the work on executive information systems (late 1980’s through 1990’s).

Further roots in the work on the “balanced scorecard” concept to broaden perspective from financials alone.

Uses the dashboard metaphor to develop fast recognition and appeal.

Page 20:

Always need to know the goal

Audience
◦ Strategic: Executives, managers
◦ Analytical: Managers, analysts
◦ Operational: Executives, managers, BOG

Use
◦ Strategic: High-level performance; relationships
◦ Analytical: Detailed understanding of KPI factors
◦ Operational: Run daily, weekly, monthly operations

Design
◦ Strategic: Simple displays; provide context; include forecasts
◦ Analytical: Rich comparisons; more context, multivariate
◦ Operational: Maintain awareness through dynamic, simple displays

Issues and Cautions
◦ Strategic: Beware too much information; avoid subtle gradations; link to KPI; don’t bother with real-time data
◦ Analytical: Provide drill-down; enable exploration; show movement; allow examination of causes; probably doesn’t require real-time data
◦ Operational: Specific information available; provide drill-down; exceptions are critical; requires real-time data; use hovering

Page 21:

Typical dashboard data

Category: Measures
◦ Sales: Billings; Bookings; # of Orders; Order Amounts
◦ Marketing: Market share; Ad campaign $; Cust. Demographics
◦ IT: Network downtime; System usage; Fixed app defects
◦ Tech Support: # of support calls; Resolved cases; Customer satisfaction; Call duration
◦ Finance: Revenues; Expenses; Profits
◦ Human Resources: Employee satisfaction; Employee turnover; Count of open positions; Count of late reviews

Page 22:

Common mistakes

Overall design
◦ Exceeding boundaries of a single screen.
◦ Limiting design to the dashboard metaphor.
◦ Choosing ineffective or inappropriate visualization methods.
◦ Poor flow/arrangement of presentation of data.

Content
◦ Choosing a deficient, inappropriate or ineffective measure.
◦ Supplying inadequate context for the data.
◦ Displaying excessive detail or precision.

Detailed design (look and feel)
◦ Misusing or overusing color; meaningless variety of color and shape.
◦ Poor highlighting of important data.
◦ Cluttering the display with useless decoration.

Page 23:

Well-designed dashboard

Delivers information that is:
◦ Exceptionally well-organized.
◦ Condensed.
◦ Provides summaries and exceptions.
◦ Specific to the requirements of the audience.
◦ Presented on the media of choice for the audience (computer, phone, tablet, etc.)
◦ Flexible.
◦ Able to be pursued in more detail beyond the dashboard.

Page 24:

Key Goals (Tufte, 1980’s, Few, 2010’s)

Understand and make best use of screen real estate

Maximize the data-ink/total-ink ratio (or data pixels/total pixels ratio...)

Eliminate all unnecessary non-data pixels

De-emphasize all non-data pixels and make them slip into the background of the overall design

Highlight the most important data pixels
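The ratio named on this slide can be written out explicitly. The first form is Tufte's data-ink ratio for printed graphics; the second is the on-screen analogue that the slide attributes to Few. The equivalent "one minus erasable proportion" form is Tufte's own restatement, added here as a reminder rather than taken from the slide:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
\[
\text{data-pixel ratio}
  = \frac{\text{data pixels}}{\text{total pixels}}
\]
```

Both ratios lie between 0 and 1. Removing gridlines, borders, and decoration shrinks the denominator while leaving the numerator unchanged, which is why the goals on this slide focus on non-data pixels.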

Page 25:

[Figure: example display with regions labeled "Emphasized", "Neither emphasized nor de-emphasized", and "De-emphasized"]

Page 26:

Maximize data pixels/total pixels ratio

Salesperson     Jan     Feb     Mar
Bill Bassett    2,834   4,340   4,885
Jenny Martin    5,890   7,439   6,493
Luis Marquez    3,899   6,889   8,593
Bob Taylor      1,250   3,445   5,443

[The slide shows this table twice with identical values.]

Page 27:

[Figure: two versions of a chart by month (January to December) with a value axis from 0 to 12,000, for Store 1, Store 44, Store 8, and Store 6]

Page 28:

Junk pixels

Grid lines in graphs that don’t need precision

Backgrounds that don’t provide delineation of sections on the dashboard

3-D that doesn’t provide additional variables or layers of analysis

Drawings that are not part of the data – including detailed logos

Colors that don’t highlight or emphasize data

Meters and gauges that don’t incorporate preattention

Page 29:

Good design

Arrange the overall design to reflect how the intended audience “thinks” about the decisions to be made.

Group related data.

Arrange the data in a meaningful order (low to high; high to low)

Use bright colors sparingly and judiciously.

Avoid use of a colored background.

White space is an effective delimiter.

Use fonts with good legibility and readability.

Page 30:

So, what about data mining visualization?

Also graphical, but designed for an analyst to discover patterns, not to communicate information for managerial decision making.

Must understand a bit more about data mining while discussing visualization.

Page 31:

Opening Vignette: Data Mining Goes to Hollywood!

A Typical Classification Problem

Independent Variables (name; number of values; possible values)
◦ MPAA Rating; 5; G, PG, PG-13, R, NR
◦ Competition; 3; High, Medium, Low
◦ Star value; 3; High, Medium, Low
◦ Genre; 10; Sci-Fi, Historic Epic Drama, Modern Drama, Politically Related, Thriller, Horror, Comedy, Cartoon, Action, Documentary
◦ Special effects; 3; High, Medium, Low
◦ Sequel; 1; Yes, No
◦ Number of screens; 1; Positive integer

Dependent Variable (class no.; range in $ millions)
◦ 1; < 1 (Flop)
◦ 2; > 1 and < 10
◦ 3; > 10 and < 20
◦ 4; > 20 and < 40
◦ 5; > 40 and < 65
◦ 6; > 65 and < 100
◦ 7; > 100 and < 150
◦ 8; > 150 and < 200
◦ 9; > 200 (Blockbuster)
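The dependent variable above turns a continuous revenue figure into one of nine ordered classes, which is what makes this a classification problem. A minimal sketch of that binning in Python, assuming the class boundaries shown on the slide; the function name `revenue_class` and the handling of values that fall exactly on a boundary are assumptions, since the slide's strict < and > signs leave those cases unassigned:

```python
from bisect import bisect_right

# Upper bounds (in $ millions) of classes 1..8, read from the slide's class table.
CLASS_UPPER_BOUNDS = [1, 10, 20, 40, 65, 100, 150, 200]


def revenue_class(millions):
    """Return the class number, 1 (Flop) through 9 (Blockbuster)."""
    if millions < 0:
        raise ValueError("revenue must be non-negative")
    # Assumption: a value exactly on a boundary is placed in the higher class.
    return bisect_right(CLASS_UPPER_BOUNDS, millions) + 1


# Hypothetical revenue figures, used only to exercise the function.
examples = {amount: revenue_class(amount) for amount in (0.5, 5, 50, 199, 250)}
```

Because the classes are ordered, a prediction that misses by one class is a smaller error than one that misses by several.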

Page 32:

The DM Process Map in IBM SPSS Modeler

[Figure: process map with two labeled parts, "Model Development process" and "Model Assessment process"]

Page 33:

Prediction Models

Individual Models: SVM, ANN, C&RT. Ensemble Models: Random Forest, Boosted Tree, Fusion (Average).

Performance Measure   SVM      ANN      C&RT     Random Forest   Boosted Tree   Fusion (Average)
Count (Bingo)         192      182      140      189             187            194
Count (1-Away)        104      120      126      121             104            120
Accuracy (% Bingo)    55.49%   52.60%   40.46%   54.62%          54.05%         56.07%
Accuracy (% 1-Away)   85.55%   87.28%   76.88%   89.60%          84.10%         90.75%
Standard deviation    0.93     0.87     1.05     0.76            0.84           0.63

* Training set: 1998 – 2005 movies; Test set: 2006 movies
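The accuracy rows in the table above can be reproduced from the count rows. A short sketch, under two assumptions the slide does not state: that the test set holds 346 movies (inferred because 192 / 346 = 55.49%, matching the SVM column), and that "1-Away" accuracy counts exact hits plus predictions that miss by one class. The function name `accuracy_table` is illustrative:

```python
# Counts copied from the table above, one entry per model.
BINGO = {"SVM": 192, "ANN": 182, "C&RT": 140,
         "Random Forest": 189, "Boosted Tree": 187, "Fusion (Average)": 194}
ONE_AWAY = {"SVM": 104, "ANN": 120, "C&RT": 126,
            "Random Forest": 121, "Boosted Tree": 104, "Fusion (Average)": 120}

# Assumption: test-set size inferred from the percentages, not given on the slide.
TEST_SET_SIZE = 346


def accuracy_table(bingo, one_away, n):
    """Return {model: (exact-hit %, within-one-class %)}, rounded to 2 decimals."""
    return {
        model: (
            round(100 * bingo[model] / n, 2),
            round(100 * (bingo[model] + one_away[model]) / n, 2),
        )
        for model in bingo
    }


RESULTS = accuracy_table(BINGO, ONE_AWAY, TEST_SET_SIZE)
for model, (exact, within_one) in RESULTS.items():
    print(f"{model:18s} {exact:6.2f}% {within_one:6.2f}%")
```

Under these assumptions every computed pair agrees with the slide, for example 55.49% and 85.55% for SVM and 56.07% and 90.75% for Fusion (Average).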