Defined the goal of information visualization and discussed the visualization tasks for BI. ...
Uploaded by jeffery-davis
Defined the goal of information visualization and discussed the visualization tasks for BI.
Identified methods of enhancing understanding and amplifying cognition:
◦ Reduce search time and enhance recognition of patterns (using pre-attentive processing);
◦ Provide focus/emphasis through affordances.
Reviewed heuristics from Tufte and Nielsen.
Saw an example of a multivariate visualization for the task of communication.
Previously in class about information visualization...
Understand quantitative relationships (optional review)
◦ Nominal vs. ordinal vs. interval vs. hierarchical relationships
◦ Ranking vs. ratio vs. correlation
◦ Measures of average and distribution
Concepts of tables and graphs
◦ Tables are used to see individual values; graphs are used to reveal relationships among multiple values.
◦ Tables and graphs should be sorted to highlight the key message.
◦ Relative use of pie charts, bar charts, line charts, sparklines, small multiples, box plots...
◦ Showing relationships vs. deviation vs. correlation vs. ranking vs. time-series vs. part-to-whole vs. distribution
◦ Importance of sorting tables and graphs.
Previously in the readings...
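The review points on measures of average and distribution, and on sorting a table to highlight its key message, can be illustrated with a small Python sketch. It uses the Jan/Feb/Mar salesperson figures from the sample table in these slides. Rolling them up into a per-person Q1 total is an illustrative choice, not something the slides prescribe.

```python
from statistics import mean, median

# Jan, Feb, Mar sales per salesperson (figures from the slides' sample table).
sales = {
    "Bill Bassett": (2834, 4340, 4885),
    "Jenny Martin": (5890, 7439, 6493),
    "Luis Marquez": (3899, 6889, 8593),
    "Bob Taylor": (1250, 3445, 5443),
}

def q1_totals(table):
    """Return (name, Q1 total) pairs sorted high to low so the key message leads."""
    totals = {name: sum(months) for name, months in table.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranked = q1_totals(sales)
values = [total for _, total in ranked]

for name, total in ranked:
    print(f"{name:<14}{total:>8,}")
# Measures of average and spread for the same totals.
print(f"mean   {mean(values):>10,.0f}")
print(f"median {median(values):>10,.0f}")
print(f"range  {max(values) - min(values):>10,}")
```

Sorting high to low puts the top performer first. The mean and median show whether a few large values pull the average away from the typical case.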
Finish evaluating a few sample individual visualizations.
Explain how visualizations fit within the overall BI architecture.
Discuss the differences between OLAP and data mining.
Present dashboards as the most common OLAP visualization tool.
Begin discussion of data mining.
What’s up for today?
The purpose is to compare one product’s sales to other products. Good or bad visualization?
The purpose is to display sales revenue in the state of Kansas associated with 12 products across the four quarters of a year. How would you improve this visualization?
BI – what is it again???
“Business Intelligence” is making purposeful use of data in decision making.
The goals of BI are:
◦ To support human decision making by providing as much understandable, complete, relevant, well-organized information as is necessary and helpful.
◦ To automate some decisions to relieve humans of routine decision-making tasks.
◦ To discover new issues/relationships/correlations that humans may not readily conceive.
BI Architecture
[Figure: Data sources (ERP, Legacy, POS, Other OLTP/Web, External data) feed the ETL process (Select, Extract, Transform, Integrate, Load), which loads the enterprise data warehouse (with metadata and replication). The warehouse is accompanied by data marts (Engineering, Marketing, Finance, ...), with a "no data marts" option. Through an API/middleware access layer, applications (visualization) include routine business reporting, data/text mining, OLAP/dashboard/web, and custom-built applications.]
Overall Components of BI Architecture
Data sources available for input.
ETL tools to bring input data into an integrated data source.
Integrated data source (usually a data warehouse).
◦ Structured and unstructured data.
◦ Internal and external data.
Metadata repository.
◦ Data definitions and meanings.
◦ Business rules and process decisions.
Analytical tools.
◦ OLAP: Online Analytical Processing.
◦ Statistical analysis.
◦ Data mining.
Data visualization.
◦ Graphs, tables, pictures.
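The ETL component can be sketched with Python's standard library. The sketch assumes two hypothetical sources, a POS feed and an ERP export, that describe the same products with different codes and date formats. An in-memory SQLite database stands in for the data warehouse. All table, column and product names are illustrative and do not come from the slides. The "Select" step from the architecture figure is implicit here because every source row is used.

```python
import sqlite3

# Hypothetical source extracts: a POS feed and an ERP export that use
# different product codes and date formats for the same products.
pos_rows = [("2024-01-05", "sku-001", "3"), ("2024-01-05", "sku-002", "1")]
erp_rows = [{"date": "05/01/2024", "product": "SKU001", "qty": 2}]

def extract():
    """Extract: pull raw records from each source (here, in-memory lists)."""
    return pos_rows, erp_rows

def transform(pos, erp):
    """Transform and integrate: map both sources onto one common schema."""
    unified = []
    for date, sku, qty in pos:
        # Normalize product codes ("sku-001" -> "SKU001") and cast quantities.
        unified.append((date, sku.upper().replace("-", ""), int(qty), "POS"))
    for rec in erp:
        # Assumption: the ERP export uses day/month/year dates.
        day, month, year = rec["date"].split("/")
        unified.append((f"{year}-{month}-{day}", rec["product"], rec["qty"], "ERP"))
    return unified

def load(rows, conn):
    """Load: write the integrated rows into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(sale_date TEXT, product TEXT, qty INTEGER, source TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
for row in warehouse.execute(
    "SELECT product, SUM(qty) FROM sales_fact GROUP BY product ORDER BY product"
):
    print(row)
```

Integration is what allows the final query to combine POS and ERP sales for the same product under a single code.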
Online analytical processing tools
The vast majority of output from BI is OLAP-related.
Provide information to support both ad-hoc and recurring queries for managerial decision making.
Provide multi-dimensional data analysis techniques.
Work primarily with data aggregation.
Data mart/derived data model.
Provide advanced statistical analysis.
Support access to very large databases through additional data structures such as cubes (e.g., in SQL Server Analysis Services).
Contain enhanced query optimization algorithms to speed up query processing (e.g., SQL Server Analysis Services).
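OLAP's reliance on multi-dimensional aggregation can be pictured as a tiny in-memory data cube. This is a teaching sketch with hypothetical fact rows. It computes a total for every combination of dimensions, with the omitted dimensions rolled up to "ALL". Real OLAP engines such as SQL Server Analysis Services precompute and store aggregates using far more sophisticated structures than this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product_type, region, quarter, quantity sold).
facts = [
    ("Widget", "East", "Q1", 120),
    ("Widget", "West", "Q1", 80),
    ("Gadget", "East", "Q1", 50),
    ("Gadget", "East", "Q2", 70),
    ("Widget", "West", "Q2", 95),
]
DIMENSIONS = ("product_type", "region", "quarter")

def cube(rows):
    """Aggregate quantity for every subset of dimensions (a data cube).

    Dimensions not in the current grouping are rolled up to "ALL".
    """
    cells = defaultdict(int)
    n = len(DIMENSIONS)
    for size in range(n + 1):
        for group in combinations(range(n), size):
            for row in rows:
                key = tuple(row[i] if i in group else "ALL" for i in range(n))
                cells[key] += row[n]  # row[n] is the quantity measure
    return cells

cells = cube(facts)
print(cells[("Widget", "ALL", "ALL")])  # widgets, all regions and quarters
print(cells[("ALL", "East", "ALL")])    # everything sold in the East
print(cells[("ALL", "ALL", "ALL")])     # grand total
```

After the cube is built, each slice or roll-up becomes a single dictionary lookup rather than a fresh scan of the facts. This is the same trade-off (more storage in exchange for faster queries) that cube-based OLAP tools make.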
OLAP Results
Generates output ranging from relatively standardized reports to answers to ad-hoc queries.
Answers questions such as:
Which products sold the greatest quantity, by product type and geographic region?
Which stores are currently most profitable? Which are least profitable?
Used frequently to support short- and long-term managerial decision making.
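A question such as "Which stores are currently most profitable? Which are least profitable?" maps directly to an aggregate query. The sketch below answers it with SQL through Python's built-in `sqlite3` module. The store names echo the store chart later in these slides, but the `store_sales` table and all of its figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction-level data; a store can appear more than once.
conn.execute("CREATE TABLE store_sales (store TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?, ?)",
    [
        ("Store 1", 12000, 9000),
        ("Store 6", 8000, 7500),
        ("Store 8", 10000, 6000),
        ("Store 44", 9000, 9500),
        ("Store 1", 3000, 1000),
    ],
)

# Aggregate profit per store, then rank from most to least profitable.
query = """
    SELECT store, SUM(revenue - cost) AS profit
    FROM store_sales
    GROUP BY store
    ORDER BY profit DESC
"""
ranking = conn.execute(query).fetchall()
print("Most profitable:", ranking[0])
print("Least profitable:", ranking[-1])
```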
OLAP Visualization
Presented in standard displays that are accessed frequently
Dashboard format is used to provide a quick, comprehensive overview of business status.
Presented in Excel or other spreadsheet format.
Display the output using a standard report generator (Crystal Reports, Access, etc.).
Display the output graphically.
Data mining is the set of activities used to find new, hidden or unexpected patterns in data.
Data mining tools:
◦ use large sets of data;
◦ uncover patterns based on statistical and artificial intelligence algorithms;
◦ form computer models based on the findings; and
◦ use the models to predict business behavior.
Common synonyms for data mining include knowledge discovery, information harvesting, and pattern analysis.
Proactive tools, used for discovery and prediction.
Data mining tools
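The pattern-to-model-to-prediction cycle described above can be shown in miniature with ordinary least-squares regression, one of the simplest statistical algorithms. The data (monthly ad spend against units sold) are hypothetical. Real data mining tools apply much richer algorithms to much larger data sets, so treat this only as an illustration of the idea.

```python
# Hypothetical history: monthly ad spend ($K) and units sold (hundreds).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# "Uncover the pattern" and "form the model".
slope, intercept = fit_line(ad_spend, units)

def predict(x):
    """Use the fitted model to predict behavior for a new input."""
    return slope * x + intercept

print(f"units = {slope:.2f} * spend + {intercept:.2f}")
print(f"forecast at spend 6.0: {predict(6.0):.2f}")
```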
Data Mining Results
Generates information about patterns in data.
Data mining provides answers to previously ambiguous questions, but a question area must still be defined.
May produce information such as:
Which products should be promoted to a pre-defined type/category of customer?
Which patients have the greatest likelihood of being hospitalized within the next year?
Which securities are the most profitable to buy/sell in a particular environment?
Data Mining Visualization
Focus is on discovery and analysis, rather than reporting, monitoring or communicating a message.
Uses primarily graphical output to display the patterns.
Included as part of the data mining tool.
Can also incorporate the results in standardized reporting tools and/or dashboards, but information is already “discovered” by that time.
How many people between the ages of 15 and 30 are diagnosed with type 2 diabetes?
What is the breakdown, by county in the U.S., of the number of people diagnosed with type 2 diabetes?
What is the relationship between weight, exercise, age, smoking, and the prevalence of type 2 diabetes?
What demographic factors are related to type 2 diabetes?
Is it OLAP or Data Mining??
How many different customers did we serve? How many applicants did we place?
Which customer was our most profitable?
Which customers have the greatest likelihood of increasing their number of temporary employees next year?
Which geographic region was our most profitable last quarter?
Which geographic region has the fastest growth rate measured by number of employees placed over the last 3 years?
Is it OLAP or data mining?? (TEC)
Most common visualization method for OLAP.
Visual display, not printed.
Must have metrics. What is a metric, again??
Key information
◦ Most important information needed to monitor one or more objectives
◦ Usually related directly to key performance indicators (KPIs)
◦ Consolidated
Fits on one screen (no scrolling!)
Designed to be monitored at a glance
Dashboards
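A dashboard's metrics matter only relative to targets, and the items that must stand out are the exceptions. The sketch below shows one way to flag them. The metric names are drawn from the "Typical dashboard data" list in these slides, but the actual values, targets, and the 5% tolerance are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True

    def variance_pct(self) -> float:
        """Percent deviation from target (positive = better than target)."""
        change = (self.actual - self.target) / self.target * 100
        return change if self.higher_is_better else -change

def exceptions(metrics, tolerance_pct=5.0):
    """Return the metrics that miss their target by more than the tolerance."""
    return [m for m in metrics if m.variance_pct() < -tolerance_pct]

# Hypothetical KPI readings and targets.
kpis = [
    Metric("Revenue ($K)", actual=940, target=1000),
    Metric("Customer satisfaction", actual=4.4, target=4.2),
    Metric("Network downtime (hrs)", actual=6, target=4, higher_is_better=False),
    Metric("Open positions", actual=19, target=20, higher_is_better=False),
]

for m in exceptions(kpis):
    print(f"ALERT {m.name}: {m.variance_pct():+.1f}% vs target")
```

Only the two misses are reported. On a real dashboard, they would be the elements that receive the pre-attentive emphasis (for example, a single bright color), while on-target metrics stay muted.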
http://www.infosol.com/business%20intelligence/library-dashboards.aspx
http://www.dundas.com/dashboard/online-examples/
http://www.tableausoftware.com/
http://www.exceluser.com/dash/samples.htm
http://dashboardsbyexample.com/
http://www.dashboardzone.com/
Dashboard examples galore!
http://www.it-performs.com/services/dashboard-centre/dashboard-videos
http://www.youtube.com/watch?v=3Stuh7-RyuE
http://www.youtube.com/watch?v=EJ9CNhgh8EY
http://www.dminebi.com/dmine-dashboard-videos/
http://www.youtube.com/watch?v=V9GMCS-WjyI&feature=related
http://www.youtube.com/watch?v=0AS9TIK1QFk&feature=related
Dashboard videos abound! (mostly from vendors of dashboard products...)
Derived from the work on executive information systems (late 1980s through 1990s).
Further roots in the work on the “balanced scorecard” concept to broaden perspective from financials alone.
Uses the dashboard metaphor to develop fast recognition and appeal.
Dashboards are not new...
Always need to know the goal
Strategic dashboards
◦ Audience: Executives, managers
◦ Use: High-level performance; relationships
◦ Design: Simple displays; provide context; include forecasts
◦ Issues and cautions: Beware too much information; avoid subtle gradations; link to KPIs; don't bother with real-time data
Analytical dashboards
◦ Audience: Managers, analysts
◦ Use: Detailed understanding of KPI factors
◦ Design: Rich comparisons; more context; multivariate
◦ Issues and cautions: Provide drill-down; enable exploration; show movement; allow examination of causes; probably doesn't require real-time data
Operational dashboards
◦ Audience: Executives, managers, BOG
◦ Use: Run daily, weekly, monthly operations
◦ Design: Maintain awareness through dynamic, simple displays
◦ Issues and cautions: Specific information available; provide drill-down; exceptions are critical; requires real-time data; use hovering
Typical dashboard data
Category: Measures
Sales: Billings; Bookings; # of orders; Order amounts
Marketing: Market share; Ad campaign $; Customer demographics
IT: Network downtime; System usage; Fixed app defects
Tech Support: # of support calls; Resolved cases; Customer satisfaction; Call duration
Finance: Revenues; Expenses; Profits
Human Resources: Employee satisfaction; Employee turnover; Count of open positions; Count of late reviews
Overall design
◦ Exceeding the boundaries of a single screen.
◦ Limiting design to the dashboard metaphor.
◦ Choosing ineffective or inappropriate visualization methods.
◦ Poor flow/arrangement of the data presented.
Content
◦ Choosing a deficient, inappropriate or ineffective measure.
◦ Supplying inadequate context for the data.
◦ Displaying excessive detail or precision.
Detailed design (look and feel)
◦ Misusing or overusing color; meaningless variety of color and shape.
◦ Poor highlighting of important data.
◦ Cluttering the display with useless decoration.
Common mistakes
Delivers information that is:
◦ Exceptionally well-organized.
◦ Condensed.
◦ Focused on summaries and exceptions.
◦ Specific to the requirements of the audience.
◦ Presented on the audience's medium of choice (computer, phone, tablet, etc.).
◦ Flexible.
◦ Able to be pursued in more detail beyond the dashboard.
Well-designed dashboard
Understand and make the best use of screen real estate.
Maximize the data-ink/total-ink ratio (or data pixels/total pixels ratio...).
Eliminate all unnecessary non-data pixels.
De-emphasize all non-data pixels and make them slip into the background of the overall design.
Highlight the most important data pixels.
Key Goals (Tufte, 1980s; Few, 2010s)
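The data-ink (or data-pixel) ratio is a simple fraction: data ink divided by total ink, where blank background counts as no ink. The sketch below computes it for two hypothetical character "renderings" of the same small chart. In these grids, `D` marks a data pixel, `N` a non-data pixel (grid line, border, axis), and `.` empty space. The grids are invented for illustration.

```python
def data_ink_ratio(canvas):
    """Data-ink ratio = data ink / total ink (Tufte).

    canvas: rows of characters where 'D' is a data pixel, 'N' a non-data
    pixel (grid line, border, decoration) and '.' empty background.
    """
    data = sum(row.count("D") for row in canvas)
    non_data = sum(row.count("N") for row in canvas)
    return data / (data + non_data)

# Same data drawn with a full frame and grid lines...
cluttered = [
    "NNNNNNNN",
    "N.D..D.N",
    "NNDNNDNN",
    "N.D.DD.N",
    "NNNNNNNN",
]
# ...and with only a baseline axis kept.
clean = [
    "........",
    "..D..D..",
    "..D..D..",
    "..D.DD..",
    "NNNNNNNN",
]
print(f"cluttered: {data_ink_ratio(cluttered):.2f}")
print(f"clean:     {data_ink_ratio(clean):.2f}")
```

The data pixels are identical in both versions. Removing the frame and grid lines alone raises the ratio from 7/33 to 7/15.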
[Figure: areas of a display labeled Emphasized, Neither emphasized nor de-emphasized, and De-emphasized]
Maximize data pixels/total pixels ratio
Salesperson Jan Feb Mar
Bill Bassett 2,834 4,340 4,885
Jenny Martin 5,890 7,439 6,493
Luis Marquez 3,899 6,889 8,593
Bob Taylor 1,250 3,445 5,443
Salesperson Jan Feb Mar
Bill Bassett 2,834 4,340 4,885
Jenny Martin 5,890 7,439 6,493
Luis Marquez 3,899 6,889 8,593
Bob Taylor 1,250 3,445 5,443
[Figure: the same chart of monthly values (January to December; y-axis 0 to 12,000) for Store 1, Store 44, Store 8 and Store 6, drawn twice; the second version labels only every other month.]
Grid lines in graphs that don’t need precision
Backgrounds that don’t provide delineation of sections on the dashboard
3-D that doesn’t provide additional variables or layers of analysis
Drawings that are not part of the data – including detailed logos
Colors that don’t highlight or emphasize data
Meters and gauges that don’t take advantage of pre-attentive processing
Junk pixels
Arrange the overall design to reflect how the intended audience “thinks” about the decisions to be made.
Group related data.
Arrange the data in a meaningful order (low to high; high to low).
Use bright colors sparingly and judiciously.
Avoid use of a colored background.
White space is an effective delimiter.
Use fonts with good legibility and readability.
Good design
Also graphical, but designed for an analyst to discover patterns, not to communicate information for managerial decision making.
Must understand a bit more about data mining while discussing visualization.
So, what about data mining visualization?
Opening Vignette: Data Mining Goes to Hollywood!
Independent variable | Number of values | Possible values
MPAA Rating | 5 | G, PG, PG-13, R, NR
Competition | 3 | High, Medium, Low
Star value | 3 | High, Medium, Low
Genre | 10 | Sci-Fi, Historic Epic Drama, Modern Drama, Politically Related, Thriller, Horror, Comedy, Cartoon, Action, Documentary
Special effects | 3 | High, Medium, Low
Sequel | 1 | Yes, No
Number of screens | 1 | Positive integer
Class No. | Range (in $ millions)
1 | < 1 (Flop)
2 | > 1, < 10
3 | > 10, < 20
4 | > 20, < 40
5 | > 40, < 65
6 | > 65, < 100
7 | > 100, < 150
8 | > 150, < 200
9 | > 200 (Blockbuster)
A Typical Classification Problem: the box-office class above is the dependent variable; the factors in the preceding table are the independent variables.
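Turning box-office revenue into the nine-class dependent variable is a binning step. The sketch below uses the class boundaries from the table above. The slide writes each range as "> low, < high" and does not say which class a value exactly on a boundary belongs to. This sketch assumes it goes to the higher class.

```python
from bisect import bisect_right

# Boundaries (in $ millions) separating classes 1..9, from the slide's table.
CLASS_EDGES = [1, 10, 20, 40, 65, 100, 150, 200]

def box_office_class(gross_millions: float) -> int:
    """Map box-office gross to class 1 (flop) .. 9 (blockbuster).

    Assumption: a gross exactly on a boundary goes to the higher class.
    """
    return bisect_right(CLASS_EDGES, gross_millions) + 1

for gross in (0.4, 1, 37.5, 120, 310):
    print(gross, "->", box_office_class(gross))
```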
[Figure: The DM Process Map in IBM SPSS Modeler, showing the model development process and the model assessment process]
Prediction Models
Individual models: SVM, ANN, C&RT. Ensemble models: Random Forest, Boosted Tree, Fusion (Average).
Performance measure | SVM | ANN | C&RT | Random Forest | Boosted Tree | Fusion (Average)
Count (Bingo) | 192 | 182 | 140 | 189 | 187 | 194
Count (1-Away) | 104 | 120 | 126 | 121 | 104 | 120
Accuracy (% Bingo) | 55.49% | 52.60% | 40.46% | 54.62% | 54.05% | 56.07%
Accuracy (% 1-Away) | 85.55% | 87.28% | 76.88% | 89.60% | 84.10% | 90.75%
Standard deviation | 0.93 | 0.87 | 1.05 | 0.76 | 0.84 | 0.63
* Training set: 1998 to 2005 movies; Test set: 2006 movies
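The two accuracy rows follow from the counts. "Bingo" is an exact class hit. The 1-Away accuracy counts bingos plus predictions exactly one class off, so the "Count (1-Away)" row apparently excludes the bingos. Every percentage in the table equals its count divided by 346, which suggests a test set of 346 movies. That figure is inferred from the table, not stated on the slide. The sketch below defines both measures and re-derives the reported percentages from the counts.

```python
def bingo_and_one_away(actual, predicted):
    """Return (exact-hit accuracy, within-one-class accuracy) as percentages.

    "Bingo" = predicted class equals the actual class.
    "1-Away" accuracy counts bingos plus predictions one class off.
    """
    n = len(actual)
    bingo = sum(a == p for a, p in zip(actual, predicted))
    within_one = sum(abs(a - p) <= 1 for a, p in zip(actual, predicted))
    return 100 * bingo / n, 100 * within_one / n

# Counts from the slide: (bingo, exactly one class away) per model.
COUNTS = {
    "SVM": (192, 104),
    "ANN": (182, 120),
    "C&RT": (140, 126),
    "Random Forest": (189, 121),
    "Boosted Tree": (187, 104),
    "Fusion (Average)": (194, 120),
}
N_TEST = 346  # inferred: every reported percentage equals count / 346

for model, (bingo, one_away) in COUNTS.items():
    exact = 100 * bingo / N_TEST
    near = 100 * (bingo + one_away) / N_TEST
    print(f"{model:<17}{exact:7.2f}%{near:8.2f}%")
```

With actual classes known for each test movie, `bingo_and_one_away` gives the same two measures directly from the prediction lists.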