Visualizing and Exploring Data

Outline

1. Introduction
2. Summarizing Data: Some Simple Examples
3. Tools for Displaying Single Variable
4. Tools for Displaying Relationships between Two Variables
5. Tools for Displaying More Than Two Variables
6. Principal Components Analysis
7. Multidimensional Scaling

Introduction

• Visual methods are important and ideal for sifting through data to find unexpected relationships.

• Exploratory data analysis aims to find structure in the data that may indicate deeper relationships between cases or variables.

Summarizing Data: Some Simple Examples

Measures of location:

• Mean
• Median
• First quartile
• Third quartile
• Deciles
• Percentiles
• Mode

Suppose that x(1), x(2), …, x(n) comprise a set of n data values.

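• Sample mean

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x(i)$$

μ: true mean of the population
$\hat{\mu}$: estimate of the true mean

The sample mean is the value that minimizes the sum of squared differences between it and the data values.

Ex. data set {1,2,3,4,5}: $\hat{\mu}$ = 3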

• Median: The value that has an equal number of data points above and below it.

Ex. data set {1,2,3,4,5}: Median = 3
Ex. data set {1,2,3,4,5,6}: Median = (3+4)/2 = 3.5

• First quartile: The value that is greater than a quarter of the data points.

• Third quartile: The value that is greater than three quarters of the data points.

• Interquartile range: The difference between the third and first quartiles.

• Range: The difference between the largest and smallest data points.

• Percentiles: The values of a variable below which a given percentage of the observations fall.

• Deciles: The values that divide the ordered data into ten equal parts (the 10th, 20th, …, 90th percentiles).

• Mode: The value that occurs most frequently in a data set or a probability distribution

Ex. data set {1,3,6,6,6,6,7,7,12,12,17}: Mode = 6
Ex. data set {1,1,2,4,4}: Modes = 1 and 4

• Unimodal: A data set or a distribution with one mode

• Bimodal: A data set or a distribution with two modes

• Multimodal: A data set or a distribution with several modes
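The location measures above can be checked on the slide's own example data sets. The following is a minimal sketch using Python's standard `statistics` module. The quartile values depend on the interpolation convention (the module's default is used here, which is an assumption, since the slides do not fix one), and the helper `sum_sq_diff` and the candidate grid are illustrative names introduced only to show that the mean minimizes the sum of squared differences.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Mean and median of the slide examples.
mean = statistics.mean(data)
median_odd = statistics.median(data)
median_even = statistics.median([1, 2, 3, 4, 5, 6])

# Quartiles: the three cut points that split the ordered data into four parts.
# Several interpolation conventions exist; this uses the module's default.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                      # interquartile range
data_range = max(data) - min(data)  # range

# Mode(s) of the slide examples.
mode = statistics.mode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17])
modes = statistics.multimode([1, 1, 2, 4, 4])


def sum_sq_diff(center, values):
    """Sum of squared differences between a candidate center and the data."""
    return sum((x - center) ** 2 for x in values)


# Try candidate centers around the mean; the mean itself should score best.
candidates = [mean + step / 10 for step in range(-20, 21)]
best = min(candidates, key=lambda c: sum_sq_diff(c, data))

print(mean, median_odd, median_even, (q1, q2, q3), iqr, data_range, mode, modes, best)
```

Deciles can be obtained the same way with `statistics.quantiles(values, n=10)`.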

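• Variance

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x(i) - \mu\right)^2$$

If μ is replaced with its estimate $\hat{\mu}$, then the variance is estimated as

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x(i) - \hat{\mu}\right)^2$$

• Standard deviation: the square root of the variance, $\sigma = \sqrt{\sigma^2}$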

• Skewness: It measures whether or not a distribution has a single long tail.

• A distribution is said to be right-skewed if the long tail extends in the direction of increasing values, and left-skewed if it extends in the direction of decreasing values. Symmetric distributions have zero skewness.
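The dispersion and shape measures can be illustrated in a few lines. This is a sketch under stated assumptions: the variance estimate divides by n − 1, and skewness is computed as the third central moment divided by the 3/2 power of the second central moment, which is one common definition and not necessarily the one used on the original slide. The function name `skewness` is introduced here for illustration.

```python
import math
import statistics

data = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
n = len(data)

# Estimated mean, variance (n - 1 in the denominator) and standard deviation.
mu_hat = statistics.mean(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / (n - 1)
sd_hat = math.sqrt(var_hat)


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (one common convention, assumed here)."""
    m = statistics.mean(values)
    k = len(values)
    m2 = sum((x - m) ** 2 for x in values) / k
    m3 = sum((x - m) ** 3 for x in values) / k
    return m3 / m2 ** 1.5


right_skewed = skewness(data)               # long tail toward large values
left_skewed = skewness([-x for x in data])  # mirrored data: long tail toward small values
symmetric = skewness([1, 2, 3, 4, 5])       # symmetric data

print(var_hat, sd_hat, right_skewed, left_skewed, symmetric)
```

The sign of the result indicates the direction of the long tail: positive for right-skewed data, negative for left-skewed data.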

Tools for Displaying Single Variable

• Histogram-1

• Histogram-2

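• Kernel estimate

A single variable X has measured values {x(1), x(2), …, x(n)}. The estimated density at x is

$$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K\left(x - x(i),\, h\right)$$

K(): kernel function, commonly a Gaussian curve
h: width

• Gaussian curve

$$K(t, h) = C \exp\left(-\frac{1}{2}\left(\frac{t}{h}\right)^2\right)$$

C: normalization constant
t = x − x(i)
h: standard deviation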
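A kernel estimate with a Gaussian curve can be sketched as follows. The assumptions are labeled in the code: the normalization constant is taken as C = 1/(h√(2π)) so that each kernel integrates to one, the sample is borrowed from the earlier mode example, and the width h = 1.5 is an arbitrary choice for illustration. The function names are hypothetical.

```python
import math


def gaussian_kernel(t, h):
    """Gaussian curve K(t, h) = C * exp(-0.5 * (t / h)**2).

    Assumption: C = 1 / (h * sqrt(2 * pi)), so the kernel integrates to one.
    """
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return c * math.exp(-0.5 * (t / h) ** 2)


def kernel_estimate(x, sample, h):
    """Average of kernels centered on each observed value, evaluated at x."""
    return sum(gaussian_kernel(x - xi, h) for xi in sample) / len(sample)


sample = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]  # data reused from the mode example
h = 1.5                                         # width, chosen arbitrarily here

# Evaluate the estimate on a grid from -10 to 30 in steps of 0.1.
step = 0.1
grid = [i * step for i in range(-100, 301)]
density = [kernel_estimate(x, sample, h) for x in grid]

# A Riemann sum of the estimate should be close to one.
area = sum(density) * step
print(round(area, 4))
```

A smaller h gives a spikier estimate that follows individual points; a larger h gives a smoother estimate that can hide modes.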

• Box and whisker plot

Tools for Displaying Relationships between Two Variables

• Scatterplot

• Contour plot

Tools for Displaying More Than Two Variables

• Scatterplot matrix

• Trellis plot

• Star plot

• Chernoff’s face

• Parallel coordinates plot

Principal Components Analysis

• Objective: To find the direction vectors onto which the projected data retain the maximum variance.

• Advantage: This method can reduce the dimensionality of the data.

• Suppose X is an n×p data matrix in which each row is a data vector x and the columns represent the variables.

• X is mean-centered (i.e., each column has had the sample mean of that variable subtracted).

• Let a be a p×1 column vector of projection weights. The projection of a data vector x along a is $a^T x$.

• Projecting all data vectors in X onto a gives Xa, an n×1 column vector of projected values.

• Define the variance along a as

$$\sigma_a^2 = (Xa)^T (Xa) = a^T X^T X a = a^T V a$$

• $V = X^T X$: The p×p covariance matrix of the data

• Impose the constraint $a^T a = 1$ and use a Lagrange multiplier λ to find the a that maximizes the variance along a:

$$u = a^T V a - \lambda (a^T a - 1)$$

• Differentiating with respect to a yields

$$\frac{\partial u}{\partial a} = 2Va - 2\lambda a = 0 \quad\Rightarrow\quad (V - \lambda I)\,a = 0$$

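• The first principal component a is the eigenvector associated with the largest eigenvalue of the covariance matrix V.

• The second principal component is the eigenvector associated with the second largest eigenvalue; its direction is orthogonal to the first, and so on.

• When the data are projected onto the first k eigenvectors, the variance of the projected data can be expressed as

$$\sum_{j=1}^{k} \lambda_j$$

• $\lambda_j$: The jth eigenvalue

• The loss of data (the fraction of the total variance not captured by the first k components):

$$\frac{\sum_{j=k+1}^{p} \lambda_j}{\sum_{l=1}^{p} \lambda_l}$$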

• Scree plot

• Ex. (10 cases, 3 variables)

269.8 38.9 50.5
272.4 39.5 50.0
272.0 39.3 50.2
268.2 38.6 50.2
268.2 38.6 50.8
267.0 38.2 51.1
267.8 38.4 51.0
273.6 39.6 50.0
271.2 39.1 50.4
270.0 38.9 50.5
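The first principal component of the example data can be found with a short standard-library sketch. Two assumptions are worth stating: V is taken as X^T X on the mean-centered data (dividing by n − 1 would rescale the eigenvalues but leave the eigenvectors unchanged), and the dominant eigenvector is found by power iteration, a simple stand-in for a full eigendecomposition routine. The helper names are illustrative.

```python
import math

# Example data: 10 cases (rows) by 3 variables (columns).
X = [
    [269.8, 38.9, 50.5], [272.4, 39.5, 50.0], [272.0, 39.3, 50.2],
    [268.2, 38.6, 50.2], [268.2, 38.6, 50.8], [267.0, 38.2, 51.1],
    [267.8, 38.4, 51.0], [273.6, 39.6, 50.0], [271.2, 39.1, 50.4],
    [270.0, 38.9, 50.5],
]
n, p = len(X), len(X[0])

# Mean-center each column.
means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - means[j] for j in range(p)] for row in X]

# V = Xc^T Xc (p x p); unscaled, which does not affect the eigenvectors.
V = [[sum(r[j] * r[k] for r in Xc) for k in range(p)] for j in range(p)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def power_iteration(M, iters=1000):
    """Dominant (eigenvalue, unit eigenvector) of a symmetric PSD matrix."""
    v = [1.0 / math.sqrt(len(M))] * len(M)
    for _ in range(iters):
        w = mat_vec(M, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, mat_vec(M, v)), v


lam1, a1 = power_iteration(V)

# Project every centered data vector onto a1 and measure the variance along a1.
projected = [dot(row, a1) for row in Xc]
var_along_a1 = sum(z * z for z in projected)

# Fraction of the total variance kept by the first component.
total_var = sum(V[j][j] for j in range(p))
explained = lam1 / total_var
residual = [x - lam1 * y for x, y in zip(mat_vec(V, a1), a1)]

print(a1, explained)
```

Plotting the eigenvalues in decreasing order gives the scree plot; here the first component alone keeps most of the variance, so one dimension already summarizes the three variables well.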

Multidimensional Scaling

• Objective: To represent the data points in a lower-dimensional space while preserving, as far as possible, the distances between the data points.

• Classical multidimensional scaling
• Metric multidimensional scaling
• Non-metric multidimensional scaling

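• Assume a 3×2 data matrix X in which the mean of each variable is zero.

• Then compute the 3×3 matrix B:

$$B = XX^T = \begin{pmatrix} x_{11}^2 + x_{12}^2 & x_{11}x_{21} + x_{12}x_{22} & x_{11}x_{31} + x_{12}x_{32} \\ x_{21}x_{11} + x_{22}x_{12} & x_{21}^2 + x_{22}^2 & x_{21}x_{31} + x_{22}x_{32} \\ x_{31}x_{11} + x_{32}x_{12} & x_{31}x_{21} + x_{32}x_{22} & x_{31}^2 + x_{32}^2 \end{pmatrix} = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}$$

Because X is mean-centered, the rows and columns of B sum to zero:

$$\sum_{i} b_{ij} = \sum_{j} b_{ij} = 0$$

• The squared Euclidean distance between objects 1 and 2 is

$$d_{12}^2 = (x_{11} - x_{21})^2 + (x_{12} - x_{22})^2 = (x_{11}^2 + x_{12}^2) + (x_{21}^2 + x_{22}^2) - 2(x_{11}x_{21} + x_{12}x_{22}) = b_{11} + b_{22} - 2b_{12}$$

and in general

$$d_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij} \qquad (1)$$

• Define the 3×3 matrix D of squared distances:

$$D = \begin{pmatrix} 0 & b_{11} + b_{22} - 2b_{12} & b_{11} + b_{33} - 2b_{13} \\ b_{22} + b_{11} - 2b_{21} & 0 & b_{22} + b_{33} - 2b_{23} \\ b_{33} + b_{11} - 2b_{31} & b_{33} + b_{22} - 2b_{32} & 0 \end{pmatrix}$$

• Summing Eq. (1) over the rows, over the columns, and over both (n = 3 objects, tr(B) = b_{11} + b_{22} + b_{33}, and the row and column sums of B are zero):

$$\sum_{k} d_{kj}^2 = \mathrm{tr}(B) + n\,b_{jj} \qquad (2)$$

$$\sum_{l} d_{il}^2 = \mathrm{tr}(B) + n\,b_{ii} \qquad (3)$$

$$\sum_{k}\sum_{l} d_{kl}^2 = 2n\,\mathrm{tr}(B) \qquad (4)$$

• Rearranging Eqs. (2), (3) and (4):

$$b_{jj} = \frac{1}{n}\left(\sum_{k} d_{kj}^2 - \mathrm{tr}(B)\right) \qquad (5)$$

$$b_{ii} = \frac{1}{n}\left(\sum_{l} d_{il}^2 - \mathrm{tr}(B)\right) \qquad (6)$$

$$\mathrm{tr}(B) = \frac{1}{2n}\sum_{k}\sum_{l} d_{kl}^2 \qquad (7)$$

• Substituting Eq. (7) for tr(B) into Eq. (5) and Eq. (6) gives

$$b_{jj} = \frac{1}{n}\sum_{k} d_{kj}^2 - \frac{1}{2n^2}\sum_{k}\sum_{l} d_{kl}^2 \qquad (8)$$

$$b_{ii} = \frac{1}{n}\sum_{l} d_{il}^2 - \frac{1}{2n^2}\sum_{k}\sum_{l} d_{kl}^2 \qquad (9)$$

• Substituting Eq. (8) and Eq. (9) for b_{jj} and b_{ii} into Eq. (1) expresses B in terms of the distances alone:

$$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_{k} d_{kj}^2 - \frac{1}{n}\sum_{l} d_{il}^2 + \frac{1}{n^2}\sum_{k}\sum_{l} d_{kl}^2\right)$$

• Apply the singular value decomposition to B:

$$B = V \Lambda V^T$$

Λ: diagonal matrix; each element on the diagonal is an eigenvalue of B
V = [v_1, …, v_n]: orthonormal matrix, meaning $VV^T = V^TV = I$; all column vectors are eigenvectors of B

• Choose the first r eigenvalues that are much larger than the others; r is the number of dimensions to map into. The coordinates in the r-dimensional space are

$$X_r = V_r \Lambda_r^{1/2}$$

$X_r$: n×r matrix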

• Ex.

Eigenvalues:
16.9641
7.7025

Transformed data:
-2.4621 1.5436
-0.7528 -2.2085
3.2149 0.6649

Distance matrix of the data:
0 4.1231 5.7446
4.1231 0 4.8990
5.7446 4.8990 0

Distance matrix of the transformed data:
0 4.1231 5.7446
4.1231 0 4.8990
5.7446 4.8990 0

Stress: 1.0325e-016

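• Stress

$$\text{stress} = \frac{\sum_{i}\sum_{j} (\delta_{ij} - d_{ij})^2}{\sum_{i}\sum_{j} d_{ij}^2}$$

$\delta_{ij}$: The observed distance between points i and j in the p-dimensional space.

$d_{ij}$: The distance between the points representing these objects in the two-dimensional space.

• Sstress

$$\text{sstress} = \frac{\sum_{i}\sum_{j} (\delta_{ij}^2 - d_{ij}^2)^2}{\sum_{i}\sum_{j} d_{ij}^4}$$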
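The classical scaling steps can be sketched starting from the distance matrix of the worked example. Assumptions are labeled in the code: the input is the rounded distance matrix printed above, B is obtained by double centering the squared distances, the two leading eigenpairs are found by power iteration with deflation (a simple stand-in for the decomposition named in the slides), and stress is computed as the ratio of sums of squares without a square root. The helper names are illustrative.

```python
import math

# Observed distances (rounded values from the worked example).
D = [
    [0.0, 4.1231, 5.7446],
    [4.1231, 0.0, 4.8990],
    [5.7446, 4.8990, 0.0],
]
n = len(D)
D2 = [[d * d for d in row] for row in D]

# Double centering: b_ij = -1/2 (d_ij^2 - row mean - column mean + grand mean).
row_mean = [sum(r) / n for r in D2]
col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
grand = sum(row_mean) / n
B = [[-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
     for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def top_eigenpairs(M, r, iters=2000):
    """Leading r (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    M = [row[:] for row in M]
    size = len(M)
    pairs = []
    for k in range(r):
        v = [math.sin(i + 1.0 + k) for i in range(size)]  # arbitrary start vector
        for _ in range(iters):
            w = mat_vec(M, v)
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, mat_vec(M, v))
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return pairs


pairs = top_eigenpairs(B, r=2)
eigenvalues = [lam for lam, _ in pairs]

# Coordinates: each eigenvector scaled by the square root of its eigenvalue.
coords = [[math.sqrt(lam) * v[i] for lam, v in pairs] for i in range(n)]


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


num = sum((D[i][j] - dist(coords[i], coords[j])) ** 2 for i in range(n) for j in range(n))
den = sum(dist(coords[i], coords[j]) ** 2 for i in range(n) for j in range(n))
stress = num / den

print(eigenvalues, coords, stress)
```

The signs of the recovered coordinates may differ from the transformed data in the example, because eigenvectors are only determined up to sign; the distances between the points are unaffected.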
