A computational tool for depth-based statistical analysis
Eynat Rafalin, Tufts University, Computer Science Department



The tool

An easy-to-use, efficient and expandable interface for statistical research based on the notion of data depth.

Designed for scientists with no computer science background.


Our goal

Present the tool to the community.
Code/software available on request.
Run on real data.
Get feedback: Is such a tool needed? What additions or improvements would help?


General

C++-based software (no additional tools/software needed).
Simple interface that allows users to:
  enter data files, sort the data points and filter unwanted data;
  perform calculations;
  present the results in an easy-to-understand graphical interface;
  save and output data for future use.
Fast.
Portable code.


General description

Data filter
Contours display and selection
Statistical modules
Output: txt and Excel files, Geomview


Data filter

Graphical user interface developed in C++.
Used to crop/manipulate a data set before it is fed into the statistical modules.
Fast and light.
Convenient, easy-to-use user interface.
Portable code (UNIX, Solaris, Linux, Windows).
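The cropping step can be sketched as a simple predicate filter over the rows of a data set. This is a minimal sketch, assuming each observation is a 2D point and the crop region is an axis-aligned rectangle; `Point` and `cropToRectangle` are hypothetical names, not the tool's actual code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record type: one observation with two numeric columns.
struct Point {
    double x, y;
};

// Return the points inside the axis-aligned rectangle
// [xmin, xmax] x [ymin, ymax] (bounds inclusive), in their original order.
std::vector<Point> cropToRectangle(const std::vector<Point>& data,
                                   double xmin, double xmax,
                                   double ymin, double ymax) {
    assert(xmin <= xmax && ymin <= ymax);  // the rectangle must be non-empty
    std::vector<Point> kept;
    for (const Point& p : data) {
        if (p.x >= xmin && p.x <= xmax && p.y >= ymin && p.y <= ymax) {
            kept.push_back(p);
        }
    }
    return kept;
}
```

Keeping the original order means any per-row labels stored alongside the points stay aligned after cropping.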


[Figure: Data filter]


Statistical modules

Depth contours (2D):
  Half-space (location) depth contours
    Optimal O(n^2) time
    Supports two approaches for defining contours
    Includes the Tukey median and the bagplot
    Includes the contours' parameters (size, etc.)
  Convex hull peeling depth contours
  Simplicial depth contours
Tukey median computation (O(n log^3 n))
Locating a new point in a set of depth contours (O(log n) query time)
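For a single query point, half-space depth can be computed by brute force. This is a minimal sketch, assuming 2D points stored as doubles; `halfspaceDepth` is a hypothetical name, and this O(n^2)-per-point method is only an illustration, not the optimal contour algorithm the module uses. It relies on two facts: the depth of q equals n minus the largest number of points in an open half-plane whose boundary passes through q, and that maximum is attained by a half-open angular range that starts at the direction of some data point.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point {
    double x, y;
};

// Half-space (Tukey) depth of q with respect to data: the smallest number of
// data points in any closed half-plane containing q. Computed here as
// n minus the largest number of points in an open half-plane whose boundary
// passes through q, by brute force in O(n^2) time per query point.
int halfspaceDepth(const Point& q, const std::vector<Point>& data) {
    int best = 0;  // largest open half-plane count found so far
    for (const Point& a : data) {
        const double ax = a.x - q.x, ay = a.y - q.y;
        if (ax == 0 && ay == 0) continue;  // copies of q lie in every half-plane
        // Count points whose direction from q lies in the half-open angular
        // range [angle(a), angle(a) + pi); copies of q are not counted.
        int count = 0;
        for (const Point& b : data) {
            const double bx = b.x - q.x, by = b.y - q.y;
            const double cross = ax * by - ay * bx;  // > 0: b is within (0, pi) counterclockwise of a
            const double dot = ax * bx + ay * by;    // > 0 with cross == 0: same direction as a
            if (cross > 0 || (cross == 0 && dot > 0)) ++count;
        }
        best = std::max(best, count);
    }
    return static_cast<int>(data.size()) - best;
}
```

Exact tests such as `cross == 0` are reliable for integer-valued coordinates; for measured data, ties are resolved only up to floating-point rounding.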


Approaches for defining depth contours

P. Rousseeuw et al.: the k-th depth contour is the boundary of the set of points in the plane with depth at least k.

R. Liu et al. (based on order statistics): the sample p-th central hull is the convex hull of the most central fraction p of the sample points.
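Liu et al.'s definition translates directly into code once every sample point has a depth value. This is a minimal sketch, assuming the depths were computed beforehand (for example, half-space depths) and that ties in depth are broken by input order, which is a choice of this sketch rather than part of the definition; `centralHull` and `convexHull` are hypothetical names, and the hull is computed with Andrew's monotone chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point {
    double x, y;
};

// z-component of (b - a) x (c - a); positive for a counterclockwise turn.
double cross(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Andrew's monotone chain: hull vertices in counterclockwise order,
// with duplicate and collinear boundary points dropped.
std::vector<Point> convexHull(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    pts.erase(std::unique(pts.begin(), pts.end(),
                          [](const Point& a, const Point& b) {
                              return a.x == b.x && a.y == b.y;
                          }),
              pts.end());
    if (pts.size() < 3) return pts;
    std::vector<Point> hull(2 * pts.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {  // lower hull
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (std::size_t i = pts.size() - 1, t = k + 1; i-- > 0;) {  // upper hull
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);  // the last vertex repeats the first
    return hull;
}

// Sample p-th central hull: convex hull of the ceil(p * n) deepest points.
// depth[i] is the precomputed depth of pts[i]; ties keep the input order.
std::vector<Point> centralHull(const std::vector<Point>& pts,
                               const std::vector<int>& depth, double p) {
    assert(pts.size() == depth.size() && p > 0.0 && p <= 1.0);
    std::vector<std::size_t> order(pts.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return depth[a] > depth[b]; });
    // The small epsilon keeps rounding noise (e.g. 3.0000000000000004) from adding a point.
    const double want = std::ceil(p * pts.size() - 1e-9);
    const std::size_t m = std::min(pts.size(), static_cast<std::size_t>(std::max(want, 0.0)));
    std::vector<Point> central;
    for (std::size_t i = 0; i < m; ++i) central.push_back(pts[order[i]]);
    return convexHull(central);
}
```

With p = 1 the result is the hull of the whole sample; small p shrinks toward the deepest points.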


Half-space (location) depth contours module

Depth contours for a sample set with 8 data points

Depth contours for a data set describing diabetic patients


Statistical modules (continued)

Plots:
  DD (Depth vs. Depth) plots, O(n^2) time
  Shrinkage plots
  Fan plots
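Each DD-plot point is just two depth evaluations. This is a minimal sketch, assuming both samples are 2D and using brute-force half-space depth, which is O(n^2) per evaluation and therefore O(n^3) overall, slower than the module's stated O(n^2); `ddPlot` and `halfspaceDepth` are hypothetical names, and dividing each depth by its sample size is an assumption made so that samples of different sizes are comparable. When A and B come from the same distribution, the pairs tend to lie close to the diagonal.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Point {
    double x, y;
};

// Half-space (Tukey) depth of q in data, by O(n^2) brute force: n minus the
// largest number of points in an open half-plane whose boundary passes through q.
int halfspaceDepth(const Point& q, const std::vector<Point>& data) {
    int best = 0;
    for (const Point& a : data) {
        const double ax = a.x - q.x, ay = a.y - q.y;
        if (ax == 0 && ay == 0) continue;  // copies of q lie in every half-plane
        int count = 0;  // points in the half-open angular range [angle(a), angle(a) + pi)
        for (const Point& b : data) {
            const double bx = b.x - q.x, by = b.y - q.y;
            const double cross = ax * by - ay * bx;
            if (cross > 0 || (cross == 0 && ax * bx + ay * by > 0)) ++count;
        }
        best = std::max(best, count);
    }
    return static_cast<int>(data.size()) - best;
}

// DD plot coordinates: for each point z of A, then each point z of B,
// the pair (depth of z in A / |A|, depth of z in B / |B|).
std::vector<std::pair<double, double>> ddPlot(const std::vector<Point>& A,
                                              const std::vector<Point>& B) {
    assert(!A.empty() && !B.empty());
    std::vector<std::pair<double, double>> pairs;
    auto addPairs = [&](const std::vector<Point>& sample) {
        for (const Point& z : sample) {
            pairs.emplace_back(static_cast<double>(halfspaceDepth(z, A)) / A.size(),
                               static_cast<double>(halfspaceDepth(z, B)) / B.size());
        }
    };
    addPairs(A);
    addPairs(B);
    return pairs;
}
```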


DD (Depth vs. Depth) plots module

Two 2D data sets of 50 points each, drawn from normal distributions centered at (0,0) with different covariance matrices (the identity and 4 times the identity).

[Figure: DD plot; vertical axis: depth according to set A; horizontal axis: depth according to set B]


Fan plots

50 data points drawn from a random distribution with covariance matrix 4 times the identity. The fans are created for data sets containing the 1/6, 2/6, ... central regions. For each region, the area of the convex hull (CH) of 2%, 4%, 6%, ... of the points is computed.

[Figure: fan plot; vertical axis: relative area (CH of p% / CH); horizontal axis: percentile of points]
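One fan-plot curve can be computed from precomputed depths by comparing hull areas. This is a minimal sketch, assuming the p% of points are chosen as the most central ones by depth (as in Liu et al.'s central hulls) with ties broken by input order; `relativeAreas`, `convexHull` and `polygonArea` are hypothetical names, the hull is Andrew's monotone chain and the area is the shoelace formula. Repeating it for each nested central region (1/6, 2/6, ...) would give the family of curves that forms the fan.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point {
    double x, y;
};

// z-component of (b - a) x (c - a); positive for a counterclockwise turn.
double cross(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Andrew's monotone chain: hull vertices in counterclockwise order.
std::vector<Point> convexHull(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (pts.size() < 3) return pts;
    std::vector<Point> hull(2 * pts.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {  // lower hull
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (std::size_t i = pts.size() - 1, t = k + 1; i-- > 0;) {  // upper hull
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);  // the last vertex repeats the first
    return hull;
}

// Shoelace formula: area of a simple polygon whose vertices are listed in order.
double polygonArea(const std::vector<Point>& poly) {
    double twice = 0.0;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Point& a = poly[i];
        const Point& b = poly[(i + 1) % poly.size()];
        twice += a.x * b.y - b.x * a.y;
    }
    return std::abs(twice) / 2.0;
}

// One fan-plot curve: for each fraction p, the area of the hull of the
// ceil(p * n) deepest points divided by the area of the hull of all points.
std::vector<double> relativeAreas(const std::vector<Point>& pts,
                                  const std::vector<int>& depth,
                                  const std::vector<double>& fractions) {
    assert(pts.size() == depth.size());
    const double total = polygonArea(convexHull(pts));
    assert(total > 0.0);  // the full sample must not be collinear
    std::vector<std::size_t> order(pts.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return depth[a] > depth[b]; });
    std::vector<double> result;
    for (double p : fractions) {
        assert(p > 0.0 && p <= 1.0);
        const std::size_t m = static_cast<std::size_t>(std::ceil(p * pts.size() - 1e-9));
        std::vector<Point> central;
        for (std::size_t i = 0; i < m && i < pts.size(); ++i) central.push_back(pts[order[i]]);
        result.push_back(polygonArea(convexHull(central)) / total);
    }
    return result;
}
```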


Graphical contour selection tool

Plots depth contours and selects data ranges.

Actions: import/export, select points, depth slider, filter
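The depth slider's selection can be sketched as a threshold on precomputed depths. This is a minimal sketch, assuming the slider value is an integer depth k and that "inside the selected contour" means depth at least k, matching Rousseeuw's depth regions; `selectByDepth` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices of the points whose depth is at least k, i.e. the data points that
// lie in the k-th depth region. depth[i] is the precomputed depth of point i.
std::vector<std::size_t> selectByDepth(const std::vector<int>& depth, int k) {
    assert(k >= 0);  // depths are non-negative counts
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < depth.size(); ++i) {
        if (depth[i] >= k) selected.push_back(i);
    }
    return selected;
}
```

Returning indices rather than copies lets the selection drive both the plot highlighting and a later filter or export step.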


Future work

Run the tool on existing data sets.
Distribute preliminary versions and get user feedback.
Data filter:
  Group by row/column
  Filter by row/column
  Interactions between rows/columns (addition, substitution, logical operations)
Statistical modules:
  Implement additional modules
  Improve running times


Contributors

Prof. Diane Souvaine, Prof. Alva Couch, Eynat Rafalin, Michael Burr, Joe Handelman, James Hayes, Ori Taka, Alok Lal, Janet Luan, Kim Miller, Tim Mitchell, Nikolai Shvertner