Data Analysis, Statistics, Machine Learning
Leland Wilkinson, Adjunct Professor, UIC Computer Science; Chief Scientist, H2O.ai; [email protected]
Anomalies
o Anomalies are, literally, a lack of law (nomos)
o The best-known anomaly is an outlier
  o This presumes a distribution with tail(s)
  o All outliers are anomalies, but not all anomalies are outliers
  o Identifying outliers is not simple
    o Almost every software system and statistics text gets it wrong
o Other anomalies don't involve distributions
  o Coding errors in data
  o Misspellings
  o Singular events
o Often anomalies in residuals are more interesting than the estimated values
Copyright © 2016 Leland Wilkinson
Anomalies
o Why do we care?
  o Anomalies may bias statistical estimates
  o And then again, they might not
  o You need to worry about influence, not outliers
Anomalies
o Why do we care?
  o Anomalies may bias statistical estimates
    o Do NOT drop outliers from a dataset before fitting
    o Unless you know why they are outliers
    o There are alternatives – robust methods, Winsorizing, trimming, …
  o Anomalies may lead to new research ideas
    o Give a group of people a battery of psychological tests
    o Interview outliers personally
  o Anomalies may be the needle in the haystack
    o Terrorists are rare, extreme
    o There may not be enough of them to model their behavior adequately
    o Search for anomalies in the general population
  o Anomalies may lead you to a better model
    o You can't have an anomaly without a model
    o Examining anomalies in residuals can help you modify the model
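The robust alternatives mentioned above (Winsorizing, trimming) take only a few lines; the sketch below is illustrative, with the function names and fraction arguments chosen here rather than taken from the slides:

```python
import numpy as np

def winsorize(x, frac=0.05):
    """Clip the lowest and highest `frac` of values to the corresponding quantiles."""
    lo, hi = np.quantile(x, [frac, 1 - frac])
    return np.clip(x, lo, hi)

def trimmed_mean(x, frac=0.05):
    """Drop the lowest and highest `frac` of values, then average the rest."""
    x = np.sort(np.asarray(x, float))
    k = int(len(x) * frac)
    return x[k:len(x) - k].mean()

x = np.array([1.2, 0.8, 1.1, 0.9, 1.0, 25.0])  # one gross outlier
# the plain mean is dragged toward 25; winsorized/trimmed summaries are not
plain, wins, trim = x.mean(), winsorize(x, 0.2).mean(), trimmed_mean(x, 0.2)
```

Both robust summaries here land near the bulk of the data while the plain mean is pulled far above it.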
Anomalies
o Outliers
  o "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism" (Hawkins, 1980)
  o The existing methods in statistics and machine learning packages for detecting outliers based on the mean and standard deviation of a distribution are wrong
    o That is because, as n increases, the critical value of alpha must change in order to prevent false positives
    o But picking alpha for a given n makes detection of outliers circular
  o The multivariate outlier detection problem is even harder
    o The curse of dimensionality means interpoint distances tend toward a constant as n is held constant and p heads toward infinity
  o Graphical methods aren't much better
Anomalies
o Outliers
  o Don't bother to Google
  o You'll get this…
Anomalies
o Outliers
  o What you will find if you persist
    o There are two popular tests
    o Both depend on a normal distribution
    o Both fail to offer protection for large samples

    Grubbs (1950)      Tukey (1977)
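Tukey's (1977) rule is the familiar box-plot fence: flag points beyond 1.5 interquartile ranges outside the quartiles. A minimal sketch (and note the slide's caveat applies: this rule offers no protection against false positives for large samples):

```python
import numpy as np

def tukey_fences(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.quantile(x, [0.25, 0.75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0])
flags = tukey_fences(x)  # only the 50.0 lies outside the fences
```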
Anomalies
o Outliers
  o Why distance from location (mean, median, …) is wrong
    o Remember Hawkins' definition
      o "…arouse suspicions that it was generated by a different mechanism"
    o Wouldn't you be inclined to say the one on the left is an outlier but not the one on the right? The two samples have the same mean and standard deviation.
    o So the problem boils down to gaps, not distance from center
[Dot plots of two samples, X and Y, each on a -3 to 3 scale]
  o Dixon (1951): Q = gap / range
  o Tukey-Wainer-Schacht (1978): weighted gaps

      z_i = sqrt(w_i g_i) / midmean(sqrt(w g)),   where w_i = i(n − i) and g_i = y_(i+1) − y_(i)
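The gap-based idea can be sketched in code. Standardizing by the midmean of the weighted gaps is my reading of the Wainer-Schacht statistic, so treat the scaling details as an assumption:

```python
import numpy as np

def weighted_gaps(y):
    """Weighted gaps between successive order statistics; a large z_i
    marks a suspicious gap. Assumes n is large enough that the middle
    half of the weighted gaps is nonempty and not all zero."""
    y = np.sort(np.asarray(y, float))
    n = len(y)
    i = np.arange(1, n)                  # gap index i = 1 .. n-1
    g = np.diff(y)                       # raw gaps y_(i+1) - y_(i)
    w = i * (n - i)                      # weights w_i = i(n - i)
    wg = np.sqrt(w * g)
    mid = wg[len(wg) // 4 : 3 * len(wg) // 4]   # middle half -> midmean
    return wg / mid.mean()

z = weighted_gaps([0.0, 1.0, 2.0, 3.0, 100.0])  # the last gap dominates
```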
Anomalies
o Outliers
  o Graphical methods
    o Box plots depend on a normal distribution – useless for large n
    o See how many box plot outliers there are for n = 100,000?
    o Letter-value box plots (Hofmann, Kafadar, Wickham, 2006) are better
[Box plots and letter-value box plots compared for n = 100 and n = 100,000]
Anomalies
o Outliers
  o Graphical methods
    o The probability plot is one of the best, IF you know the distribution
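A probability (Q-Q) plot is just the sorted data against theoretical quantiles at standard plotting positions; points that peel away from the line at either end are outlier candidates. A minimal sketch assuming a normal reference distribution (the plotting-position formula is one common convention):

```python
import numpy as np
from scipy import stats

x = np.sort(np.random.default_rng(1).normal(size=200))
p = (np.arange(1, len(x) + 1) - 0.5) / len(x)   # plotting positions
q = stats.norm.ppf(p)                           # theoretical normal quantiles
# plotting x against q should be nearly a straight line for normal data;
# the correlation of the pairs is a quick numerical check of linearity
r = np.corrcoef(q, x)[0, 1]
```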
Anomalies
o Outliers
  o Transformations affect outlier detection
    o For skewed batches, you need to transform before testing for outliers
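The point is easy to demonstrate: a right-skewed batch triggers many spurious box-plot flags that mostly disappear after a log transform. A sketch using Tukey's fence rule (the lognormal sample is an illustrative assumption):

```python
import numpy as np

def tukey_flags(v, k=1.5):
    """Tukey box-plot fences: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.quantile(v, [0.25, 0.75])
    return (v < q1 - k * (q3 - q1)) | (v > q3 + k * (q3 - q1))

x = np.random.default_rng(0).lognormal(mean=0.0, sigma=1.0, size=1000)
raw_flags = int(tukey_flags(x).sum())          # many "outliers" from skew alone
log_flags = int(tukey_flags(np.log(x)).sum())  # far fewer after transforming
```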
Anomalies
o Skewness and Kurtosis
  o Use L-moments (based on weighted sums of order statistics)
  o More robust (no third or fourth powers)
o Spikes
  o Use dot plots
  o Check for stacks
  o A stack is a signal for a Zero-Inflated Poisson (ZIP) or other model
o Multimodality
  o Smooth with a kernel
  o Do bump hunting by computing the slope of the tangent
  o Look for more than one bump (mode)
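The bump-hunting bullets can be sketched directly: smooth with a Gaussian kernel and count the places where the density's slope changes from positive to negative. The bandwidth and grid size below are arbitrary choices, not values from the slides:

```python
import numpy as np

def count_modes(x, bandwidth=0.5, grid=512):
    """Count bumps (modes) in a Gaussian kernel density estimate:
    a mode is a point where the estimated slope crosses from + to -."""
    x = np.asarray(x, float)
    t = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid)
    # unnormalized Gaussian KDE evaluated on the grid
    d = np.exp(-0.5 * ((t[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    slope = np.diff(d)
    return int(((slope[:-1] > 0) & (slope[1:] <= 0)).sum())

# two "stacks" of repeated values, like the spikes described above
stacks = np.array([0.0] * 30 + [6.0] * 30)
```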
Anomalies
o Multivariate Outliers
  o Mahalanobis Distance is the most popular method
    o OK if you know the distribution is multivariate normal
    o But the estimate of the covariance matrix can be unreliable when p is large
    o If so, try computing robust covariances for the Mahalanobis Distance
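For reference, here is the classical Mahalanobis distance from the sample mean; this is the non-robust version the slide warns about, and swapping in robust location and covariance estimates is the suggested fix:

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean,
    using the classical (non-robust) covariance estimate."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(100, 2)), [[10.0, -10.0]]])
d2 = mahalanobis_sq(X)   # the appended point has by far the largest distance
```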
Anomalies
o Multivariate Outliers
  o Principal Components
    o Plot the last few PCs against each other
    o As with Mahalanobis Distance, you may want to base them on robust covariances
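Extracting the last few principal components is a one-liner with an SVD; a sketch (again using classical, not robust, covariances):

```python
import numpy as np

def last_pc_scores(X, k=2):
    """Scores on the k lowest-variance principal components --
    the ones worth plotting against each other when hunting outliers."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[-k:].T    # columns = last k PCs

scores = last_pc_scores(np.random.default_rng(5).normal(size=(100, 5)))
```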
Anomalies
o Multivariate Outliers
  o Minimum Spanning Tree
    o Compute the MST and look for nodes having extremely long edges
[Scatterplot of ASSISTRATE (0-4) vs. PUTOUTRATE (0-10) with the minimum spanning tree overlaid]
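A sketch of the MST idea with SciPy: build the Euclidean MST and flag edges much longer than the rest. The median + k·MAD cutoff is an assumed heuristic, not something specified on the slides:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def long_mst_edges(X, k=3.0):
    """Return MST edges (i, j, length) longer than median + k * MAD."""
    D = squareform(pdist(X))                 # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()
    lengths = mst.data
    med = np.median(lengths)
    mad = np.median(np.abs(lengths - med))
    cut = med + k * mad
    return [(int(i), int(j), float(w))
            for i, j, w in zip(mst.row, mst.col, lengths) if w > cut]

# a unit grid of 20 points plus one far-away point
X = np.vstack([[(i, j) for i in range(5) for j in range(4)], [[30.0, 30.0]]])
edges = long_mst_edges(X)   # only the edge reaching (30, 30) is flagged
```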
Anomalies
o Multivariate Outliers
  o Clustering
    o On each iteration, use an outlier algorithm to decide whether a distance to a centroid is beyond a cutoff
    o If so, leave the point out of the centroid update
    o Omitted points are outliers

    1. Choose a very large k
    2. Initialize k centroids
    3. Assign every point y to the nearest centroid (squared Euclidean distance)
    4. Compute the within-cluster sum of squares (SSW)
    5. Repeat 3 and 4 until SSW does not get noticeably smaller
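The loop above, with the outlier-exclusion step folded in, might look like the sketch below. The initialization from the first k points, the z-score cutoff on distances, and the fixed iteration cap are all assumptions made for brevity:

```python
import numpy as np

def kmeans_outliers(X, k=25, cutoff_z=3.0, iters=20):
    """k-means where points whose centroid distance is extreme
    (z-score > cutoff_z) are left out of the centroid update;
    whatever is flagged at convergence is reported as an outlier."""
    C = X[:k].astype(float).copy()           # naive init: first k points
    outlier = np.zeros(len(X), bool)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        lab = d.argmin(axis=1)
        dist = d[np.arange(len(X)), lab]
        z = (dist - dist.mean()) / (dist.std() + 1e-12)
        outlier = z > cutoff_z
        for j in range(k):
            m = (lab == j) & ~outlier        # omit flagged points
            if m.any():
                C[j] = X[m].mean(axis=0)
    return outlier
```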
Anomalies
o Multivariate Outliers
  o Clustering
    o It didn't work too well here
Anomalies
o Multivariate Outliers
  o Stahel-Donoho outlyingness
    o A robust method with a high breakdown point
    o For any real-valued vector y (p × 1), the measure of outlyingness is

      r(y, X) = sup_{a ∈ S^p} |a′y − μ(a′X′)| / σ(a′X′),   where S^p = {a ∈ R^p : ||a|| = 1}

    o The estimate for μ is based on a weighted location estimator
    o The estimate for σ is based on the median absolute deviation (MAD)
    o The Stahel-Donoho estimator is defined as a weighted mean and covariance, where each observation receives a weight that depends on a measure of its outlyingness. This measure is based on the one-dimensional projection in which the observation is most outlying. The motivation is that every multivariate outlier must be a univariate outlier in some projection.
    o Computing this is expensive, although one can use sampling to find an approximation
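The sampling approach might look like this: draw random unit directions and take each point's worst standardized projection. For simplicity this sketch uses the plain median and MAD for μ and σ, rather than the weighted estimators mentioned above:

```python
import numpy as np

def sd_outlyingness(X, n_proj=500, seed=0):
    """Approximate Stahel-Donoho outlyingness via random projections:
    r(y) = max over directions a of |a'y - med(a'X)| / MAD(a'X)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_proj, X.shape[1]))
    A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit directions
    P = X @ A.T                                     # (n, n_proj) projections
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0) + 1e-12
    return (np.abs(P - med) / mad).max(axis=1)

rng = np.random.default_rng(9)
t = rng.normal(size=100)
pts = np.c_[t, t + 0.05 * rng.normal(size=100)]     # tightly correlated cloud
X = np.vstack([pts, [[4.0, -4.0]]])                 # off the correlation axis
r = sd_outlyingness(X)   # the appended point is most outlying
```

Note that the appended point is unremarkable on either coordinate alone; it only stands out in a projection roughly perpendicular to the correlation axis, which is exactly the motivation stated above.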
Anomalies
o Multivariate Anomalies
  o Scagnostics (Wilkinson, Anand, Grossman, 2005)
    o We characterize a scatterplot (2D point set) with nine measures
    o We base our measures on three geometric graphs:
      o Convex Hull
      o Minimum Spanning Tree
      o Alpha Shape
Anomalies
o Multivariate Anomalies
  o Scagnostics
    o Here's how they distribute in 2D
Anomalies
o Multivariate Anomalies
  o Detecting outlying scatterplots by cluster-analyzing the scagnostics matrix
    o Compute the scagnostics matrix and then cluster it
    o Use the cluster outlier method to detect outlying scatterplots
    o Notice the plot in the upper left is an outlier even though it looks bivariate normal
Anomalies
o Multivariate Anomalies
  o Scagnostics
    o Ladder-of-powers transformations reveal different scagnostics under different transformations (Dang & Wilkinson, 2014)
Anomalies
o Multivariate Outliers
  o Outliers in tables
    o Fit a Poisson (log-linear) model and look at the residuals
      (Figure: Bogumił Kamiński, "Visualizing tables in ggplot2")
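For a two-way table, the independence (main-effects) log-linear model has closed-form expected counts, so the residual scan reduces to a few lines. This is a simplified sketch; a log-linear model with interaction terms would need an actual GLM fit:

```python
import numpy as np

def pearson_residuals(table):
    """Pearson residuals (obs - exp) / sqrt(exp) from the independence
    model for a two-way count table; a large |residual| marks an outlying cell."""
    t = np.asarray(table, float)
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
    return (t - expected) / np.sqrt(expected)

res_indep = pearson_residuals([[10, 20], [20, 40]])   # perfectly independent
res_assoc = pearson_residuals([[30, 10], [10, 30]])   # strong association
```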
Anomalies
o Multivariate Outliers
  o Outliers in tables
    o A simple chi-square test can be used on a two-way table
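With SciPy this is a single call; the table values below are made up for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[20, 30, 10],
                  [25, 25, 60]])
stat, p, dof, expected = chi2_contingency(table)
# cells contributing most to (obs - exp)^2 / exp are the outlying ones
```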
Anomalies
o Inliers
  o Histograms hide details
  o Stem-and-leaf and dot plots do not
  o In this batch, someone rounded some heights of baseball players to the nearest inch
Anomalies
o Inliers
  o Detecting duplicates
    o Pick a delta profile distance (Euclidean or another distance metric)
    o Set delta to zero if you want to detect only exact duplicates
    o Multivariate-sort and flag cases closer than delta
    o Duplicate cases were found in some Iris datasets with this method
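The sort-and-scan idea takes a few lines. Comparing only adjacent rows after a lexicographic sort is cheap, but note it can miss near-duplicates that do not sort next to each other; the function name is an assumption:

```python
import numpy as np

def near_duplicates(X, delta=0.0):
    """Flag index pairs of rows within Euclidean distance delta of each other
    after a lexicographic row sort (delta=0 -> exact duplicates only)."""
    order = np.lexsort(X.T[::-1])            # sort rows by col 0, then col 1, ...
    Xs = X[order]
    d = np.linalg.norm(np.diff(Xs, axis=0), axis=1)
    return [(int(order[i]), int(order[i + 1])) for i in np.nonzero(d <= delta)[0]]

X = np.array([[1.0, 2.0], [3.0, 4.0], [1.0, 2.0], [5.0, 6.0]])
pairs = near_duplicates(X)    # rows 0 and 2 are exact duplicates
```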
Anomalies
o Missing Values
  o A missing value is a value that is not observed
  o Rubin (1976) gave missing values a theoretical basis
    o Identifying a missing value implies we could measure it under some circumstances
  o Missing-value categories
    o NULL – undefined value (not missing)
    o Failure to respond (usually, but not always, missing)
    o Refusal to respond (rarely, but sometimes, missing)
    o Some other random coding omission
  o Rubin's missing-value classes
    o The relation between a variable and the probability of a value being missing
    o Missing Completely At Random (MCAR)
    o Missing At Random (MAR)
    o Missing Not At Random (MNAR)
    o Values must be MAR or MCAR to use Rubin's Multiple Imputation
Anomalies
o Missing Values
  o Single imputation (all of these methods are invalid)
    o Hot deck
      o randomly select a similar record for the imputed value
      o reduces uncertainty of estimates
    o Mean imputation
      o replace the missing value with the mean of the variable
      o attenuates covariance/correlation estimates
    o Listwise deletion (the standard method in most statistics packages)
      o throw out any record with a missing value
      o reduces power and can introduce bias
    o Pairwise deletion
      o when computing correlations, ignore any case with a missing value on either variable
      o can induce negative eigenvalues and correlations greater than 1 in absolute value
    o Regression imputation
      o fit a regression equation using non-missing cases to predict missing values
      o reduces uncertainty of estimates
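The attenuation caused by mean imputation is easy to see in a simulation; the data-generating process and the 40% MCAR missingness below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)        # truly correlated with x
y_obs = y.copy()
y_obs[rng.random(500) < 0.4] = np.nan          # 40% missing completely at random

# mean imputation: fill every missing value with the observed mean
y_imp = np.where(np.isnan(y_obs), np.nanmean(y_obs), y_obs)
r_true = np.corrcoef(x, y)[0, 1]
r_imp = np.corrcoef(x, y_imp)[0, 1]            # attenuated toward zero
```

The imputed values are constant, so they contribute variance-free points that drag the correlation estimate toward zero.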
Anomalies
o Missing Values
  o Multiple imputation
    1. Impute missing values using linear or logistic regression
    2. Do this, say, 10 times
    3. Perform the desired analysis on each imputed dataset
    4. Average the parameter estimates across the imputed datasets
    5. Calculate standard errors of the parameters using a formula given by Rubin
  o The EM algorithm (for accomplishing step 1 above)
    1. Estimate regression coefficients for each missing value
    2. Plug the estimates into the missing cells
    3. Compute the covariance matrix on the completed data
    4. Repeat 1 through 3 until the covariance matrix stabilizes
    o Usually only a few iterations are necessary
    o Perturb the regression coefficients by a small amount before imputing
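The EM-style loop for step 1 can be sketched with ordinary least squares. There is no coefficient perturbation here, so this sketch is a single imputation; add noise to `beta` on each pass to follow the perturbation advice above:

```python
import numpy as np

def em_impute(X, iters=20):
    """Iteratively regress each incomplete column on the others and
    refill its missing cells, starting from a mean fill."""
    X = np.asarray(X, float).copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmean(X, axis=0), np.nonzero(miss)[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue
            A = np.c_[np.ones(len(X)), np.delete(X, j, axis=1)]
            beta, *_ = np.linalg.lstsq(A[~m], X[~m, j], rcond=None)
            X[m, j] = A[m] @ beta            # plug estimates into missing cells
    return X
```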
Anomalies
o Missing Values
  o Multiple imputation
    o You can delete up to 50% of values and still get decent estimates
      [Scatterplots comparing Missing Data vs. Complete Data, n = 57]
Anomalies
o References
  o Hawkins, D. (1980). Identification of Outliers. New York: Chapman and Hall.
  o Rousseeuw, P.J., and Leroy, A.M. (1987). Robust Regression and Outlier Detection. New York: John Wiley & Sons.
  o Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.