Computational Statistics with MATLAB

Computer Science and Data Analysis Series Computational Statistics Handbook with MATLAB® Second Edition Wendy L. Martinez The Office of Naval Research Arlington, Virginia, U.S.A. Angel R. Martinez Naval Surface Warfare Center Dahlgren, Virginia, U.S.A. Chapman & Hall/CRC Taylor & Francis Group Boca Raton London New York Chapman & Hall/CRC is an imprint of the Taylor & Francis Group, an informa business

description

A practical book for statistical development and analysis using MATLAB, highly recommended for students and professionals in engineering, economics, and related fields.

Transcript of Computational Statistics with MATLAB

Page 1: Computational Statistics with MATLAB

Computer Science and Data Analysis Series

Computational Statistics Handbook

with MATLAB® Second Edition

Wendy L. Martinez The Office of Naval Research

Arlington, Virginia, U.S.A.

Angel R. Martinez Naval Surface Warfare Center

Dahlgren, Virginia, U.S.A.

Chapman & Hall/CRC Taylor & Francis Group

Boca Raton London New York

Chapman & Hall/CRC is an imprint of the Taylor & Francis Group, an informa business

Page 2: Computational Statistics with MATLAB

Table of Contents

Preface to the Second Edition xvii Preface to the First Edition xxi

Chapter 1 Introduction 1.1 What Is Computational Statistics? 1 1.2 An Overview of the Book 2

Philosophy 2 What Is Covered 3 A Word About Notation 5

1.3 MATLAB® Code 6 Computational Statistics Toolbox 7 Internet Resources 8

1.4 Further Reading 9

Chapter 2 Probability Concepts 2.1 Introduction 11 2.2 Probability 12

Background 12 Probability 14 Axioms of Probability 17

2.3 Conditional Probability and Independence 17 Conditional Probability 17 Independence 18 Bayes' Theorem 19

2.4 Expectation 21 Mean and Variance 21 Skewness 23 Kurtosis 23

2.5 Common Distributions 24 Binomial 24 Poisson 26 Uniform 29 Normal 31

Page 3: Computational Statistics with MATLAB

Exponential 34 Gamma 36 Chi-Square 37 Weibull 38 Beta 40 Student's t Distribution 41 Multivariate Normal 44 Multivariate t Distribution 47

2.6 MATLAB® Code 48 2.7 Further Reading 49 Exercises 52

Chapter 3 Sampling Concepts 3.1 Introduction 55 3.2 Sampling Terminology and Concepts 55

Sample Mean and Sample Variance 57 Sample Moments 58 Covariance 60

3.3 Sampling Distributions 63 3.4 Parameter Estimation 65

Bias 66 Mean Squared Error 66 Relative Efficiency 67 Standard Error 67 Maximum Likelihood Estimation 68 Method of Moments 71

3.5 Empirical Distribution Function 72 Quantiles 74

3.6 MATLAB® Code 77 3.7 Further Reading 78 Exercises 80

Chapter 4 Generating Random Variables 4.1 Introduction 83 4.2 General Techniques for Generating Random Variables 83

Uniform Random Numbers 83 Inverse Transform Method 86 Acceptance-Rejection Method 89

4.3 Generating Continuous Random Variables 93 Normal Distribution 93 Exponential Distribution 94 Gamma 95

Page 4: Computational Statistics with MATLAB

Chi-Square 98 Beta 99 Multivariate Normal 101 Multivariate Student's t Distribution 103 Generating Variates on a Sphere 104

4.4 Generating Discrete Random Variables 107 Binomial 107 Poisson 108 Discrete Uniform 111

4.5 MATLAB® Code 112 4.6 Further Reading 113 Exercises 115

Chapter 5 Exploratory Data Analysis 5.1 Introduction 117 5.2 Exploring Univariate Data 119

Histograms 119 Stem-and-Leaf 122 Quantile-Based Plots - Continuous Distributions 124 Quantile Plots - Discrete Distributions 132 Box Plots 138

5.3 Exploring Bivariate and Trivariate Data 145 Scatterplots 145 Surface Plots 146 Contour Plots 148 Bivariate Histogram 149 3-D Scatterplot 155

5.4 Exploring Multi-Dimensional Data 158 Scatterplot Matrix 158 Slices and Isosurfaces 160 Glyphs 166 Andrews Curves 168 Parallel Coordinates 172

5.5 MATLAB® Code 179 5.6 Further Reading 181 Exercises 183

Chapter 6 Finding Structure 6.1 Introduction 187 6.2 Projecting Data 188 6.3 Principal Component Analysis 190 6.4 Projection Pursuit EDA 195

Page 5: Computational Statistics with MATLAB

Projection Pursuit Index 197 Finding the Structure 198 Structure Removal 199

6.5 Independent Component Analysis 204 6.6 Grand Tour 211 6.7 Nonlinear Dimensionality Reduction 216

Multidimensional Scaling 216 Isometric Feature Mapping - ISOMAP 220

6.8 MATLAB® Code 224 6.9 Further Reading 227 Exercises 230

Chapter 7 Monte Carlo Methods for Inferential Statistics 7.1 Introduction 233 7.2 Classical Inferential Statistics 234

Hypothesis Testing 234 Confidence Intervals 243

7.3 Monte Carlo Methods for Inferential Statistics 246 Basic Monte Carlo Procedure 246 Monte Carlo Hypothesis Testing 247 Monte Carlo Assessment of Hypothesis Testing 252

7.4 Bootstrap Methods 256 General Bootstrap Methodology 256 Bootstrap Estimate of Standard Error 258 Bootstrap Estimate of Bias 260 Bootstrap Confidence Intervals 262

7.5 MATLAB® Code 268 7.6 Further Reading 269 Exercises 271

Chapter 8 Data Partitioning 8.1 Introduction 273 8.2 Cross-Validation 274 8.3 Jackknife 281 8.4 Better Bootstrap Confidence Intervals 289 8.5 Jackknife-After-Bootstrap 293 8.6 MATLAB® Code 295 8.7 Further Reading 296 Exercises 298

Page 6: Computational Statistics with MATLAB

Chapter 9 Probability Density Estimation 9.1 Introduction 301 9.2 Histograms 303

1-D Histograms 303 Multivariate Histograms 309 Frequency Polygons 311 Averaged Shifted Histograms 316

9.3 Kernel Density Estimation 322 Univariate Kernel Estimators 322 Multivariate Kernel Estimators 327

9.4 Finite Mixtures 329 Univariate Finite Mixtures 331 Visualizing Finite Mixtures 333 Multivariate Finite Mixtures 335 EM Algorithm for Estimating the Parameters 338 Adaptive Mixtures 343

9.5 Generating Random Variables 348 9.6 MATLAB® Code 356 9.7 Further Reading 357 Exercises 359

Chapter 10 Supervised Learning 10.1 Introduction 363 10.2 Bayes Decision Theory 365

Estimating Class-Conditional Probabilities: Parametric Method 367 Estimating Class-Conditional Probabilities: Nonparametric 369 Bayes Decision Rule 370 Likelihood Ratio Approach 377

10.3 Evaluating the Classifier 380 Independent Test Sample 380 Cross-Validation 382 Receiver Operating Characteristic (ROC) Curve 385

10.4 Classification Trees 390 Growing the Tree 394 Pruning the Tree 399 Choosing the Best Tree 403 Other Tree Methods 412

10.5 Combining Classifiers 414 Bagging 415 Boosting 417 Arcing Classifiers 420 Random Forests 422

10.6 MATLAB® Code 423

Page 7: Computational Statistics with MATLAB

10.7 Further Reading 424 Exercises 428

Chapter 11 Unsupervised Learning 11.1 Introduction 431 11.2 Measures of Distance 432 11.3 Hierarchical Clustering 434 11.4 K-Means Clustering 442 11.5 Model-Based Clustering 445

Finite Mixture Models and the EM Algorithm 446 Model-Based Agglomerative Clustering 450 Bayesian Information Criterion 453 Model-Based Clustering Procedure 453

11.6 Assessing Cluster Results 458 Mojena - Upper Tail Rule 458 Silhouette Statistic 459 Other Methods for Evaluating Clusters 462

11.7 MATLAB® Code 465 11.8 Further Reading 466 Exercises 469

Chapter 12 Parametric Models 12.1 Introduction 471 12.2 Spline Regression Models 477 12.3 Logistic Regression 482

Creating the Model 482 Interpreting the Model Parameters 487

12.4 Generalized Linear Models 488 Exponential Family Form 489 Generalized Linear Model 494 Model Checking 498

12.5 MATLAB® Code 508 12.6 Further Reading 509 Exercises 511

Chapter 13 Nonparametric Models 13.1 Introduction 513 13.2 Some Smoothing Methods 514

Bin Smoothing 515 Running Mean 517

Page 8: Computational Statistics with MATLAB

Running Line 518 Local Polynomial Regression - Loess 519 Robust Loess 525

13.3 Kernel Methods 528 Nadaraya-Watson Estimator 531 Local Linear Kernel Estimator 532

13.4 Smoothing Splines 534 Natural Cubic Splines 536 Reinsch Method for Finding Smoothing Splines 537 Values for a Cubic Smoothing Spline 540 Weighted Smoothing Spline 540

13.5 Nonparametric Regression - Other Details 542 Choosing the Smoothing Parameter 542 Estimation of the Residual Variance 547 Variability of Smooths 548

13.6 Regression Trees 551 Growing a Regression Tree 553 Pruning a Regression Tree 557 Selecting a Tree 557

13.7 Additive Models 563 13.8 MATLAB® Code 567 13.9 Further Reading 570 Exercises 573

Chapter 14 Markov Chain Monte Carlo Methods 14.1 Introduction 575 14.2 Background 576

Bayesian Inference 576 Monte Carlo Integration 577 Markov Chains 579 Analyzing the Output 580

14.3 Metropolis-Hastings Algorithms 580 Metropolis-Hastings Sampler 581 Metropolis Sampler 584 Independence Sampler 587 Autoregressive Generating Density 589

14.4 The Gibbs Sampler 592 14.5 Convergence Monitoring 602

Gelman and Rubin Method 604 Raftery and Lewis Method 607

14.6 MATLAB® Code 609 14.7 Further Reading 610 Exercises 612

Page 9: Computational Statistics with MATLAB

Chapter 15 Spatial Statistics 15.1 Introduction 617

What Is Spatial Statistics? 617 Types of Spatial Data 618 Spatial Point Patterns 619 Complete Spatial Randomness 621

15.2 Visualizing Spatial Point Processes 623 15.3 Exploring First-order and Second-order Properties 627

Estimating the Intensity 627 Estimating the Spatial Dependence 630

15.4 Modeling Spatial Point Processes 638 Nearest Neighbor Distances 638 K-Function 643

15.5 Simulating Spatial Point Processes 646 Homogeneous Poisson Process 647 Binomial Process 650 Poisson Cluster Process 651 Inhibition Process 654 Strauss Process 656

15.6 MATLAB® Code 658 15.7 Further Reading 659 Exercises 661

Appendix A Introduction to MATLAB® A.1 What Is MATLAB®? 663 A.2 Getting Help in MATLAB® 664 A.3 File and Workspace Management 664 A.4 Punctuation in MATLAB® 666 A.5 Arithmetic Operators 666 A.6 Data Constructs in MATLAB® 668

Basic Data Constructs 668 Building Arrays 668 Cell Arrays 669

A.7 Script Files and Functions 670 A.8 Control Flow 672

For Loop 672 While Loop 672 If-Else Statements 673 Switch Statement 673

A.9 Simple Plotting 673 A.10 Contact Information 676

Page 10: Computational Statistics with MATLAB

Appendix B Projection Pursuit Indexes B.1 Indexes 677

Friedman-Tukey Index 677 Entropy Index 678 Moment Index 678 L2 Distances 679

B.2 MATLAB® Source Code 680

Appendix C MATLAB® Statistics Toolbox File I/O 687 Dataset Arrays 687 Grouped Data 687 Descriptive Statistics 688 Statistical Visualization 688 Probability Density Functions 689 Cumulative Distribution Functions 690 Inverse Cumulative Distribution Functions 691 Distribution Statistics Functions 691 Distribution Fitting Functions 692 Negative Log-Likelihood Functions 692 Random Number Generators 693 Hypothesis Tests 694 Analysis of Variance 694 Regression Analysis 694 Multivariate Methods 695 Cluster Analysis 696 Classification 696 Markov Models 696 Design of Experiments 697 Statistical Process Control 697 Graphical User Interfaces 697

Appendix D Computational Statistics Toolbox Probability Distributions 699 Statistics 699 Random Number Generation 700 Exploratory Data Analysis 700 Bootstrap and Jackknife 701 Probability Density Estimation 701 Supervised Learning 701 Unsupervised Learning 701

Page 11: Computational Statistics with MATLAB

Parametric and Nonparametric Models 702 Markov Chain Monte Carlo 702 Spatial Statistics 702

Appendix E Exploratory Data Analysis Toolboxes E.1 Introduction 703 E.2 Exploratory Data Analysis Toolbox 704 E.3 EDA GUI Toolbox 705

Appendix F Data Sets Introduction 719

Appendix G Notation Overview 727 Observed Data 727 Greek Letters 728 Functions and Distributions 728 Matrix Notation 729 Statistics 729

References 731 Author Index 751 Subject Index 757