Copy of 05-StatisticalModels

CS109/Stat121/AC209/E-109 Data Science

Statistical ModelsHanspeter Pfister & Joe Blitzstein

pfister@seas.harvard.edu / blitzstein@stat.harvard.edu

data y(observed)

model parameter (unobserved)

statistical inferenceprobability

This Week HW1 due this Thursday - start last week! Course dropbox is now active at http://isites.harvard.edu/k99240

(Harvard ID required). Please follow the submission instructions carefully, and do a test well in advance of the HW1 deadline.

Friday lab 10-11:30 am in MD G115 Pandas with Rahul, Brandon, and Steffen

Road Map to Probability

CDF FPMF (discrete)

PDF (continuous)story

name, parametersMGF

X P(Xx)=F(x)P(X=x)

distributions random variables events numbers

XxX=x

E(X),Var(X),SD(X)

-6 -4 -2 0 2 4 6

0.00.2

0.40.6

0.81.0

x

F(x)

for more about probability: stat110.net


CDF FPMF (discrete)


name, parametersMGF

X P(Xx)=F(x)P(X=x)


XxX=x

generate

E(X),Var(X),SD(X)

-6 -4 -2 0 2 4 6

0.00.2

0.40.6

0.81.0

x

F(x)


CDF FPMF (discrete)


name, parametersMGF

X P(Xx)=F(x)P(X=x)


XxX=x

generate

E(X),Var(X),SD(X)

P

-6 -4 -2 0 2 4 6

0.00.2

0.40.6

0.81.0

x

F(x)


CDF FPMF (discrete)


name, parametersMGF

X P(Xx)=F(x)P(X=x)fu

nctio

n of

r.v.


XxX=x

generate

E(X),Var(X),SD(X)

X,X2,X3,g(X)

E(X),E(X2),E(X3),E(g(X))

P

-6 -4 -2 0 2 4 6

0.00.2

0.40.6

0.81.0

x

F(x)


CDF FPMF (discrete)


name, parametersMGF


nctio

n of

r.v.


XxX=x

generate

E(X),Var(X),SD(X)

X,X2,X3,g(X)

LOTUS E(X),E(X2),E(X3),E(g(X))

P

-6 -4 -2 0 2 4 6

0.00.2

0.40.6

0.81.0

x

F(x)


CDF FPMF (discrete)


name, parametersMGF


nctio

n of

r.v.


XxX=x

generate

E(X),Var(X),SD(X)

X,X2,X3,g(X)

LOTUS E(X),E(X2),E(X3),E(g(X))

P

find the MGF

-6 -4 -2 0 2 4 6

0.00.2

0.40.6

0.81.0

x

F(x)

What is a statistical model? a family of distributions, indexed by parameters sharpens distinction between data and parameters,

and between estimators and estimands parametric (e.g., Normal, Binomial) vs.

nonparametric (e.g., methods like bootstrap, KDE)

data y(observed)

model parameter (unobserved)

statistical inferenceprobability

Parametric vs. Nonparametric

parametric: finite-dimensional parameter space (e.g., mean and variance for a Normal)

nonparametric: infinite-dimensional parameter space is there anything in between? nonparametric is very general, but no free lunch! remember to plot and explore the data!

What good is a statistical model?

All models are wrong, but some models are useful. George Box

All models are wrong, but some models are useful. George Box (1919-2013)

All models are wrong, but some models are useful. George Box

In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a

City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the

Cartographers Guild struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it.

Jorge Luis Borges, On Exactitude in Science

Borges Google Doodle

Statistical Models: Two Books

Parametric Model Example: Exponential Distribution

Remember the memoryless property!

f(x) = ex, x > 0

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

1.2

x

pdf

0.0 0.5 1.0 1.5 2.0 2.5 3.00.0

0.2

0.4

0.6

0.8

1.0

x

cdf

Length-Biasing ParadoxWhat is the waiting time for a bus?

timeline

For i.i.d. Exponential arrivals, your average waiting time is the same as the average time between buses!

Length-Biasing ParadoxHow would you measure the average prison sentence?

Exponential Distribution

Remember the memoryless property!

f(x) = ex, x > 0

Exponential is characterized by memoryless property and characterized by having constant hazard function all models are wrong, but some are useful... iterate between exploring, the data model-building,

model-fitting, and model-checking key building block for more realistic models

The Weibull Distribution Exponential has constant hazard function Weibull generalizes this to a hazard that is t to a power much more flexible and realistic than Exponential representation: a Weibull is an Expo to a power

The Evil Cauchy Distribution

http://www.etsy.com/shop/NausicaaDistribution

Pois

HGeom

Bin(Bern)

Conditioning

Conditioning

Limit

Limit

Gamma(Expo, Chi-Square)

NBin(Geom)

Poisson process

Limit

Beta(Unif)

Conjugacy

Bank - post officeConjugacy

Student-t(Cauchy)

NormalLimit

Limit

Limit

Family Tree of Parametric Distributions

Blitzstein-Hwang, Introduction to Probability

Binomial Distribution3.3. BERNOULLI AND BINOMIAL 81

Figure 3.6 shows plots of the Binomial PMF for various values of n and p. Note thatthe PMF of the Bin(10, 1/2) distribution is symmetric about 5, but when the successprobability is not 1/2, the PMF is skewed. For a fixed number of trials n, X tends to belarger when the success probability is high and lower when the success probability is low,as we would expect from the story of the Binomial distribution. Also recall that in anyPMF plot, the sum of the heights of the vertical bars must be 1.

0 2 4 6 8 10

0.00

0.05

0.10

0.15

0.20

0.25

0.30

x

pmf

0 2 4 6 8 10

Bin(10,1/2)

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

x

pmf

0 2 4 6 8 10

Bin(10,1/8)

0 2 4 6 8 10

0.00

0.05

0.10

0.15

0.20

0.25

0.30

x

pmf

0 2 4 6 8 10

Bin(100,0.03)

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

x

pmf

0 2 4 6 8 10

Bin(9,4/5)

Figure 3.6: Some Binomial PMFs. In the lower left, we plot the Bin(100, 0.03) PMFbetween 0 and 10 only, as the probability of more than 10 successes is close to 0.

Weve used Story 3.3.3 to find the Bin(n, p) PMF. The story also gives us a straight-forward proof for the fact that if X is Binomial, then nX is also Binomial.

Theorem 3.3.5. Let X Bin(n, p), and q = 1 p (we often use q to denote the failureprobability of a Bernoulli trial). Then nX Bin(n, q).

story: X~Bin(n,p) is the number of successes in n independent Bernoulli(p) trials.

Binomial Distribution

story: X~Bin(n,p) is the number of successes in n independent Bernoulli(p) trials.

Example: # votes for candidate A in election with n voters, where each independently votes for A with probability p

mean is np (by story and linearity of expectation:E(X+Y)=E(X)+E(Y))

variance is np(1-p) (by story and the fact thatVar(X+Y)=Var(X)+Var(Y) if X,Y are uncorrelated)

(Doonesbury)

Normal (Gaussian) Distribution

Wikipedia

symmetry central limit theorem characterizations (e.g., via entropy) 68-95-99.7% rule

Normal Approximation to Binomial

Wikipedia

Bootstrap3.142 2.718 1.414 0.693 1.618

1.414 2.718 0.693 0.693 2.718

1.618 3.142 1.618 1.414 3.142

1.618 0.693 2.718 2.718 1.414

0.693 1.414 3.142 1.618 3.142

2.718 1.618 3.142 2.718 0.693

1.414 0.693 1.618 3.142 3.142

data:

reps

resample with replacement, use empirical distribution to approximate true distribution

Copy of 05-StatisticalModels

Documents

Transcript of Copy of 05-StatisticalModels

Copy of Copy of Copy of Copy of KTPL · TYPESOFREALESTATELICENSEHOLDERS: ABROKERisresponsibleforallbrokerageactivities,includingactsperformedbysalesagentssponsoredbythebroker ...

Copy of Dewey 10 5 05 JC 5 - Harvard University

COURSE CATALOG '05-06 copy

111 THE ORIGINS OF ASTRONOMY FINAL COPY 04-05-15 PUBLISHER COPY (3).pdf

Recetario Editable Ing copy copy - Mezcal Cómplice€¦ · Title: Recetario Editable_Ing copy copy Created Date: 1/23/2019 4:05:09 PM

Copy of Maxim is Ing ARPU Price Elasticity MD Mar 05

Bab 05. Momentum Dan Impuls - Copy

2012 01-05 - copy

Copy of Copy of Copy of Copy of Copy of Copy of 2nd Home ...

Pancake Recipe copy - Taste Of Hastings Hotels · 2019-03-05 · Title: Pancake Recipe copy Created Date: 20190305131943Z

Copy of Copy of Copy of Copy of Copy of Copy of How I ... · Title: Copy of Copy of Copy of Copy of Copy of Copy of How I Spent My Summer Author: kidsfunlearning kidsfunlearning Keywords:

Copy of Copy of Copy of Copy of Copy of Copy of How I ......Copy of Copy of Copy of Copy of Copy of Copy of How I Spent My Summer Author kidsfunlearning kidsfunlearning Keywords DADcTZn-ycA,BACOZUCwLcY

05 copy of أسباني

Author's personal copy Journal of Financial Economicspublic.econ.duke.edu/~ap172/Patton_Timmermann_sorts_JFE... · 2010-10-05 · Author's personal copy Monotonicity in asset returns:

Copy of Alert Data_10!05!12

Copy of Copy of Copy of Copy of Seminar flier template · Title: Copy of Copy of Copy of Copy of Seminar flier template Author: Jamie Desoto Keywords: DAD1s7dEjXo,BADRsuH-Tqs Created

Copy of copy of copy of copy of copy of airbnb

Uncorrected Proof Copy - State University of New York ... CPSW 5.5 1.24 7 Uncorrected Proof Copy oof Copy Uncorrected Proof Copy Job: Taatjes Operator: IBH Chapter: 03_Roe Date: 01/09/05

Copy of Copy of Copy of Copy of Copy of Copy of How I ... · Copy of Copy of Copy of Copy of Copy of Copy of How I Spent My Summer Author: kidsfunlearning kidsfunlearning Keywords:

Copy of Copy of Copy of Copy of Copy of FS Veggie Flyerfarmshare.org/.../uploads/2020/02/...Church-Flyer.pdf · 2/5/2020 · Title: Copy of Copy of Copy of Copy of Copy of FS Veggie