Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5:...

16
Statistics for Journalism Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c) The probability function (density) and the distribution function. d) Mean and variance of a random variable. 2. Probability models: a) Bernoulli. b) Geometric. c) Binomial. d) Normal. e) Normal approximation to the binomial distribution. Recommended reading: Capítulos 22 y 23 del libro de Peña y Romo (1997)

Transcript of Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5:...

Page 1: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Chapter 5: Probability models

1. Random variables:

a) Idea.

b) Discrete and continuous variables.

c) The probability function (density) and the distribution function.

d) Mean and variance of a random variable.

2. Probability models:

a) Bernoulli.

b) Geometric.

c) Binomial.

d) Normal.

e) Normal approximation to the binomial distribution.

Recommended reading:

Capítulos 22 y 23 del libro de Peña y Romo (1997)

Page 2: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

6.1: Random variables

• A function which places a numerical value on each possibleresult of an experiment is called a random variable.

• We use capital letters, e.g. X, Y, Z, to represent randomvariables and lower case letters, x, y, z, to represent particularvalues of these variables.

Discrete random variables can only take a discrete set of possiblevalues.

Continuous random variables can take an infinite number ofvalues within some continuous range.

Page 3: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

The probability function for a discrete r.v.: is the function which

associates the probability P(X=x) to each possible value x.

The possible values of a discrete r.v. X and their respective

probabilities are often displayed in a probability distribution table:

X x1 x2 ... xn

P(X=xi) p1 p2 ... pn

1 2 3 1np p p pEvery probability function satisfies

The distribution function for a discrete r.v.: Let X be a r.v. The distribution

function of X is the function which gives, for each x, the cumulative

probability up to x, that is, ( ) ( )F x P X x

Page 4: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Mean, variance and standard deviation of a discrete r.v.

The mean or expectation of a discrete r.v., X, which takes values x1, ,x2,

....with probabilities p1, p2,... Is given by the following expression:

The variance is defined by the formula

which can be calculated using

( )i i i ii ix P X x x p

2 2 2

i iix p

2 2( )i iix p

Example: The probability distribution of the r.v. X is given in the following table:

xi 1 2 3 4 5

pi 0.1 0.3 ? 0.2 0.3

What is P(X=3)?

Calculate the mean and variance.

The standard deviation is the root of the variance.

Page 5: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

6.2: Probability models

Discrete models Continuous models

Bernoulli trials and the geometric and

binomial distributions

The normal distribution and related

distributions

Page 6: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Bernoulli trials and the geometric and binomial distributions

A Bernoulli model is an experiment with the following characteristics:

• In each trial, there are only two possible results, success and failure.

• The result obtained in each trial is independent of the previous

results.

• The probability of success is constant, P(success) = p, and does not

change from one trial to the next.

Page 7: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

The geometric distribution

Suppose we have a Bernoulli model. What is the distribution of the

number of failures, F, before the first success?

• P(F=0) = P(0 failures before the 1st success) = p

• P(F=1) = P(failure, success) = (1-p)p

• P(F=2) = P(failure, failure, success) = (1-p)2 p

• P(F=f) = P( f failures before the 1st success) = (1-p)f p for f = 0, 1, 2,

The distribution of F is called the geometric distribution with parameter p.

E[F] = (1-p)/p V[X] = (1-p)/p2

Page 8: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

The binomial distribution

Suppose we have a Bernoulli model. What is the distribution of the

number of successes, X, in n trials?

for r = 0,1,2, …, n.

The distribution of X is called the binomial distribution with parameters n

and p.

E[X] = np V[X] = np(1-p)

P( ) ( ) (1 )r n rnObtener r éxitos P X r p p

r

Page 9: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

EXAMPLE

Calculate the probability that in a family with 4 children, 3 of them are

boys.

EXAMPLE

The probability that a student has to repeat the year is 0,3.

• We pick a student at random. What is the probability that the first

repeater is the 3rd student we pick?

• We choose 20 students at random. What is the chance that there

are exactly 4 repeaters?

EXAMPLE

On average, 4% of the votes in an election are null. Calculate the

expected number of null votes in a town with an electorate of 1000.

Page 10: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

DENSITY FUNCTION OF A CONTINUOUS R.V.:

The density function of a continuous r.v. satisfies the following conditions:

It only takes non-negative values, f(x) ≥ 0.

The area under the curve is equal to 1.

Distribution function. As with discrete variables, the distribution function gives the

cumulative probability up to x, that is:

The c.d.f. satisfies the conditions:

It takes the value 0 for any x below the minimum possible value of the

variable.

It takes the value 1 for all points over the maximum possible value of the

variable.

( ) ( )F x P X x

Page 11: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

The normal or gaussian distribution

Many variables have a bell shaped density.

Examples:

• Weights of a population of the same age and sex.

• Heights of the same population.

• The grades in a course (urban myth).

To say that a continuous variable X, has a normal distribution with mean

and standard deviation , we write:

X ~ N( , 2)

Page 12: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

The standard normal distribution

The normal distribution with mean 0 and standard deviation 1 is called the

standard normal distribution.

There are tables which allow us to calculate the probabilities for this

distribution, N(0,1).

If we have a normal r.v., X with mean and standard deviation we can

convert this to a standard, N(0,1) r.v. using the transformation:

XZ

Page 13: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Let Z ~ N(0,1). Calculate the following probabilities

• P(Z < -1)

• P(Z > 1)

• P(-1,5 < Z < 2)

Calculate the 90%, 95%, 97,5% and 99% percentiles of the standard normal

distribution.

(These values are useful in the next chapter)

Let X ~ N(2,4). Calculate the following probabilities

• P(X < 0)

• P(-1 < X < 1)

Examples

Page 14: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Page 15: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Page 16: Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5: Probability models 1. Random variables: a) Idea. b) Discrete and continuous variables. c)

Statistics for Journalism

Approximation of the binomial distribution using a normal

When n is large enough, the binomial distribution, X~B(n, p), looks like a normal

distribution,

The approximation is usually considered to be good if np > 5 and n(1-p) > 5.

, (1 )N np np p

EXAMPLE

We throw a fair coin in the air 400 times. What is the probability of getting

between 180 and 210 heads?