Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5:...

Post on 16-Feb-2020

10 views 0 download

Transcript of Chapter 5: Probability modelshalweb.uc3m.es/esp/personal/personas/mwiper/... · Chapter 5:...

Statistics for Journalism

Chapter 5: Probability models

1. Random variables:

a) Idea.

b) Discrete and continuous variables.

c) The probability function (density) and the distribution function.

d) Mean and variance of a random variable.

2. Probability models:

a) Bernoulli.

b) Geometric.

c) Binomial.

d) Normal.

e) Normal approximation to the binomial distribution.

Recommended reading:

Capítulos 22 y 23 del libro de Peña y Romo (1997)

Statistics for Journalism

6.1: Random variables

• A function which places a numerical value on each possibleresult of an experiment is called a random variable.

• We use capital letters, e.g. X, Y, Z, to represent randomvariables and lower case letters, x, y, z, to represent particularvalues of these variables.

Discrete random variables can only take a discrete set of possiblevalues.

Continuous random variables can take an infinite number ofvalues within some continuous range.

Statistics for Journalism

The probability function for a discrete r.v.: is the function which

associates the probability P(X=x) to each possible value x.

The possible values of a discrete r.v. X and their respective

probabilities are often displayed in a probability distribution table:

X x1 x2 ... xn

P(X=xi) p1 p2 ... pn

1 2 3 1np p p pEvery probability function satisfies

The distribution function for a discrete r.v.: Let X be a r.v. The distribution

function of X is the function which gives, for each x, the cumulative

probability up to x, that is, ( ) ( )F x P X x

Statistics for Journalism

Mean, variance and standard deviation of a discrete r.v.

The mean or expectation of a discrete r.v., X, which takes values x1, ,x2,

....with probabilities p1, p2,... Is given by the following expression:

The variance is defined by the formula

which can be calculated using

( )i i i ii ix P X x x p

2 2 2

i iix p

2 2( )i iix p

Example: The probability distribution of the r.v. X is given in the following table:

xi 1 2 3 4 5

pi 0.1 0.3 ? 0.2 0.3

What is P(X=3)?

Calculate the mean and variance.

The standard deviation is the root of the variance.

Statistics for Journalism

6.2: Probability models

Discrete models Continuous models

Bernoulli trials and the geometric and

binomial distributions

The normal distribution and related

distributions

Statistics for Journalism

Bernoulli trials and the geometric and binomial distributions

A Bernoulli model is an experiment with the following characteristics:

• In each trial, there are only two possible results, success and failure.

• The result obtained in each trial is independent of the previous

results.

• The probability of success is constant, P(success) = p, and does not

change from one trial to the next.

Statistics for Journalism

The geometric distribution

Suppose we have a Bernoulli model. What is the distribution of the

number of failures, F, before the first success?

• P(F=0) = P(0 failures before the 1st success) = p

• P(F=1) = P(failure, success) = (1-p)p

• P(F=2) = P(failure, failure, success) = (1-p)2 p

• P(F=f) = P( f failures before the 1st success) = (1-p)f p for f = 0, 1, 2,

The distribution of F is called the geometric distribution with parameter p.

E[F] = (1-p)/p V[X] = (1-p)/p2

Statistics for Journalism

The binomial distribution

Suppose we have a Bernoulli model. What is the distribution of the

number of successes, X, in n trials?

for r = 0,1,2, …, n.

The distribution of X is called the binomial distribution with parameters n

and p.

E[X] = np V[X] = np(1-p)

P( ) ( ) (1 )r n rnObtener r éxitos P X r p p

r

Statistics for Journalism

EXAMPLE

Calculate the probability that in a family with 4 children, 3 of them are

boys.

EXAMPLE

The probability that a student has to repeat the year is 0,3.

• We pick a student at random. What is the probability that the first

repeater is the 3rd student we pick?

• We choose 20 students at random. What is the chance that there

are exactly 4 repeaters?

EXAMPLE

On average, 4% of the votes in an election are null. Calculate the

expected number of null votes in a town with an electorate of 1000.

Statistics for Journalism

DENSITY FUNCTION OF A CONTINUOUS R.V.:

The density function of a continuous r.v. satisfies the following conditions:

It only takes non-negative values, f(x) ≥ 0.

The area under the curve is equal to 1.

Distribution function. As with discrete variables, the distribution function gives the

cumulative probability up to x, that is:

The c.d.f. satisfies the conditions:

It takes the value 0 for any x below the minimum possible value of the

variable.

It takes the value 1 for all points over the maximum possible value of the

variable.

( ) ( )F x P X x

Statistics for Journalism

The normal or gaussian distribution

Many variables have a bell shaped density.

Examples:

• Weights of a population of the same age and sex.

• Heights of the same population.

• The grades in a course (urban myth).

To say that a continuous variable X, has a normal distribution with mean

and standard deviation , we write:

X ~ N( , 2)

Statistics for Journalism

The standard normal distribution

The normal distribution with mean 0 and standard deviation 1 is called the

standard normal distribution.

There are tables which allow us to calculate the probabilities for this

distribution, N(0,1).

If we have a normal r.v., X with mean and standard deviation we can

convert this to a standard, N(0,1) r.v. using the transformation:

XZ

Statistics for Journalism

Let Z ~ N(0,1). Calculate the following probabilities

• P(Z < -1)

• P(Z > 1)

• P(-1,5 < Z < 2)

Calculate the 90%, 95%, 97,5% and 99% percentiles of the standard normal

distribution.

(These values are useful in the next chapter)

Let X ~ N(2,4). Calculate the following probabilities

• P(X < 0)

• P(-1 < X < 1)

Examples

Statistics for Journalism

Statistics for Journalism

Statistics for Journalism

Approximation of the binomial distribution using a normal

When n is large enough, the binomial distribution, X~B(n, p), looks like a normal

distribution,

The approximation is usually considered to be good if np > 5 and n(1-p) > 5.

, (1 )N np np p

EXAMPLE

We throw a fair coin in the air 400 times. What is the probability of getting

between 180 and 210 heads?