Uploaded by earl-mckenzie
Random Numbers
Toss 1 quarter – What is the probability of getting a head?
Toss 1 quarter 10 times – What is the proportion of heads you get?
Toss 1 quarter 40 times – What is the proportion of heads you get?
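The tosses above can also be imitated on a computer. The following is a minimal sketch, assuming a fair coin; the function name `proportion_of_heads` and the seed are illustrative choices, not part of the original exercise. It only shows that the proportion of heads varies from run to run and tends to settle near 0.5 as the number of tosses grows.

```python
import random


def proportion_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
for n in (10, 40, 10_000):
    print(n, proportion_of_heads(n, rng))
```

With only 10 or 40 tosses the proportion can land noticeably away from 0.5; that is expected and is the point of the exercise.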
Excel Exercise 1
The outcome of a coin toss
The time between customers arriving at an ATM
The price of a share of a company’s stock at the close of the market
Independence
Trials (events, outcomes) must be independent of each other.
Probability is empirical.
Computer simulation is faster, but it’s not empirical.
Probability Models
Create a list of the possible outcomes.
Assign probabilities to each outcome.
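A probability model in this sense can be written down directly as a table of outcomes and probabilities. The sketch below assumes the usual requirements (each probability lies between 0 and 1 and they sum to 1); the helper `is_probability_model` and the coin example are illustrative, not taken from the slides.

```python
from fractions import Fraction


def is_probability_model(model):
    """Check that every probability is in [0, 1] and that they sum to exactly 1."""
    probabilities = list(model.values())
    return all(0 <= p <= 1 for p in probabilities) and sum(probabilities) == 1


# Outcomes of one toss of a fair coin, with a probability assigned to each.
coin = {"head": Fraction(1, 2), "tail": Fraction(1, 2)}
print(is_probability_model(coin))
```

Exact fractions are used so that the check on the sum is not disturbed by floating-point rounding.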
Phone book – Benford’s Law
First digits of phone number suffixes (e.g., 123-4567)
First digits of addresses
[Figure: first-digit charts for the two data sets; horizontal axes labeled with the digits]
Benford’s Law
In many data sets, the distribution of “first digits” can be modeled with a logarithmic distribution.
This means that the digit “1” has a much higher probability of occurring than if the digits were uniformly distributed.
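The usual statement of Benford’s Law gives the probability of first digit d (for d = 1 to 9) as log10(1 + 1/d). The sketch below simply tabulates that formula; the function name `benford_probability` is an illustrative choice.

```python
import math


def benford_probability(d):
    """Benford's Law probability that the first digit equals d, for d in 1..9."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

Under this model the digit 1 has probability of about 0.301, compared with 1/9 (about 0.111) if the nine digits were equally likely.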
Probability Models
e.g., An “outcome” can be a particular random sample of 55,000 households out of 106,000,000 households.
Suppose there are five students. If we want to draw a random sample with n=3, we have 10 distinct possibilities for samples
$$\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$$
And therefore the sample space is ….
Jack Dan Jill
Jack Dan Harry
Jack Dan Helen
Jack Jill Harry
Jack Jill Helen
Jack Harry Helen
Dan Jill Harry
Dan Jill Helen
Dan Harry Helen
Jill Harry Helen
Each of these is an event
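The ten samples can be listed mechanically. This is a minimal sketch using the five student names from the slide; it assumes that the order within a sample does not matter, which is what the count of 10 implies.

```python
from itertools import combinations
from math import comb

students = ["Jack", "Dan", "Jill", "Harry", "Helen"]

# Every distinct sample of 3 students, ignoring order within a sample.
samples = list(combinations(students, 3))
for sample in samples:
    print(" ".join(sample))

print(len(samples), comb(5, 3))  # both counts are 10
```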
Jack Dan Jill 10% 4% 5% 9% 3%
Jack Dan Harry 10% 12% 12% 5% 9%
Jack Dan Helen 10% 7% 13% 3% 25%
Jack Jill Harry 10% 0% 4% 8% 24%
Jack Jill Helen 10% 7% 10% 2% 20%
Jack Harry Helen 10% 16% 14% 16% 7%
Dan Jill Harry 10% 16% 12% 15% 2%
Dan Jill Helen 10% 10% 10% 11% 6%
Dan Harry Helen 10% 19% 14% 17% 0%
Jill Harry Helen 10% 8% 5% 14% 5%
sum 100% 100% 100% 100% 100%
What if we only care about the sum of the dots, not their order nor how we get to the sum? How many alternatives are there?
What if we only care about the number of dots on the dice, but not their order? How many alternatives are there?
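One way to answer both questions is to count. The sketch below assumes two ordinary six-sided dice, which the text does not state explicitly; with a different number of dice the counts change.

```python
from itertools import product

faces = range(1, 7)  # assumption: ordinary six-sided dice

# Every ordered result of rolling two dice, e.g. (2, 5) and (5, 2) are distinct.
ordered = list(product(faces, repeat=2))

# Ignore order: (2, 5) and (5, 2) become the same alternative.
unordered = {tuple(sorted(roll)) for roll in ordered}

# Keep only the sum of the dots.
sums = {a + b for a, b in ordered}

print(len(ordered), len(unordered), len(sums))  # 36 21 11
```

The alternatives in the coarser lists are not equally likely: a sum of 7 can arise from six ordered results, a sum of 2 from only one.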
A s an event
Assigning Probabilities: Intervals of Outcomes
[Figure: plot of y against x, with x between 0 and 1]
There are 10 possible events.
What is P(0.2<X<0.6)?
Assigning Probabilities: Intervals of Outcomes
There are 50 possible events.
What is P(0.2<X<0.6)?
[Figure: plot of y against x, with x between 0 and 1]
Assigning Probabilities: Intervals of Outcomes
There are 5000 possible events.
What is P(0.2<X<0.6)?
[Figure: plot of y against x, with x between 0 and 1]
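The effect of having more and more possible events can be checked by counting. The sketch below assumes the n outcomes are the equally likely values 0, 1/n, 2/n, ..., (n-1)/n; the slides do not say exactly where the outcomes sit, so this grid is an illustrative choice.

```python
from fractions import Fraction


def prob_between(n, low=Fraction(1, 5), high=Fraction(3, 5)):
    """P(low < X < high) when X is equally likely to be 0, 1/n, ..., (n-1)/n."""
    outcomes = [Fraction(k, n) for k in range(n)]
    favorable = sum(1 for x in outcomes if low < x < high)
    return Fraction(favorable, n)


for n in (10, 50, 5000):
    p = prob_between(n)
    print(n, p, float(p))
# 10 -> 3/10, 50 -> 19/50, 5000 -> 1999/5000
```

As n grows, the answer approaches 0.4, the length of the interval from 0.2 to 0.6.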
This Normal distribution is the idealized equivalent of a Normal probability model.
[Figure: Normal density curve; Density against z, with z from -4 to 4]
This Normal probability model (empirical)
is idealized by the Normal distribution (computational).
Random Variables
Icosahedron
X
Y
X+Y
Assigning Probabilities: Intervals of Outcomes
[Figures: two plots of y against x, with x between 0 and 1]
Both are uniformly distributed between 0 and 1, but one is a discrete random variable and the other one is continuous.
For discrete random variables, the difference between > and ≥ matters.
Take the list of integers from 1 to 50. The probability of randomly picking any individual integer out of that sample space, say, 23, is 1/50, one fiftieth.
n:        50    500    50,000    50 million    50 trillion           -> infinity
P(X=23):  1/50  1/500  1/50,000  1/50,000,000  1/50,000,000,000,000  -> 0
For continuous random variables, the difference between > and ≥ doesn’t matter because the probability of picking any one number is zero.
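Both points can be checked with exact arithmetic. The sketch below assumes n equally likely integers 1, 2, ..., n; the function names are illustrative.

```python
from fractions import Fraction


def prob_of_single_value(n):
    """P(X = 23) when X is equally likely to be any of n values that include 23."""
    return Fraction(1, n)


def prob_greater(n, value, strict):
    """P(X > value) if strict, else P(X >= value), for X uniform on 1..n."""
    if strict:
        favorable = sum(1 for x in range(1, n + 1) if x > value)
    else:
        favorable = sum(1 for x in range(1, n + 1) if x >= value)
    return Fraction(favorable, n)


# Discrete case: > and >= give different answers (27/50 versus 28/50).
print(prob_greater(50, 23, strict=True), prob_greater(50, 23, strict=False))

# The probability of one particular value shrinks toward 0 as n grows.
for n in (50, 500, 50_000, 50_000_000, 50_000_000_000_000):
    print(n, float(prob_of_single_value(n)))
```

The gap between the two discrete answers is exactly P(X = 23), which is why it disappears in the continuous case.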
If I buy a lottery ticket for $1 but win nothing, my net gain is -$1.
If I win $100,000 (and the probability is 0.00001), my net gain is $99,999.
Then on average, my net gain from playing the lottery is
$$\frac{-\$1 + \$99{,}999}{2} = \$49{,}999$$
Wrong!
Weighting each outcome by its probability instead, my average net gain from playing the lottery is
$$E(X) = (-\$1)(0.99999) + (\$99{,}999)(0.00001) = \$0$$
This is the expected value of X.
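The difference between a plain average of the two outcomes and a probability-weighted average can be checked numerically. The sketch below uses only the figures stated in the text (a $1 ticket, a $100,000 prize, and a win probability of 0.00001); the variable names are illustrative.

```python
from fractions import Fraction

p_win = Fraction(1, 100_000)  # 0.00001, as stated in the text

# Net gain of each outcome mapped to its probability.
outcomes = {
    Fraction(-1): 1 - p_win,   # lose: the ticket cost $1
    Fraction(99_999): p_win,   # win: $100,000 prize minus the $1 ticket
}

# Plain average of the two net gains, ignoring how likely each one is.
naive_average = sum(outcomes) / len(outcomes)

# Expected value: each net gain weighted by its probability.
expected_value = sum(x * p for x, p in outcomes.items())

print(naive_average)    # 49999
print(expected_value)   # 0
```

With these particular numbers the game is exactly fair; a smaller prize or a lower win probability would make the expected net gain negative.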
Excel Exercise 2
Excel Exercise 3
=0
Not independent
The Sampling Distribution of a Sample Mean
Suppose there are five students. If we want to draw a random sample with n=3, we have 10 distinct possibilities for samples
$$\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$$
Jack Dan Jill
Jack Dan Harry
Jack Dan Helen
Jack Jill Harry
Jack Jill Helen
Jack Harry Helen
Dan Jill Harry
Dan Jill Helen
Dan Harry Helen
Jill Harry Helen
Suppose there are 1000 individuals. We can collect information from small samples (n=30, n=60, n=100, n=300).
What will be the sampling distribution of the mean (the distribution of the sample mean)?
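One way to explore this question is to simulate it. The sketch below is only an illustration under stated assumptions: the population is 1000 normal random numbers with a mean and standard deviation of 200,000 (chosen to resemble the hprice scale of the figures, not taken from the original data), each sample is drawn without replacement, and 500 samples are drawn per sample size.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable

# Assumed population: 1000 normal random numbers (parameters are illustrative).
population = [rng.gauss(200_000, 200_000) for _ in range(1000)]


def sample_means(population, n, repetitions, rng):
    """Draw `repetitions` simple random samples of size n; return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]


spread = {}
for n in (30, 60, 100, 300):
    means = sample_means(population, n, 500, rng)
    spread[n] = statistics.stdev(means)
    print(n, round(statistics.fmean(means)), round(spread[n]))
```

The sample means center on the population mean for every n, while their spread shrinks as n grows.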
[Figure: kernel density function of 1000 normal random numbers; Density against hprice]
[Figure: ten panels, kernel density function of normal random numbers in samples n = 30; Density against hprice]
[Figure: ten panels, kernel density function of normal random numbers in samples n = 60; Density against hprice]
[Figure: ten panels, kernel density function of normal random numbers in samples n = 100; Density against hprice]
[Figure: three panels, kernel density function of normal random numbers in samples n = 300; Density against hprice]
[Figure: kernel density function of 1000 normal random numbers; Density against hprice]
            mean      st dev
Population  201,209   199,964
Sample 1    174,898   203,879
Sample 2    145,792   176,502
Sample 3    221,358   200,407
Sample 4    216,957   209,431
Sample 5    205,055   247,535
Sample 6    189,929   217,423
Sample 7    245,484   177,200
Sample 8    207,733   207,355
Sample 9    231,096   189,594
Sample 10   201,115   170,539
n=60
[Figure: four histograms, Density against x; panel titles: hist x in 1/60, hist x in 61/120, hist x in 121/180, hist x in 181/240]
Excel Exercise 4
[Figure: four panels labeled n=3, n=30, n=60, and n=100; horizontal axis from -1.5 to 1.5, vertical axis from 0 to 10]
population mean = 0.0068
n           mean     standard error
3           0.1144   0.4978
30          -0.0001  0.1752
60          -0.0421  0.1249
100         -0.011   0.0967
population  0.0068   n.a.
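The standard errors in the table can be compared with the textbook rule that the standard deviation of a sample mean is σ/√n. The sketch below assumes the tabulated standard errors are standard deviations of sample means; it multiplies each one by √n to see whether the implied σ stays roughly constant.

```python
import math

# Standard errors copied from the table, keyed by sample size n.
standard_errors = {3: 0.4978, 30: 0.1752, 60: 0.1249, 100: 0.0967}

# If SE = sigma / sqrt(n), then SE * sqrt(n) should be about the same for every n.
implied_sigma = {n: se * math.sqrt(n) for n, se in standard_errors.items()}

for n, sigma in implied_sigma.items():
    print(n, round(sigma, 3))
```

The implied values all fall between about 0.86 and 0.97, so the shrinking standard errors are broadly consistent with the 1/√n pattern.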
[Figure: plot with horizontal axis from 0 to 1000 and vertical axis from -15 to 20]
Notice that the Central Limit Theorem doesn’t say “Draw an SRS of size n from a Normally distributed population.” It says “any” population.
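The "any population" part can be tried out with a population that is clearly not Normal. The sketch below assumes an exponential population with mean 1 and standard deviation 1, samples of size 50, and 4000 repetitions; all of these are illustrative choices, not values from the slides.

```python
import math
import random
import statistics

rng = random.Random(11)  # arbitrary fixed seed so the run is repeatable


def mean_of_sample(n):
    """Mean of n independent draws from a skewed exponential population (mean 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


n = 50
means = [mean_of_sample(n) for _ in range(4000)]

observed_center = statistics.fmean(means)
observed_spread = statistics.stdev(means)
theoretical_spread = 1 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

# Share of sample means within one theoretical standard error of the population mean.
share_within_one = sum(abs(m - 1) <= theoretical_spread for m in means) / len(means)

print(round(observed_center, 3), round(observed_spread, 3), round(theoretical_spread, 3))
print(round(share_within_one, 3))
```

Although single draws from this population are strongly skewed, the share of sample means within one standard error comes out close to the roughly 68% expected for a Normal distribution.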
Excel Exercise 5 and 6