STAT 551 PROBABILITY AND STATISTICS I


Page 1: STAT 551 PROBABILITY AND STATISTICS I

STAT 551 PROBABILITY AND STATISTICS I

INTRODUCTION

Page 2: STAT 551 PROBABILITY AND STATISTICS I

WHAT IS STATISTICS?

• Statistics is a science of collecting data, organizing and describing it, and drawing conclusions from it. That is, statistics is a way to get information from data. It is the science of uncertainty.

Page 3: STAT 551 PROBABILITY AND STATISTICS I

WHAT IS STATISTICS?

• A pharmaceutical CEO wants to know whether a new drug is superior to existing drugs, and what side effects it may have.

• How fuel efficient is a certain car model?

• Is there any relationship between your GPA and employment opportunities?

• Actuaries want to identify “risky” customers for insurance companies.

Page 4: STAT 551 PROBABILITY AND STATISTICS I

STEPS OF STATISTICAL PRACTICE

• Preparation: Set clearly defined goals and questions of interest for the investigation

• Data collection: Make a plan of which data to collect and how to collect it

• Data analysis: Apply appropriate statistical methods to extract information from the data

• Data interpretation: Interpret the information and draw conclusions

Page 5: STAT 551 PROBABILITY AND STATISTICS I

STATISTICAL METHODS

• Descriptive statistics include the collection, presentation and description of numerical data.

• Inferential statistics include making inferences and decisions from the collected data by using appropriate statistical methods.

• Model building includes developing prediction equations to understand a complex system.

Page 6: STAT 551 PROBABILITY AND STATISTICS I

BASIC DEFINITIONS

• POPULATION: The collection of all items of interest in a particular study.

• VARIABLE: A characteristic of interest about each element of a population or sample.

• STATISTIC: A descriptive measure of a sample.

• SAMPLE: A set of data drawn from the population; a subset of the population available for observation.

• PARAMETER: A descriptive measure of the population, e.g., the mean.

Page 7: STAT 551 PROBABILITY AND STATISTICS I

EXAMPLE

• Population: All students currently enrolled in school
– Unit: Student
– Sample: Any department
– Variables: GPA; hours of work per week

• Population: All books in library
– Unit: Book
– Sample: Statistics books
– Variables: Replacement cost; frequency of check-out; repair needs

• Population: All campus fast food restaurants
– Unit: Restaurant
– Sample: Burger King
– Variables: Number of employees; seating capacity; hiring/not hiring

Note that some samples are not representative of the population and shouldn’t be used to draw conclusions about the population. In the first example, some students from all (or almost all) departments would constitute a better sample.

Page 8: STAT 551 PROBABILITY AND STATISTICS I

How not to run a presidential poll

For the 1936 election, the Literary Digest picked names at random out of telephone books in some cities and sent these people some ballots, attempting to predict the election results, Roosevelt versus Landon, by the returns. Now, even if 100% returned the ballots, even if all told how they really felt, even if all would vote, even if none would change their minds by election day, still this method could be (and was) in trouble: They estimated a conditional probability, used part of the American population which had phones, that part was not typical of the total population. [Dudewicz & Mishra, 1988]

Page 9: STAT 551 PROBABILITY AND STATISTICS I

STATISTIC

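• A statistic (or estimator) is any function of the random variables (r.v.s) of a random sample (r.s.) that does not contain any unknown quantity. E.g.,

– $\sum_{i=1}^{n} X_i$, $\prod_{i=1}^{n} X_i$, $\sum_{i=1}^{n} X_i / n$, $\min_i(X_i)$, $\max_i(X_i)$ are statistics.

– $\sum_{i=1}^{n} X_i - \mu$ and $\sum_{i=1}^{n} X_i / \sigma$, with $\mu$ and $\sigma$ unknown, are NOT.

• Any observed or particular value of an estimator is an estimate.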

Page 10: STAT 551 PROBABILITY AND STATISTICS I

RANDOM VARIABLES

• Variables whose observed value is determined by chance.

• A r.v. is a function defined on the sample space S that associates a real number with each outcome in S.

• R.v.s are denoted by uppercase letters, and their observed values by lowercase letters.

• Example: Consider the random variable X, the number of brown-eyed children born to a couple heterozygous for eye color (each with genes for both brown and blue eyes). If the couple is assumed to have 2 children, X can assume any of the values 0,1, or 2. The variable is random in that brown eyes depend on the chance inheritance of a dominant gene at conception. If for a particular couple there are two brown-eyed children, we have x=2.
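To make the eye-color example concrete, here is a minimal sketch of the probability distribution of X. It rests on assumptions that the slide does not state: a simple single-gene Mendelian model in which each child of two heterozygous parents is brown-eyed with probability 3/4, independently of the other child. The function name `brown_eyed_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb


def brown_eyed_pmf(n_children=2, p_brown=Fraction(3, 4)):
    """Return P(X = x) for x = 0..n_children.

    Assumption (not from the slide): each child is brown-eyed with
    probability p_brown, independently, so X is binomial.
    """
    return {
        x: comb(n_children, x) * p_brown**x * (1 - p_brown) ** (n_children - x)
        for x in range(n_children + 1)
    }


pmf = brown_eyed_pmf()
for x, p in pmf.items():
    # Uppercase X is the random variable; lowercase x is an observed value.
    print(f"P(X = {x}) = {p}")
```

Under these assumptions the three probabilities sum to 1, and the observed value x=2 from the slide is the most likely outcome.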


Page 11: STAT 551 PROBABILITY AND STATISTICS I


COLLECTING DATA

• Target Population: The population about which we want to draw inferences.

• Sampled Population: The actual population from which the sample has been taken.

Page 12: STAT 551 PROBABILITY AND STATISTICS I

SAMPLING PLAN

• Simple Random Sample (SRS): Every member of the population is equally likely to be selected; more precisely, every possible sample of the chosen size is equally likely.

• Stratified Sampling: The population is separated into mutually exclusive sets (strata), and the sample is then drawn by taking a simple random sample from each stratum.

• Convenience Sample: A sample obtained by selecting individuals or objects without systematic randomization.
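The first two sampling plans can be sketched in a few lines. This is an illustration only: the department names, stratum sizes, and the helper names `simple_random_sample` and `stratified_sample` are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct members; every subset of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


rng = random.Random(551)  # fixed seed so the sketch is reproducible

# Hypothetical strata: students grouped by department.
strata = {
    "Statistics": [f"stat_{i}" for i in range(40)],
    "Economics": [f"econ_{i}" for i in range(60)],
}
population = [student for members in strata.values() for student in members]

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
print(srs)
print(strat)
```

The stratified sample is guaranteed to contain five students from each department, whereas the simple random sample may by chance over-represent one department.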

Page 13: STAT 551 PROBABILITY AND STATISTICS I


Page 14: STAT 551 PROBABILITY AND STATISTICS I

EXAMPLE

• A politician who is running for the office of mayor of a city with 25,000 registered voters runs a survey. In the survey, 48% of the 200 registered voters interviewed say they plan to vote for her.

• What is the population of interest? The political choices of the 25,000 registered voters.

• What is the sample? The political choices of the 200 voters interviewed.

• Is the value 48% a parameter or a statistic? Statistic.

Page 15: STAT 551 PROBABILITY AND STATISTICS I

EXAMPLE

• A manufacturer of computer chips claims that less than 10% of his products are defective. When 1000 chips were drawn from a large production run, 7.5% were found to be defective.

• What is the population of interest? The complete production run of the computer chips.

• What is the sample? The 1000 chips.

• What is the parameter? The proportion of all chips that are defective.

• What is the statistic? The proportion of sample chips that are defective.

• Does the value 10% refer to a parameter or a statistic? Parameter.

• Explain briefly how the statistic can be used to make inferences about the parameter to test the claim. Because the sample proportion is less than 10%, we can conclude that the claim may be true.
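One common way to quantify "may be true" is a one-sided large-sample z-test for a proportion. The sketch below is an assumption-laden illustration, not a method stated in the slides: it takes the 1000 chips to be a random sample, uses the normal approximation, and tests H0: p = 0.10 against H1: p < 0.10. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def one_sided_proportion_z_test(p_hat, p0, n):
    """Return (z, p_value) for H0: p = p0 versus H1: p < p0.

    Assumes a random sample that is large enough for the normal
    approximation to the sample proportion to be reasonable.
    """
    standard_error = sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    return z, normal_cdf(z)


# Values from the example: 7.5% defective among 1000 sampled chips.
z, p_value = one_sided_proportion_z_test(p_hat=0.075, p0=0.10, n=1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

Under these assumptions the z statistic is roughly -2.6, so a sample proportion this far below 10% would be unusual if the true defective rate were 10%, which supports the manufacturer's claim.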

Page 16: STAT 551 PROBABILITY AND STATISTICS I

DESCRIPTIVE STATISTICS

• Descriptive statistics involves the arrangement, summary, and presentation of data, to enable meaningful interpretation, and to support decision making.

• Descriptive statistics methods make use of
– graphical techniques
– numerical descriptive measures.

• The methods presented apply both to
– the entire population
– the sample

Page 17: STAT 551 PROBABILITY AND STATISTICS I

Types of data and information

• A variable - a characteristic of a population or sample that is of interest to us.
– Cereal choice
– Expenditure
– The waiting time for medical services

• Data - the observed values of variables
– Interval and ratio data are numerical observations (in ratio data, the ratio of two observations is meaningful and the value of 0 has a clear “none” interpretation. E.g. of ratio data: weight; e.g. of interval data: temperature.)
– Nominal data are categorical observations
– Ordinal data are ordered categorical observations

Page 18: STAT 551 PROBABILITY AND STATISTICS I

Types of data – examples

• Quantitative
– Continuous: Blood pressure, height, weight, age
– Discrete: Number of children; number of attacks of asthma per week

• Categorical (Qualitative)
– Ordinal (ordered categories): Grade of breast cancer; better, same, worse; disagree, neutral, agree
– Nominal (unordered categories): Sex (male/female); alive or dead; blood group O, A, B, AB

Page 19: STAT 551 PROBABILITY AND STATISTICS I

Types of data – analysis

Knowing the type of data is necessary to properly select the technique to be used when analyzing data.

Types of descriptive analysis allowed for each type of data

• Numerical data – arithmetic calculations

• Nominal data – counting the number of observations in each category

• Ordinal data – computations based on an ordering process

Page 20: STAT 551 PROBABILITY AND STATISTICS I

Types of data - examples

Numerical data

Age   Income
55    75000
42    68000
.     .
.     .

Weight gain
+10
+5
.
.

Nominal

Person   Marital status
1        married
2        single
3        single
.        .
.        .

Computer   Brand
1          IBM
2          Dell
3          IBM
.          .
.          .

Page 21: STAT 551 PROBABILITY AND STATISTICS I

Types of data - examples

Numerical data

Age   Income
55    75000
42    68000
.     .
.     .

Weight gain
+10
+5
.
.

Nominal data

A descriptive statistic for nominal data is the proportion of data that falls into each category.

IBM   Dell   Compaq   Other   Total
25    11     8        6       50
50%   22%    16%      12%
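As a small sketch of this descriptive statistic, the proportions in the computer-brand table can be reproduced by counting categories. The counts (25, 11, 8, 6) come from the slide; the helper name `category_proportions` and the way the raw observations are rebuilt from the counts are illustrative assumptions.

```python
from collections import Counter


def category_proportions(observations):
    """Return the share of observations that falls into each category."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Rebuild 50 nominal observations from the counts shown in the table.
brands = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["Other"] * 6

proportions = category_proportions(brands)
for brand, share in proportions.items():
    print(f"{brand}: {share:.0%}")
```

Counting is the only arithmetic that makes sense for nominal data; averaging the brand labels would be meaningless.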

Page 22: STAT 551 PROBABILITY AND STATISTICS I

Cross-Sectional/Time-Series/Panel Data

• Cross-sectional data is collected at a certain point in time
– Test score in a statistics course
– Starting salaries of an MBA program's graduates

• Time series data is collected over successive points in time
– Weekly closing price of gold
– Amount of crude oil imported monthly

• Panel data is also collected over successive points in time, typically for many units (series) at once

Page 23: STAT 551 PROBABILITY AND STATISTICS I

Differences

• Change in time
– Cross-sectional: Cannot measure
– Time series: Can measure
– Panel: Can measure

• Properties of the series
– Cross-sectional: No series
– Time series: Long; usually just one or a few series
– Panel: Short; hundreds of series

• Measurement time
– Cross-sectional: Measurement only at one time point; even if more than one time point, samples are independent from each other
– Time series: Usually at regular time points (all series are taken at the same time points and time points are equally spaced)
– Panel: Varies

• Measurements
– Cross-sectional: Response(s); time-independent covariates
– Time series: Response(s); time; usually no covariate
– Panel: Response(s); time; time-dependent and independent covariates

Page 24: STAT 551 PROBABILITY AND STATISTICS I

GAMES OF CHANCE


Page 25: STAT 551 PROBABILITY AND STATISTICS I

COUNTING TECHNIQUES

• Methods to determine how many subsets can be obtained from a set of objects are called counting techniques.

FUNDAMENTAL THEOREM OF COUNTING

If a job consists of k separate tasks, the i-th of which can be done in $n_i$ ways, i=1,2,…,k, then the entire job can be done in $n_1 \times n_2 \times \dots \times n_k$ ways.
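The theorem can be checked by brute force on a small case. The menu below (3 starters, 4 mains, 2 desserts) is a hypothetical job invented for illustration, as is the helper name `count_ways`.

```python
from itertools import product
from math import prod


def count_ways(ways_per_task):
    """Multiply the number of ways of each task: n_1 x n_2 x ... x n_k."""
    return prod(ways_per_task)


# Hypothetical job with k = 3 tasks: choose a starter, a main, a dessert.
starters = ["soup", "salad", "bread"]
mains = ["fish", "beef", "pasta", "curry"]
desserts = ["cake", "fruit"]

formula = count_ways([len(starters), len(mains), len(desserts)])
enumerated = len(list(product(starters, mains, desserts)))
print(formula, enumerated)
```

Listing every complete meal with `itertools.product` gives the same count as the product of the individual counts.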

Page 26: STAT 551 PROBABILITY AND STATISTICS I

THE FACTORIAL

• n! is the number of ways in which n objects can be permuted.

n! = n(n-1)(n-2)…2·1

0! = 1, 1! = 1

Example: The possible permutations of {1,2,3} are (1,2,3), (1,3,2), (3,1,2), (2,1,3), (2,3,1), (3,2,1). So, there are 3!=6 different permutations.
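The example above can be reproduced by listing the orderings directly; this short sketch uses only the Python standard library.

```python
from itertools import permutations
from math import factorial

items = [1, 2, 3]

# Every ordering of the three objects, generated explicitly.
orderings = list(permutations(items))
for ordering in orderings:
    print(ordering)

# The number of orderings agrees with n! for n = 3.
print(len(orderings), factorial(len(items)))
```

The same check works for any small n, although the list grows very quickly: 10 objects already have 3,628,800 orderings.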

Page 27: STAT 551 PROBABILITY AND STATISTICS I

COUNTING

• Partition Rule: There exists a single set of N distinctly different elements which is partitioned into k sets; the first set containing $n_1$ elements, …, the k-th set containing $n_k$ elements. The number of different partitions is

$$\frac{N!}{n_1!\, n_2! \cdots n_k!}, \quad \text{where } N = n_1 + n_2 + \dots + n_k.$$

Page 28: STAT 551 PROBABILITY AND STATISTICS I

COUNTING

• Example: Let’s partition {1,2,3} into two sets; first with 1 element, second with 2 elements.

• Solution:

Partition 1: {1} {2,3}

Partition 2: {2} {1,3}

Partition 3: {3} {1,2}

3!/(1! 2!)=3 different partitions

Page 29: STAT 551 PROBABILITY AND STATISTICS I

Example

• How many different arrangements can be made of the letters “ISI”?

1st letter   2nd letter   3rd letter
I            I            S
I            S            I
S            I            I

N=3, n1=2, n2=1; 3!/(2!1!)=3

Page 30: STAT 551 PROBABILITY AND STATISTICS I

Example

• How many different arrangements can be made of the letters “statistics”?

• N=10, n1=3 (s), n2=3 (t), n3=1 (a), n4=2 (i), n5=1 (c)

$$\frac{10!}{3!\,3!\,1!\,2!\,1!} = 50400$$
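The two letter-arrangement examples follow the partition rule, which can be sketched as a small function. The name `distinct_arrangements` is hypothetical; the brute-force check is only feasible for short words such as "ISI".

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_arrangements(word):
    """Partition rule: N! / (n_1! n_2! ... n_k!) over the letter counts."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # each partial quotient is an integer
    return total


# Brute force for the short word: generate all orderings, drop duplicates.
brute_force_isi = len(set(permutations("ISI")))

print(distinct_arrangements("ISI"), brute_force_isi)
print(distinct_arrangements("statistics"))
```

For "statistics" the formula is the practical route, since enumerating all 10! orderings just to discard duplicates would be wasteful.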

Page 31: STAT 551 PROBABILITY AND STATISTICS I

COUNTING

1. Ordered, without replacement (e.g. picking the first 3 winners of a competition)

2. Ordered, with replacement (e.g. tossing a coin and observing a Head in the k-th toss)

3. Unordered, without replacement (e.g. 6/49 lottery)

4. Unordered, with replacement (e.g. picking up red balls from an urn that has both red and green balls & putting them back)

Page 32: STAT 551 PROBABILITY AND STATISTICS I

PERMUTATIONS

• Any ordered sequence of r objects taken from a set of n distinct objects is called a permutation of size r of the objects.

$$P_{r,n} = n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

Page 33: STAT 551 PROBABILITY AND STATISTICS I

COMBINATION

• Given a set of n distinct objects, any unordered subset of size r of the objects is called a combination.

$$C_{r,n} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

Properties

$$\binom{n}{n} = 1, \quad \binom{n}{0} = 1, \quad \binom{n}{r} = \binom{n}{n-r}$$

Page 34: STAT 551 PROBABILITY AND STATISTICS I

COUNTING

Number of possible arrangements of size r from n objects

             Without Replacement     With Replacement
Ordered      $\frac{n!}{(n-r)!}$     $n^r$
Unordered    $\binom{n}{r}$          $\binom{n+r-1}{r}$
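The four counting cases can be collected in one function and verified by enumeration on a small case. This is a sketch: the function name `count_arrangements` and the choice n=4, r=2 are illustrative, and `math.perm` and `math.comb` require Python 3.8 or later.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)
from math import comb, perm


def count_arrangements(n, r, ordered, replacement):
    """Number of arrangements of size r from n distinct objects."""
    if ordered and not replacement:
        return perm(n, r)          # n! / (n - r)!
    if ordered and replacement:
        return n**r                # n^r
    if not replacement:
        return comb(n, r)          # n choose r
    return comb(n + r - 1, r)      # (n + r - 1) choose r


n, r = 4, 2
objects = range(n)

# Enumerate each case explicitly and compare with the formula.
checks = {
    (True, False): len(list(permutations(objects, r))),
    (True, True): len(list(product(objects, repeat=r))),
    (False, False): len(list(combinations(objects, r))),
    (False, True): len(list(combinations_with_replacement(objects, r))),
}
for (ordered, replacement), enumerated in checks.items():
    print(ordered, replacement, count_arrangements(n, r, ordered, replacement), enumerated)
```

Each `itertools` generator corresponds to exactly one cell of the table, which makes it a convenient way to check a formula on small inputs.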

Page 35: STAT 551 PROBABILITY AND STATISTICS I

EXAMPLE

• In how many different ways can we arrange 3 books (A, B and C) on a shelf?

• Order is important; without replacement

• n=3, r=3; n!/(n-r)!=3!/0!=6, or

(possible number of books for the 1st place on the shelf) x (possible number of books for the 2nd place on the shelf) x (possible number of books for the 3rd place on the shelf) = 3 x 2 x 1 = 6

Page 36: STAT 551 PROBABILITY AND STATISTICS I

EXAMPLE, cont.

• In how many different ways can we arrange 3 books (A, B and C) on a shelf?

1st book   2nd book   3rd book
A          B          C
A          C          B
B          A          C
B          C          A
C          A          B
C          B          A

Page 37: STAT 551 PROBABILITY AND STATISTICS I

EXAMPLE

• Lotto games: Suppose that you pick 6 numbers out of 49.

• What is the number of possible choices
– if the order does not matter and no repetition is allowed?

$$\binom{49}{6} = 13,983,816 \approx 14 \text{ million}$$

– if the order matters and no repetition is allowed?

$$\frac{49!}{43!} = 49 \times 48 \times 47 \times 46 \times 45 \times 44 \approx 10^{10}$$
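Both lotto counts can be checked with the standard library. The variable names are illustrative, and the single-ticket winning chance printed at the end is a direct consequence of the unordered count, assuming every 6-number combination is equally likely to be drawn.

```python
from math import comb, factorial, perm

numbers, picks = 49, 6

unordered = comb(numbers, picks)  # order does not matter, no repetition
ordered = perm(numbers, picks)    # order matters, no repetition

print(f"unordered: {unordered:,}")
print(f"ordered:   {ordered:,}")

# Each unordered choice corresponds to 6! orderings of the same numbers.
print(ordered // unordered == factorial(picks))

# Chance that one ticket matches the draw, assuming equally likely draws.
print(f"winning chance: {1 / unordered:.2e}")
```

The ordered count is exactly 6! = 720 times the unordered count, which is why the two answers differ by almost three orders of magnitude.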