MGMT 276: Statistical Inference in Management Spring, 2014 Green sheets.



My last name starts with a letter somewhere between
A. A – D
B. E – L
C. M – R
D. S – Z

Please click in

Schedule of readings

Before next exam (February 18th):

Please read Chapters 1 - 4 and Appendices D & E in Lind

Please read Chapters 1, 5, 6 and 13 in Plous
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment

By the end of lecture today 2/6/14

Use this as your study guide

Correlational methodology
Strength of correlation versus direction
Positive vs Negative correlation
Strong vs Moderate vs Weak correlation

Characteristics of a distribution

Remember to hold onto homework until we have a chance to cover it

Homework due - (February 13th)

On class website: please print and complete homework worksheet # 5

Review of Homework Worksheet

Count    Proportion    Percent    Out of 1,000,000
130      .10           10         100,000
104      .08            8          80,000
325      .25           25         250,000
455      .35           35         350,000
286      .22           22         220,000

Notice Gillian asked 1300 people

130+104+325+455+286=1300

130/1300 = .10

.10x100=10

.10 x 1,000,000 = 100,000
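The worksheet arithmetic above can be sketched in a few lines of Python. The five counts come from the sum on the slide; the variable names are only illustrative, and the rounding is there to hide floating-point noise rather than to change any answer.

```python
# Counts per response category, taken from the worksheet sum above.
counts = [130, 104, 325, 455, 286]

total = sum(counts)  # Gillian asked 1300 people in all

# proportion = count / total
# percent    = proportion x 100
# projection = proportion x 1,000,000
proportions = [round(c / total, 2) for c in counts]
percents = [round(p * 100) for p in proportions]
per_million = [round(p * 1_000_000) for p in proportions]

for c, p, pct, proj in zip(counts, proportions, percents, per_million):
    print(f"{c:>4}  {p:.2f}  {pct:>3}%  {proj:>9,}")
```

Each row follows the same three steps shown for the first category: divide by 1300, multiply by 100, multiply by 1,000,000.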




[Scatterplot: Dollars Spent (vertical axis, 1 to 9) against Age (horizontal axis, 10 to 50)]

Strong, Negative, Down, r = -.9


=correl(A2:A11,B2:B11)=-0.9226648007

Strong, Negative, Down, r = -0.9227


This shows a strong negative relationship (r = -0.92) between the amount spent on snacks and the age of the moviegoer

Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Correlation r (actual number)
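For anyone working outside Excel, here is a minimal Python sketch of the same Pearson correlation that =correl(...) computes. The function name `pearson_r` and the ten age/dollar pairs are hypothetical stand-ins; the actual worksheet values in A2:A11 and B2:B11 are not reproduced in these notes, so the result will not match -0.9227.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (NOT the worksheet values): older moviegoers spend less.
ages = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
dollars = [9, 8, 8, 7, 6, 5, 5, 3, 2, 1]

r = pearson_r(ages, dollars)
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.4f} ({direction})")
```

The sign of r gives the direction and its distance from zero gives the strength, which are the two pieces the description has to report along with the variable names.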

Scatterplot displays relationships between two continuous variables

Correlation: Measure of how two variables co-occur and also can be used for prediction

Range between -1 and +1

The closer to zero the weaker the relationship and the worse the prediction

Positive or negative

Correlation - How do numerical values change?

Let’s estimate the correlation coefficient for each of the following

r = +1.0 r = -1.0 r = +.80

r = -.50 r = 0.0

http://neyman.stat.uiuc.edu/~stat100/cuwu/Games.html

http://argyll.epsb.ca/jreed/math9/strand4/scatterPlot.htm


r = +0.97

This shows a strong positive relationship (r = 0.97) between the appraised price of the house and its eventual sales price

Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)

r = -0.48

This shows a moderate negative relationship (r = -0.48) between the amount of pectin in orange juice and its sweetness

r = -0.91

This shows a strong negative relationship (r = -0.91) between the distance that a golf ball is hit and the accuracy of the drive

r = 0.61

This shows a moderate positive relationship (r = 0.61) between the length of stay in a hospital and the number of services provided

[Scatterplot: Height of Mothers (in), vertical axis, 48 to 72, against Height of Daughters (inches), horizontal axis, 48 to 76]

This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches).

Both axes and values are labeled
Both axes have real numbers listed
Variable name is listed clearly on each axis

1. Describe one positive correlation. Draw a scatterplot (label axes)

2. Describe one negative correlation. Draw a scatterplot (label axes)

3. Describe one zero correlation. Draw a scatterplot (label axes)

4. Describe one perfect correlation (positive or negative). Draw a scatterplot (label axes)

5. Describe one curvilinear relationship. Draw a scatterplot (label axes)

Break into groups of 2 or 3. Each person hands in their own worksheet. Be sure to list your name and the names of all others in your group. Use examples that are different from those in lecture




This shows the strong positive (r = .8) relationship between the heights of daughters (measured in inches) and the heights of their mothers (measured in inches).

Both variables are listed, as are direction and strength






Hand in your homework

Must be complete and must be stapled

Sample versus census

How is a census different from a sample?

Census measures each person in the specific population

Sample measures a subset of the population and infers about the population – representative sample is good

What’s better?

Use of existing survey data

U.S. Census

Family size, fertility, occupation

The General Social Survey

Surveys a sample of US citizens; over 1,000 items
Same questions asked each year

You’ve completed constructing your questionnaire… what’s the best way to get responders?

Population (census) versus sample
Parameter versus statistic

Parameter – Measurement or characteristic of the population
Usually unknown (only estimated)
Usually represented by Greek letters (µ, pronounced “mu” or “mew”)

Statistic – Numerical value calculated from a sample
Usually represented by Roman letters (x̄, pronounced “x bar”)

Simple random sampling: each person from the population has an equal probability of being included

Sample frame = how you define population

Let’s take a sample… a random sample

Question: Average weight of U of A football player
Sample frame: population of the U of A football team

Random number table – List of random numbers
64: Pick 64th name on the list (64 is just an example here)
24: Pick 24th name on the list

Or, you can use Excel to provide a number for the random sample:

=RANDBETWEEN(1,115)
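The same idea can be sketched in Python. This is a rough illustration, assuming a roster of 115 names (the upper bound in the RANDBETWEEN formula); the player names, the helper `simple_random_sample`, and the seed are made up for the example. One difference worth noting: RANDBETWEEN can return the same number twice if you call it repeatedly, while `random.sample` draws without replacement.

```python
import random

ROSTER_SIZE = 115  # assumed from =RANDBETWEEN(1,115)

# Hypothetical sampling frame: one entry per player on the list.
roster = [f"Player {i}" for i in range(1, ROSTER_SIZE + 1)]

def simple_random_sample(frame, n, seed=None):
    """Pick n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # seed only makes the demo repeatable
    return rng.sample(frame, n)

sample = simple_random_sample(roster, 10, seed=276)
print(sample)
```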

Systematic random sampling: A probability sampling technique that involves selecting every kth person from a sampling frame

You pick the number

Other examples of systematic random sampling
1) check every 2000th light bulb
2) survey every 10th voter
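A short Python sketch of "every kth" selection, using the every-10th-voter example. The function name, the list of 100 numbered voters, and the random starting position are assumptions made for illustration; the slide only says that you pick k.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every kth member, starting at a random spot among the first k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0 .. k-1
    return frame[start::k]

# Hypothetical sampling frame: voters numbered 1 to 100.
voters = list(range(1, 101))

picked = systematic_sample(voters, k=10, seed=1)
print(picked)
```

With 100 voters and k = 10 the sample always has 10 people, each exactly 10 positions after the previous one.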

Stratified sampling: sampling technique that involves dividing a population into subgroups (or strata) and then selecting samples from each of these groups

- sampling technique can maintain ratios for the different groups

Average number of speeding tickets
17.7% of sample are Pre-business majors
4.6% of sample are Psychology majors
2.8% of sample are Biology majors
2.4% of sample are Architecture majors
etc

Average cost for text books for a semester
12% of sample is from California
7% of sample is from Texas
6% of sample is from Florida
6% from New York
4% from Illinois
4% from Ohio
4% from Pennsylvania
3% from Michigan
etc
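One way to see how stratified sampling maintains ratios is the Python sketch below. It assumes a hypothetical campus of 10,000 students whose majors match the percentages in the speeding-ticket example, with everyone else lumped into an "Other" group; the function `stratified_sample` and the 10% sampling fraction are illustrative choices, not part of the lecture.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: counts chosen to match the slide's percentages.
major_counts = {
    "Pre-business": 1770,
    "Psychology": 460,
    "Biology": 280,
    "Architecture": 240,
    "Other": 7250,
}
population = [(major, i) for major, count in major_counts.items()
              for i in range(count)]

sample = stratified_sample(population, stratum_of=lambda s: s[0],
                           fraction=0.1, seed=276)
shares = Counter(major for major, _ in sample)
for major, n in shares.items():
    print(f"{major}: {n / len(sample):.1%}")
```

Because every stratum is sampled at the same rate, the 1,000-person sample has the same mix of majors as the population it came from.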

Cluster sampling: sampling technique that divides a population into subgroups (or clusters) by region or physical space. Can either measure everyone or select samples for each cluster

Textbook prices
Southwest schools
Midwest schools
Northwest schools
etc

Average student income, survey by
Old Main area
Near McClelland
Around Main Gate
etc

Patient satisfaction for hospital
7th floor (near maternity ward)
5th floor (near physical rehab)
2nd floor (near trauma center)
etc
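The two options named in the definition (measure everyone, or select a sample within each cluster) can be sketched as follows. The floors come from the hospital example, but the patient IDs, the function name `cluster_sample`, and the `per_cluster` parameter are hypothetical.

```python
import random

def cluster_sample(clusters, per_cluster=None, seed=None):
    """Return the people to measure in each cluster.

    per_cluster=None measures everyone in the cluster;
    otherwise a random sample of that size is taken within each cluster.
    """
    rng = random.Random(seed)
    chosen = {}
    for name, members in clusters.items():
        if per_cluster is None:
            chosen[name] = list(members)
        else:
            size = min(per_cluster, len(members))
            chosen[name] = rng.sample(members, size)
    return chosen

# Hypothetical patients grouped by physical location in the hospital.
floors = {
    "7th floor": [f"P7-{i}" for i in range(1, 21)],
    "5th floor": [f"P5-{i}" for i in range(1, 16)],
    "2nd floor": [f"P2-{i}" for i in range(1, 31)],
}

everyone = cluster_sample(floors)
five_each = cluster_sample(floors, per_cluster=5, seed=276)
print({floor: len(people) for floor, people in five_each.items()})
```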

Snowball sampling: a non-random technique in which one or more members of a population are located and used to lead the researcher to other members of the population

Used when we don’t have any other way of finding them - also vulnerable to biases

Convenience sampling: sampling technique that involves sampling people nearby.

A non-random sample and vulnerable to bias

Judgment sampling: sampling technique that involves sampling people who an expert says would be useful.

A non-random sample and vulnerable to bias

Non-random sampling is vulnerable to bias

Overview
Frequency distributions
The normal curve
Mean, Median, Mode, Trimmed Mean
Standard deviation, Variance, Range
Mean Absolute Deviation
Skewed right, skewed left, unimodal, bimodal, symmetric

Challenge yourself as we work through characteristics of distributions to try to categorize each concept as a measure of 1) central tendency, 2) dispersion or 3) shape

Another example: How many kids in your family?

Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14

Measures of Central Tendency (Measures of location)

The mean, median and mode

Measures of “location”: where on the number line the scores tend to cluster

Mean: The balance point of a distribution. Found by adding up all observations and then dividing by the number of observations

Mean for a sample: Σx / n = mean = x̄

Mean for a population: ΣX / N = mean = µ (mu)

Note:
Σ = add up
x or X = scores
n or N = number of scores

Mean for the “How many kids in your family?” sample:

Σx / n = 41 / 10 = mean = 4.1
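The calculation can be checked with a few lines of Python, using the ten family-size scores from this example (1, 3, 1, 4, 2, 4, 2, 8, 2, 14). The variable names are only illustrative.

```python
# Number of kids in each of the ten families from the class example.
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

total = sum(kids)   # Σx: add up all the scores
n = len(kids)       # n: number of scores
mean = total / n    # x̄ = Σx / n

print(total, n, mean)  # 41 10 4.1
```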

How many kids are in your family?
What is the most common family size?

Median: The middle value when observations are ordered from least to most (or most to least)

Unordered: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14

Ordered: 1, 1, 2, 2, 2, 3, 4, 4, 8, 14

If there appear to be two medians, take the mean of the two: (2 + 3) / 2 = 2.5

Median always has a percentile rank of 50% regardless of shape of distribution
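Here is a small Python sketch of the same steps: order the scores, find the middle, and average the two middle values when the count is even. It uses the family-size data from this example; the variable names are illustrative.

```python
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

ordered = sorted(kids)  # least to most
n = len(ordered)
mid = n // 2

if n % 2 == 1:
    # Odd number of scores: a single middle value.
    median = ordered[mid]
else:
    # Even number of scores: take the mean of the two middle values.
    median = (ordered[mid - 1] + ordered[mid]) / 2

print(ordered)  # [1, 1, 2, 2, 2, 3, 4, 4, 8, 14]
print(median)   # 2.5
```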

Mode: The value of the most frequent observation

Score    f
1        2
2        3
3        1
4        2
5        0
6        0
7        0
8        1
9        0
10       0
11       0
12       0
13       0
14       1

Please note: The mode is “2” because it is the most frequently occurring score. It occurs “3” times. “3” is not the mode; it is just the frequency for the value that is the mode
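The frequency table and the mode can be rebuilt in Python from the same ten family-size scores. This is a small sketch using the standard library's `Counter`; the variable names are illustrative.

```python
from collections import Counter

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

freq = Counter(kids)  # score -> how many times it occurs

# The mode is the score, not its frequency.
mode, count = freq.most_common(1)[0]

print("Score  f")
for score in range(1, 15):
    print(f"{score:>5}  {freq[score]}")  # missing scores count as 0

print(f"mode = {mode} (occurs {count} times)")
```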

Bimodal distribution: If there are two most frequent observations

What about central tendency for qualitative data?

Mode is good for nominal or ordinal data

Median can be used with ordinal data

Mean can be used with interval or ratio data



A little more about frequency distributions

An example of a normal distribution

Measure of central tendency: describes how scores tend to cluster toward the center of the distribution

Normal distribution

In a normal distribution:
mode = mean = median

In all distributions:
mode = tallest point
median = middle score
mean = balance point


Positively skewed distribution

In a positively skewed distribution:

mode < median < mean



Note: mean is most affected by outliers or skewed distributions


Negatively skewed distribution

In a negatively skewed distribution: mean < median < mode



Note: mean is most affected by outliers or skewed distributions
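The family-size data from earlier in the lecture (1, 3, 1, 4, 2, 4, 2, 8, 2, 14) happens to be positively skewed, so it makes a quick check of both points. The sketch below is only an illustration; dropping the family of 14 is a made-up "what if" to show how much one outlier moves each measure.

```python
import statistics

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

mode = statistics.mode(kids)
median = statistics.median(kids)
mean = statistics.mean(kids)
print(mode, median, mean)  # 2 2.5 4.1

# What if the one very large family (14 kids) were left out?
without_outlier = [k for k in kids if k != 14]
median_wo = statistics.median(without_outlier)
mean_wo = statistics.mean(without_outlier)
print(median_wo, mean_wo)

# The mean shifts by more than the median does.
mean_shift = mean - mean_wo
median_shift = median - median_wo
```

Here mode < median < mean, the ordering listed for a positively skewed distribution, and removing the single outlier pulls the mean down by about 1.1 while the median moves by only 0.5.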

Mode: The value of the most frequent observation

Bimodal distribution: Distribution with two most frequent observations (2 peaks)

Example: Ian coaches two boys' baseball teams. One team is made up of 10-year-olds and the other is made up of 16-year-olds. When he measured the height of all of his players he found a bimodal distribution

