Descriptive Statistics: Part One Farrokh Alemi Ph.D. Kashif Haqqi M.D.

Page 1

Descriptive Statistics: Part One

Farrokh Alemi Ph.D.

Kashif Haqqi M.D.

Page 2

Table of Contents

• Objectives

• Definitions

• Sampling methods

• Types of variables

• Reliability and validity

• Average

• Median

• Mode

Page 3

Objectives

1. Define validity and reliability and explain the role of each in assessing the quality of data.

2. Distinguish among nominal, ordinal, and numeric data, as well as discrete and continuous data.

3. Given a set of numerical data, calculate the mean, median and mode, and state the relative advantages of each as a measure of central tendency.

Page 4

Definition of Variables

• A variable is an attribute of a person or an object that varies.

• Measurement is the set of rules for assigning numbers to objects to represent quantities of attributes.

Page 5

What Is Statistics?

• Statistics is the science of describing or making inferences about the world from a sample of data.

• Descriptive statistics are numerical summaries that organize, summarize, and present the data.

• Inferential statistics is the process of inferring from a sample to the population.

Page 6

Definitions

• A datum is one observation about the variable being measured.

• Data are a collection of observations.

• A population consists of all subjects about whom the study is being conducted.

• A sample is the subgroup of the population being examined.

Page 7

Sampling Methods

• Random sample: all subjects have an equal chance of inclusion in the study.

• Systematic sampling: selecting every kth subject from a numbered list.

• Stratified sample: random sampling within pre-defined groups of subjects.

• Staged sampling: a small random sample is drawn, and if its results are ambiguous, another, larger random sample is collected.
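The first three sampling methods can be sketched in a few lines of Python. This is only an illustration: the function names, the numbered list of 100 subjects, and the odd/even grouping are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(subjects, n, seed=0):
    """Random sample: every subject has an equal chance of inclusion."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return rng.sample(subjects, n)

def systematic_sample(subjects, k, start=0):
    """Systematic sample: every kth subject from a numbered list."""
    return subjects[start::k]

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=0):
    """Stratified sample: random sampling within pre-defined groups."""
    rng = random.Random(seed)
    strata = {}
    for subject in subjects:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population: subjects numbered 1 to 100.
subjects = list(range(1, 101))

print(simple_random_sample(subjects, 10))
print(systematic_sample(subjects, 10))
# Hypothetical strata: odd-numbered versus even-numbered subjects.
print(stratified_sample(subjects, lambda s: "odd" if s % 2 else "even", 5))
```

Staged sampling is left out because the rule for deciding that results are "ambiguous" depends on the study.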

Page 8

Types of Variables

• A discrete variable has gaps between its values. For example, sex is a discrete variable. If male is 1 and female is 0, values in between have no meaning.

• A continuous variable has no gaps between its values. All values or fractions of values have meaning. Age is an example of a continuous variable.

Page 9

Types of Variables (Continued)

• A nominal scale assigns numbers to attributes only to name the category. The numbers have no meaning by themselves, e.g., DRG codes.

• An ordinal scale assigns numbers so that more of an attribute has higher values, e.g., severity.

• In an interval scale the interval between the numbers has meaning, e.g., the Fahrenheit scale.

• A ratio scale is an interval scale where zero has true meaning, e.g., age.

Page 10

Reliability and Validity

            Reliability                                  Validity

Definition  Consistency of results on repeat measures    Measuring what is supposed to be measured

Types       Inter-rater                                  Face validity

            Intra-rater                                  Construct validity

            Split half                                   Predictive validity

            Test-retest

Page 11

To Be Valid, a Measure Must Be Reliable.

But a Reliable Measure Can Still Be Invalid.

Page 12

Example of Reliability Calculation

• The next page shows a table from Hayward RA, McMahon LF, Bernard AM. Evaluating the care of general medicine inpatients: how good is implicit review? Annals of Internal Medicine, 1993; 118(7): 550-556.

• Two reviewers rated the quality of health care delivered in the same cases. The table shows inter-rater reliability.

• 00000605-199304010-00010.

Page 13

Inter-rater Reliability

                                        Second Reviewer's Rating

First Reviewer's Rating   Definitely not   Probably not   Unsure   Probably   Definitely

Definitely not                 121              22           6         4          2

Probably not                    33              13           6         2          2

Unsure                           4              12           5         0          0

Probably                         7               3           1         3          2

Definitely                       1               1           1         5          7

The table shows the ratings of 263 cases by two different reviewers regarding whether hospitalization could have been avoided.
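One way to turn this table into a single reliability number is sketched below. The counts are copied from the table. The choice of statistics (raw percent agreement and Cohen's kappa, which corrects agreement for chance) is an assumption for illustration; the slides do not say which statistic the cited paper used.

```python
# Rows: first reviewer's rating; columns: second reviewer's rating.
# Category order: definitely not, probably not, unsure, probably, definitely.
ratings = [
    [121, 22, 6, 4, 2],
    [33, 13, 6, 2, 2],
    [4, 12, 5, 0, 0],
    [7, 3, 1, 3, 2],
    [1, 1, 1, 5, 7],
]

def observed_agreement(table):
    """Share of cases where both reviewers chose the same category."""
    total = sum(sum(row) for row in table)
    agreed = sum(table[i][i] for i in range(len(table)))
    return agreed / total

def cohens_kappa(table):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = observed_agreement(table)
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

print(round(observed_agreement(ratings), 2))
print(round(cohens_kappa(ratings), 2))
```

The reviewers agree exactly in 149 of the 263 cases, about 57%. Because most cases fall in the "definitely not" category, much of that agreement could occur by chance, so kappa comes out far lower, at roughly 0.25.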

Page 14

Average

• The mean, or arithmetic average, is found by adding the values of the data and dividing by the number of values. The mean of 3 and 4 is 3.5.

• The geometric average is found by multiplying the values of the data and raising the product to the power of one divided by the number of values. The geometric average of 3 and 4 is the square root of 3 times 4.

• Can you calculate the mean and geometric average for 3, 4, and 5?

Page 15

Example

• The mean of 3, 4 and 5 is the sum of these numbers divided by 3, which is 4.

• The geometric average of 3, 4 and 5 is the cube root of 3 times 4 times 5. To calculate the cube root in Excel, write a formula like: =(3*4*5)^(1/3)

• The answer is about 3.91. Using the rounded exponent 0.33 instead of 1/3 gives about 3.86, which is slightly too low. Open Excel and verify that you can do this.
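The same calculation can be checked outside Excel. The Python sketch below uses illustrative function names that are not from the slides, and math.prod requires Python 3.8 or later.

```python
import math

def mean(values):
    """Arithmetic average: sum of the values divided by their count."""
    return sum(values) / len(values)

def geometric_average(values):
    """Product of the values raised to the power 1 / (number of values)."""
    return math.prod(values) ** (1 / len(values))

print(mean([3, 4, 5]))                         # 4.0
print(round(geometric_average([3, 4, 5]), 2))  # 3.91
```

Writing the exponent as 1 / len(values) rather than a rounded decimal such as 0.33 avoids the small error that a truncated exponent introduces.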

Page 16

Difference Between Mean and Geometric Average

• A geometric average is used when averaging probabilities.

• A mean is used in most other contexts.

Page 17

Median

• The median is the halfway point in a data set.

• To calculate the median, arrange the data in order. If the number of values is odd, divide it by 2 and round down, skip that many values, and use the next value as the median. If the number of values is even, the median is the average of the two middle values.

Page 18

Example

• The median age of 7 patients (23, 45, 56, 23, 34, 65, 25) is given by:

– Order the list of values: 23, 23, 25, 34, 45, 56, 65.

– There are 7 observations. Divide 7 by two, round to the lower number, and you get 3.

– Skip the first 3 and the median is the next number. In this example, 34 is the median.

– Do this in Excel.
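The steps above can be sketched in Python as a check on the Excel result. The function name is illustrative, and the even-count branch (averaging the two middle values) follows the usual convention rather than anything shown in this example.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2  # divide by 2 and round down
    if n % 2 == 1:
        # Odd count: skip the first `half` values and take the next one.
        return ordered[half]
    # Even count: no single middle value, so average the two middle ones.
    return (ordered[half - 1] + ordered[half]) / 2

ages = [23, 45, 56, 23, 34, 65, 25]
print(median(ages))  # 34
```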

Page 19

Mode

• The most frequent value observed is the mode.

• The mode is always an observed value in the data set.

• To calculate the mode, count the number of times each value is repeated. The value with the most repetitions is the mode.

• Do this in Excel.

Page 20

Example for Mode

• Age data: 23, 45, 56, 23, 34, 65, 25.

• 23 is repeated twice.

• All other values are repeated once.

• The mode is 23.
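The counting procedure can be sketched in Python as follows. The function name is illustrative, and it returns a list because a data set may have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)  # how many times each value appears
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

ages = [23, 45, 56, 23, 34, 65, 25]
print(modes(ages))  # [23]
```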

Page 21

Differences in Measures of Central Tendency

• Mode, median and mean could be three different numbers in asymmetrical distributions of data.

• For any data set there is only one mean and one median, but there may be many modes.

• The median is less influenced by extreme values than the mean.

• The mean is rarely an observed value, the median is guaranteed to be an observed value only when the number of observations is odd, and the mode is always an observed value in the data set.
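A small Python sketch shows how an extreme value moves the mean but not the median. The age data come from the earlier examples; the value 650 is a hypothetical data-entry error introduced only for this illustration.

```python
import statistics

ages = [23, 45, 56, 23, 34, 65, 25]
# Hypothetical error: 65 mistyped as 650.
with_extreme = [650 if age == 65 else age for age in ages]

for label, data in (("original", ages), ("with extreme value", with_extreme)):
    print(
        label,
        "mean:", round(statistics.mean(data), 2),
        "median:", statistics.median(data),
        "modes:", statistics.multimode(data),
    )
```

The mean more than triples once the extreme value is present, while the median stays at 34 and the mode stays at 23.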