CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central...
![Page 1: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/1.jpg)
CTRC Core Curriculum Seminar Series
Descriptive Statistics: Data Types and Measures, Central
Tendency, Variability
Chang-Xing Ma, PhD, Associate Professor
Department of Biostatistics, UB
January 4, 2012
![Page 2: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/2.jpg)
Disclosure Statement
• Chang-Xing Ma, PhD
– Nothing to disclose
![Page 3: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/3.jpg)
Goals and Objectives
• Goals: Gain knowledge of basic statistics and of how to describe data
• Objectives:
– Describe the data type
– Summarize data
– Understand measures of central tendency
– Understand measures of dispersion
![Page 4: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/4.jpg)
Outline
• Basic concepts of biostatistics
• Data type
• Summarize data
• Measure of Central Tendency
• Measure of Dispersion
![Page 5: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/5.jpg)
Some terminology
• Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data
• Biostatistics—the theory and techniques for collecting, describing, analyzing, and interpreting health data.
![Page 6: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/6.jpg)
Some terminology
• Population refers to all measurements or observations of interest
• Sample is simply a part of the population. But the sample MUST represent the population.
– A random sample is such a representative sample
• The sample must be large enough
• The sample should be selected randomly
![Page 7: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/7.jpg)
Some terminology
• Parameter is some numerical or nominal characteristic of a population
– A parameter is constant, e.g. the mean of a population
– Usually unknown
• Statistic is some numerical or nominal characteristic of a sample.
– We use a statistic as an estimate of a parameter of the population
– It tends to differ from one sample to another
– We also use statistics to test hypotheses
![Page 8: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/8.jpg)
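Population: all U.S. persons, with height ~ Normal(µh, σh²) and weight ~ Normal(µw, σw²)
[Interactive demo: set the true parameters (mean and std of height, mean and std of weight, % of male), generate a random sample of a chosen size with columns Gender (male = 1), Height, and Weight, and compare the sample statistics with the parameters.]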
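The parameter-versus-statistic idea on this slide can be sketched in Python. This is only an illustration, not the seminar's demo: the parameter values (mean height 170 cm, SD 10 cm), the sample size of 50, and the helper name `draw_sample` are all assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical "true" population parameters (assumed values, not from the slides)
MU_HEIGHT = 170.0     # population mean height, cm
SIGMA_HEIGHT = 10.0   # population standard deviation of height, cm


def draw_sample(n, seed):
    """Draw a random sample of n heights from Normal(MU_HEIGHT, SIGMA_HEIGHT^2)."""
    rng = random.Random(seed)
    return [rng.gauss(MU_HEIGHT, SIGMA_HEIGHT) for _ in range(n)]


# The parameters stay fixed; the statistics differ from one sample to another.
for seed in (1, 2, 3):
    sample = draw_sample(n=50, seed=seed)
    print(
        f"sample {seed}: mean = {statistics.mean(sample):.1f}, "
        f"SD = {statistics.stdev(sample):.1f}"
    )
```

Each run of the loop prints a slightly different sample mean and SD, while `MU_HEIGHT` and `SIGMA_HEIGHT` never change.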
![Page 9: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/9.jpg)
Sources of data
• Records
• Surveys: comprehensive or sample
• Experiments
![Page 10: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/10.jpg)
Types of variables
• Quantitative variables
– Quantitative continuous
– Quantitative discrete
• Qualitative variables
– Qualitative nominal
– Qualitative ordinal
![Page 11: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/11.jpg)
Data Types
• Numerical (Quantitative)
– numerical measurement
• Height
• Weight
• Categorical (Qualitative)
– categories rather than measurements; nominal categories have no natural sense of ordering
• Gender
• Hair color
• Blood type
![Page 12: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/12.jpg)
Numerical Variable
• Continuous
– Range of values
• Height in inches
• Discrete
– Limited possible values
• # of cigarettes smoked per day
• # of children in a family
• Age -
![Page 13: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/13.jpg)
Determining Data Types
• Ordinal (Categorical) vs. Discrete (Numerical)
• Ordinal
– Cancer Stage I, II, III, IV
– Stage II ≠ 2 times Stage I
– Categories could also be A, B, C, D
• Discrete
– # of children: 0, 1, 2, …
– 4 children = 2 times 2 children
![Page 14: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/14.jpg)
Descriptive Statistics – reducing a complex mass of data to a manageable set of information
• Descriptive Statistics: the summary and presentation of data to:
– simplify the data
– enable meaningful interpretation
– support decision making
• Numerical descriptive measures (a few numbers)
• Graphical presentations
![Page 15: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/15.jpg)
Inferential statistics
From a sample
• to estimate population parameters
• to test hypotheses
• to build a model that reflects the population
• …
![Page 16: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/16.jpg)
The student test score (FCAT)
Variables: Student ID, Race, Sex, Reading, Math, Poverty
Code:
Race: W – White, B – Black, H – Hispanic, A – Asian
Sex: F – Female, M – Male
Poverty: 0 – not poor, 1 – poor
Problem 1
1. Among the 6 variables, which ones are qualitative and which ones are quantitative?
2. Is Race nominal or ordinal?
![Page 17: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/17.jpg)
Descriptive Statistics
• Categorical variables:
– Frequency distribution
– Bar chart, pie chart
– Contingency tables
• Continuous variables:
– Grouped frequency table
– Central Tendency
– Variability
![Page 18: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/18.jpg)
Simple Frequency Distribution
An ordered arrangement that shows the frequency of each level of a variable.

race   Frequency   Percent
-----------------------------
A          7         4.07
B         42        24.42
H          8         4.65
W        115        66.86

sex    Frequency   Percent
----------------------------
F         86        50.00
M         86        50.00
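As a minimal sketch, the race table above can be reproduced with Python's standard library. The list `race` is rebuilt from the counts on the slide (the student-level data are not shown here), and the helper name `frequency_table` is an assumption made for this example.

```python
from collections import Counter

# Rebuild a race column whose counts match the slide: A=7, B=42, H=8, W=115
race = ["A"] * 7 + ["B"] * 42 + ["H"] * 8 + ["W"] * 115


def frequency_table(values):
    """Return {level: (frequency, percent)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        level: (count, round(100 * count / total, 2))
        for level, count in sorted(counts.items())
    }


for level, (freq, pct) in frequency_table(race).items():
    print(f"{level:<5}{freq:>9}{pct:>9.2f}")
```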
![Page 19: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/19.jpg)
Simple Frequency Distribution
• It is useful for a categorical variable
• For a continuous variable,
– it allows you to pick up at a glance some valuable information, such as the highest and lowest values
– ascertain the general shape or form of the distribution
– make an informed guess about central tendency values
![Page 20: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/20.jpg)
Bar Chart
• summarizing a set of categorical data - nominal or ordinal data
• It displays the data using a number of rectangles, each of which represents a particular category. The length of each rectangle is proportional to the number of cases in the category it represents
• can be displayed horizontally or vertically
• they are usually drawn with a gap between the bars
• Bars for multiple (usually two) variables can be drawn together to see the relationship
[Figure: bar chart of frequency (0 to 120) by Race (A, B, H, W); the chart can also be drawn horizontally]
![Page 21: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/21.jpg)
Pie Chart
• summarizing a set of categorical data - nominal or ordinal data
• It is a circle which is divided into segments.
• Each segment represents a particular category.
• The area of each segment is proportional to the number of cases in that category.
[Figure: pie chart of Sex (Female, Male)]
![Page 22: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/22.jpg)
Complex frequency distribution Table
Distribution of 20 lung cancer patients at the chest department of Alexandria hospital and 40 controls in May 2008 according to smoking

Smoking       Lung cancer cases    Controls        Total
              No.     %            No.     %       No.     %
Smoker        15      75           8       20      23      38.33
Non smoker    5       25           32      80      37      61.67
Total         20      100          40      100     60      100
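The percentages in this table are column percentages (each count divided by its column total). A short sketch of that arithmetic, using the counts from the table; the dictionary layout and the name `column_percentages` are assumptions made for illustration.

```python
# Counts from the table: rows = smoking status, columns = (cases, controls)
counts = {
    "Smoker": (15, 8),
    "Non smoker": (5, 32),
}


def column_percentages(table):
    """Percent of each column (cases, controls, total) falling in each row."""
    n_cases = sum(row[0] for row in table.values())
    n_controls = sum(row[1] for row in table.values())
    n_total = n_cases + n_controls
    result = {}
    for label, (cases, controls) in table.items():
        result[label] = (
            round(100 * cases / n_cases, 2),
            round(100 * controls / n_controls, 2),
            round(100 * (cases + controls) / n_total, 2),
        )
    return result


for label, (pct_cases, pct_controls, pct_total) in column_percentages(counts).items():
    print(f"{label:<12}{pct_cases:>8.2f}{pct_controls:>8.2f}{pct_total:>8.2f}")
```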
![Page 23: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/23.jpg)
How about continuous variables?
• How are the data distributed?
• Measure of Central Tendency
• Measure of Variability
![Page 24: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/24.jpg)
Grouped Frequency Distribution – for continuous variable
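[Interactive demo: enter new DATA or load the example data and choose an interval size (15 in the example) to build a frequency table and draw a HISTOGRAM or frequency POLYGON; the plot shows frequency (0 to 35) against values from 150 to 285 and reports N, µ, and σ.]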
![Page 25: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/25.jpg)
Grouped Frequency Distribution
• BUT the problem with a simple frequency distribution is that so much information is presented that it is difficult to discern what the data are really like, or to "cognitively digest" the data.
• The simple frequency distribution usually needs to be condensed even more.
– Some information (precision) about the data is lost in order to gain understanding about the distribution.
• This is the function of grouping data into equal-sized intervals called class intervals.
• The grouped frequency distribution is further presented as frequency polygons, histograms, bar charts, or pie charts.
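Grouping into class intervals can be sketched as follows. The data are the 24 patient ages used in the range example of this seminar; the interval width of 5 years and the function name `grouped_frequency` are assumptions chosen for illustration.

```python
from collections import Counter

# Ages of 24 patients (years), the example data from the range slide
ages = [6, 13, 7, 14, 10, 14, 15, 9, 7, 2, 7, 13,
        16, 9, 8, 3, 3, 17, 8, 5, 4, 9, 9, 6]


def grouped_frequency(values, width, start=0):
    """Count values in equal-width class intervals [lower, lower + width)."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return {(lower, lower + width): counts[lower] for lower in sorted(counts)}


# Print the grouped table with a simple text histogram.
for (lower, upper), freq in grouped_frequency(ages, width=5).items():
    print(f"{lower:>2} to <{upper:<2}  {freq:>2}  {'*' * freq}")
```

A wider interval gives a smoother picture but loses more precision; a narrower one keeps detail but is harder to read.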
![Page 26: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/26.jpg)
Describing Distributions
• Bell-Shaped Distribution
– Normal distribution N(µ=0, σ²=1)
– t-distribution
[Figure: bell-shaped density curve, x from −3 to 3, density from 0 to 0.40]
![Page 27: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/27.jpg)
Describing Distributions
• Skewed Distribution – positively skewed distribution
[Figure: positively skewed density curve, x from 0 to 30]
![Page 28: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/28.jpg)
Describing Distributions
• Skewed Distribution – negatively skewed distribution
[Figure: negatively skewed density curve, x from 0 to 30]
![Page 29: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/29.jpg)
Describing Distributions
• Other Shapes
[Figure: rectangular and bimodal distributions]
![Page 30: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/30.jpg)
Describing Distributions
• Other Shapes
[Figure: two panels, x from 0 to 30, showing J-curve distributions]
![Page 31: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/31.jpg)
Probability density function - Normal
[Figure: normal density curves; the green curve is the standard normal distribution]
z-transform
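For reference, the normal density and the z-transform named on this slide can be written out as below. These are the standard textbook formulas; the symbol φ for the standard normal density is a notational choice made here, not something taken from the slide.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
z = \frac{x-\mu}{\sigma}
\;\Longrightarrow\;
\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}
```

The z-transform rescales any normal variable with mean µ and standard deviation σ to the standard normal N(0, 1).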
![Page 32: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/32.jpg)
Measure of Central Tendency: Mean, Median, Mode
• The Mean
– average value
– not robust to outlying values
$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Length of hospital stays: 6, 4, 5, 9, 10, 7, 1, 4, 3, 4
• Mean = (6+4+5+9+10+7+1+4+3+4)/10 = 5.3
![Page 33: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/33.jpg)
Measure of Central Tendency: Mean, Median, Mode
• The Median
– is the point that divides a distribution of data into two equal parts
– robust to outlying values
• Length of hospital stays, sorted: 1 3 4 4 4 5 6 7 9 10
• Split the data into two halves of five values: median = (4+5)/2 = 4.5
![Page 34: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/34.jpg)
Measure of Central Tendency: Mean, Median, Mode
• The Mode
– is the most frequently occurring value (for grouped data, the midpoint of the interval that has the highest frequency)
– robust to outlying values, but sometimes misleading
• Length of hospital stays, sorted: 1 3 4 4 4 5 6 7 9 10
• Mode = 4, which occurred 3 times.
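The three measures for the hospital-stay data can be checked with Python's `statistics` module. The extra 100-unit stay added at the end is a hypothetical outlier, included only to show how differently the mean and the median react.

```python
import statistics

# Length of hospital stays for the 10 patients on the slides
stays = [6, 4, 5, 9, 10, 7, 1, 4, 3, 4]

mean_stay = statistics.mean(stays)      # arithmetic average
median_stay = statistics.median(stays)  # average of the two middle values when n is even
mode_stay = statistics.mode(stays)      # most frequent value

print(mean_stay, median_stay, mode_stay)

# A single extreme value (hypothetical) shifts the mean a lot, the median only slightly.
with_outlier = stays + [100]
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```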
![Page 35: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/35.jpg)
Comparison between mean and median
[Figure: bell-shaped curve, x from −3 to 3, with the mean and median marked]
![Page 36: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/36.jpg)
Comparison between mean and median
[Figure: skewed distribution, x from 0 to 30, with the mean and median marked]
![Page 37: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/37.jpg)
Comparison between mean and median
[Figure: skewed distribution, x from 0 to 30, with the mean and median marked]
![Page 38: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/38.jpg)
Summary
• Frequency distribution
• Histogram, Polygon graph
• Bar Chart, Pie Chart
• Describing Distributions
• Mean, Median, Mode
DATASET: http://128.205.94.145/STA2008/FL_School0022.xls
![Page 39: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/39.jpg)
Problem 2
• In a study, we collected a medical measurement X for 4 patients
• Data of X: 2, 3, 5, 6
• Mean of X?
• Median of X?
• Mode of X?
![Page 40: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/40.jpg)
Descriptive Statistics: Variability
• The sample range
• Interquartile range
• The sample standard deviation (SD), variance
• Standard error of mean (SEM)
![Page 41: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/41.jpg)
Measures of Dispersion - Range
• Range – the difference between the lowest and highest values
For example, Age of Patients (years): 6 13 7 14 10 14 15 9 7 2 7 13 16 9 8 3 3 17 8 5 4 9 9 6
lowest 2, highest 17
Range = 2 to 17 years (a difference of 15 years)
• When sample size increases, the range tends to increase as well (not robust).
![Page 42: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/42.jpg)
Measures of Dispersion - Range
• All of the curves have the same range
• Mean?
• Median?
![Page 43: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/43.jpg)
Measures of Dispersion: Percentiles, Deciles, Quartiles
• Percentiles: based on dividing a sample or population into 100 equal parts.
• Deciles divide the distribution into 10 equal parts.
• Quartiles divide the distribution into 4 equal parts.
– 1st quartile includes the lowest 25% of the values (Q1)
– 2nd quartile includes the values from the 26th percentile through the 50th percentile (Q2), the median
– 3rd quartile includes the values from the 51st percentile through the 75th percentile (Q3)
![Page 44: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/44.jpg)
Measures of Dispersion: Interquartile Range
• Interquartile Range – from the 25th percentile (1st quartile) to the 75th percentile (3rd quartile)
• Age of Patients (years): 2 3 3 4 5 6 6 7 7 7 8 8 9 9 9 9 10 13 13 14 14 15 16 17
– 1st quartile 6, 2nd quartile 8.5, 3rd quartile 13
– Interquartile Range = 6 to 13 years (a width of 7 years)
• Interquartile Range is a robust estimate of data variability
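The quartiles quoted for the age data can be reproduced with a short sketch. Quartile conventions differ between software packages; the median-of-halves rule used here is an assumption that happens to match the slide's values (6, 8.5, 13), and the function name `quartiles` is hypothetical.

```python
import statistics

# Ages of the 24 patients (years), already sorted as on the slide
ages = [2, 3, 3, 4, 5, 6, 6, 7, 7, 7, 8, 8,
        9, 9, 9, 9, 10, 13, 13, 14, 14, 15, 16, 17]


def quartiles(values):
    """Q1, Q2, Q3 using the median of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd sample size the middle value is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


q1, q2, q3 = quartiles(ages)
print(f"range: {min(ages)} to {max(ages)} (width {max(ages) - min(ages)})")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR width = {q3 - q1}")
```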
![Page 45: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/45.jpg)
Measures of Dispersion: Interquartile Range
Robust estimate, less efficient
![Page 46: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/46.jpg)
Deviations from the mean: Variance and Standard Deviation
• deviation: observation - mean, $x_i - \bar{x}$
• “sum” of deviations: $\sum (x_i - \bar{x})$, BUT $\sum (x_i - \bar{x}) = 0$
![Page 47: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/47.jpg)
Deviations from the mean: Variance and Standard Deviation
• Measure of how different the values in a set of numbers are from each other
• Variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
• Standard Deviation: $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
![Page 48: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/48.jpg)
Deviations from the mean: Variance and Standard Deviation
• Data set: 2, 3, 5, 6
Calculation:
$\bar{x} = \sum x_i / n = (2+3+5+6)/4 = 4.0$

Value of X   (X − x̄)   (X − x̄)²
2            −2         4
3            −1         1
5            1          1
6            2          4
             ∑=0        ∑=10

Variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = 10/(4-1) = 3.33$
Standard Deviation: $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{3.33} = 1.83$
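The worked calculation for the data set 2, 3, 5, 6 can be retraced step by step in Python. This is a sketch of the same arithmetic; the variable names are chosen for this example only.

```python
import math
import statistics

x = [2, 3, 5, 6]
n = len(x)

mean_x = sum(x) / n                          # 4.0
deviations = [xi - mean_x for xi in x]       # these always sum to zero
sum_sq = sum(d ** 2 for d in deviations)     # sum of squared deviations

variance = sum_sq / (n - 1)                  # sample variance, n - 1 denominator
sd = math.sqrt(variance)                     # sample standard deviation

print(f"mean = {mean_x}, sum of deviations = {sum(deviations)}, sum of squares = {sum_sq}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}")

# The standard library uses the same n - 1 denominator.
assert math.isclose(variance, statistics.variance(x))
assert math.isclose(sd, statistics.stdev(x))
```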
![Page 49: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/49.jpg)
Three normal distributions: mean=0, s²=1, s²=2, s²=0.5
[Figure: three normal density curves, N(0, 1), N(0, 2), and N(0, 0.5), all with central tendency mean = 0. The slide labels the narrow-scatter (homogeneous) curve leptokurtic, the wide-scatter (heterogeneous) curve platykurtic, and the intermediate curve mesokurtic.]
![Page 50: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/50.jpg)
Example 2: FEV1 (litres) of 57 male medical students
Table: FEV1 (litres) of 57 male medical students
2.85 3.19 3.50 3.69 3.90 4.14 4.32 4.50 4.80 5.20
2.85 3.20 3.54 3.70 3.96 4.16 4.44 4.56 4.80 5.30
2.98 3.30 3.54 3.70 4.05 4.20 4.47 4.68 4.90 5.43
3.04 3.39 3.57 3.75 4.08 4.20 4.47 4.70 5.00
3.10 3.42 3.60 3.78 4.10 4.30 4.47 4.71 5.10
3.10 3.48 3.60 3.83 4.14 4.30 4.50 4.78 5.10
![Page 51: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/51.jpg)
Example 2: FEV1 (litres) of 57 male medical students
Mean: 4.06
Variance: 0.45
SD: 0.67
Q1: 3.54
Q2 (Median): 4.10
Q3: 4.52
Percentile: 5.16
Range: 2.85 to 5.43
[Figure: plots of the FEV1 data, including a histogram of frequency (0 to 18) against FEV1 (litre), 2.5 to 6]
![Page 52: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/52.jpg)
The Meaning of Standard Deviation
• How the data are dispersed around the mean (the percentages below assume a normal distribution)
• Mean ± 1 SD represents 68.3% of the population
• Mean ± 2 SD represents about 95% of the population
• Mean ± 3 SD represents 99.7% of the population
![Page 53: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/53.jpg)
The Meaning of Standard Deviation

±SD     % of Pop
1       68.3
1.96    95
2       95.5
2.58    99
3       99.7

[Figure: standard normal curve, x from −3 to 3; 34% of the population lies within 1 SD on each side of the mean, and 48% within 2 SD on each side]
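The percentages in the ±SD table follow from the normal distribution and can be recomputed with the error function from Python's `math` module. This is a sketch that holds only for normally distributed data; the function name `coverage` is an assumption.

```python
import math


def coverage(k):
    """P(mean - k*SD < X < mean + k*SD) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


# Recompute the share of the population within k SD of the mean.
for k in (1, 1.96, 2, 2.58, 3):
    print(f"mean ± {k} SD: {100 * coverage(k):.1f}%")
```

For data that are clearly skewed, these percentages do not apply and percentiles are the safer summary.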
![Page 54: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/54.jpg)
Standard Error of Mean (SEM)
• How confident can we be that the sample mean represents the population mean µ?
• SEM = SD/$\sqrt{n}$
– SEM is smaller than the SD whenever n > 1, and much smaller for large n
• mean ± 1.96*SD covers about 95% of the data (for normally distributed data)
• mean ± 1.96*SEM covers the population mean with about 95% confidence
• SEM and SD are different!
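The difference between SD and SEM can be seen on the FEV1 data from Example 2. The sketch below recomputes the mean and SD and derives the SEM as SD divided by the square root of n; the two printed intervals assume approximate normality.

```python
import math
import statistics

# FEV1 (litres) of the 57 male medical students from Example 2
fev1 = [
    2.85, 2.85, 2.98, 3.04, 3.10, 3.10, 3.19, 3.20, 3.30, 3.39,
    3.42, 3.48, 3.50, 3.54, 3.54, 3.57, 3.60, 3.60, 3.69, 3.70,
    3.70, 3.75, 3.78, 3.83, 3.90, 3.96, 4.05, 4.08, 4.10, 4.14,
    4.14, 4.16, 4.20, 4.20, 4.30, 4.30, 4.32, 4.44, 4.47, 4.47,
    4.47, 4.50, 4.50, 4.56, 4.68, 4.70, 4.71, 4.78, 4.80, 4.80,
    4.90, 5.00, 5.10, 5.10, 5.20, 5.30, 5.43,
]

n = len(fev1)
mean = statistics.mean(fev1)
sd = statistics.stdev(fev1)   # spread of the individual measurements
sem = sd / math.sqrt(n)       # precision of the sample mean

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.3f}")
print(f"mean ± 1.96*SD : {mean - 1.96 * sd:.2f} to {mean + 1.96 * sd:.2f}")
print(f"mean ± 1.96*SEM: {mean - 1.96 * sem:.2f} to {mean + 1.96 * sem:.2f}")
```

The SD interval describes where individual students fall; the much narrower SEM interval describes the uncertainty about the population mean.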
![Page 55: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/55.jpg)
Standard Error of Mean (SEM)
• Describing the scatter or spread of data, use SD
• Estimate population parameters, use SEM
• Epidemiologic study, SEM
• Clinical or laboratory research, SD
![Page 56: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/56.jpg)
Summarizing Data - Calculator
[Interactive calculator: put DATA in the input box (or load the example data), set the interval size and Ylim, then RUN to compute N, µ, σ and the summary statistics and to draw a HISTOGRAM or POLYGON. For the FEV1 data it reports Mean 4.06, Variance 0.45, SD 0.67, Q1 3.54, Q2 (Median) 4.10, Q3 4.52, Percentile 5.16, Range 2.85 to 5.43.]
![Page 57: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/57.jpg)
Box-Plot
• The box itself contains the middle 50% of the data. The upper edge (hinge) of the box indicates the 75th percentile of the data set, and the lower hinge indicates the 25th percentile. The range of the middle two quartiles is known as the inter-quartile range.
• The line in the box indicates the median value of the data.
• The + indicates the mean value.
• The ends of the vertical lines or "whiskers" indicate the minimum and maximum data values, unless outliers are present, in which case the whiskers extend to a maximum of 1.5 times the inter-quartile range beyond the hinges.
• The points outside the ends of the whiskers are outliers or suspected outliers.
[Figure: example box plot, vertical axis from 0 to 350]
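The quantities a box plot displays can be computed directly. The sketch below applies the 1.5 times inter-quartile range rule described above; the median-of-halves quartile rule, the function name `box_plot_summary`, and the data (the hospital-stay values plus one made-up extreme value of 30) are assumptions for illustration.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Hinges, whisker ends, and outliers using the k x IQR rule (k = 1.5)."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[half:] if len(data) % 2 == 0 else data[half + 1:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at the most extreme data values that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "mean": statistics.mean(data),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical measurements with one extreme value (illustration only)
values = [1, 3, 4, 4, 4, 5, 6, 7, 9, 10, 30]
print(box_plot_summary(values))
```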
![Page 58: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/58.jpg)
Box Plot – Example 2
• FEV1 of 57 students
• Serum triglyceride measurements in cord blood from 282 babies
[Figure: box plots of the two data sets]
![Page 59: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/59.jpg)
What can you get from a box-plot?
• Graphically display a variable's location and spread at a glance. [Q1, Q2 (median), Q3, interquartile range]
• Provide some indication of the data's symmetry and skewness.
• Unlike many other methods of data display, boxplots show outliers.
• By drawing a boxplot for each level of a categorical variable side-by-side on the same graph, one can quickly compare data sets.
• One drawback of boxplots is that they tend to emphasize the tails of a distribution, which are the least certain points in the data set. They also hide many of the details of the distribution. Displaying a histogram in conjunction with the boxplot helps.
![Page 60: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/60.jpg)
Transformations
[Figure: histograms of frequency against triglyceride (0 to 2) and against log(triglyceride) (−2 to 1)]
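The effect of a log transformation on a right-skewed variable can be sketched numerically. The values below are hypothetical (they are not the cord-blood triglyceride measurements), and the moment-based `skewness` helper is one common definition chosen for this example.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (not the cord-blood triglyceride data)
raw = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.8, 1.0, 1.4, 2.0]
logged = [math.log(v) for v in raw]

print(f"skewness before log: {skewness(raw):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```

The logged values are noticeably less skewed than the raw ones, which is the pattern the two histograms on this slide illustrate.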
![Page 61: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/61.jpg)
Summarizing data
• Univariate – categorical variable
– Frequency distributions
– Bar Chart, Pie Chart
![Page 62: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/62.jpg)
Summarizing data
• Univariate – continuous variable
– Grouped frequency distributions
– Polygon or histogram
– Mean, Median, Mode, Percentile, Q1, Q2, Q3, extreme values
– Standard deviation, variance, range, interquartile range
– Box-Plot
– Normality test statistics
![Page 63: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/63.jpg)
Next lecture ( Lecture 2)
• Bivariate – one variable is categorical and the other is continuous
– t-test
– ANOVA
![Page 64: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/64.jpg)
Lecture 3 – categorical data analysis
• Bivariate – both are categorical
– Contingency tables
– Chi-square test
• Response is categorical, predictors could be of both types.
– Logistic regression
![Page 65: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/65.jpg)
Lecture 4 – Continuous response
• Correlation
• Multiple linear regression
![Page 66: CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor.](https://reader038.fdocuments.net/reader038/viewer/2022110322/56649d245503460f949faf48/html5/thumbnails/66.jpg)
• Thanks.
• Questions?