Data Display and Summary

Data Display and Summary Biostatistics By Dr Zahid Khan

Page 1: Data Display and Summary

Data Display and Summary

Biostatistics

By Dr Zahid Khan

Page 2: Data Display and Summary

Learning Objectives

• Acquire the basic knowledge of biostatistics needed to understand medical literature and evidence-based medicine, keep up with expanding medical knowledge, and participate in research.

• Identify the role of biostatistics in medical research.

• Define, enumerate, and identify the different methods of data summarization in the form of tables, graphs, and numeric measures of central tendency and dispersion, and report dichotomous variables.

• Define, appraise, use, and interpret the different tools used for data analysis.

Page 3: Data Display and Summary

Data

• Data is a collection of facts, such as values or measurements.

OR

• Data is information that has been translated into a form that is more convenient to move or process.

OR

• Data are any facts, numbers, or text that can be processed by a computer.

Page 4: Data Display and Summary

Statistics

Statistics is the study of the collection, organization, summarization, analysis, and interpretation of data.

Page 5: Data Display and Summary

Vital statistics

Vital statistics is the collection, summarization, organization, analysis, presentation, and interpretation of data on the vital events of life, such as births, deaths, marriages, divorces, health, and disease.

Page 6: Data Display and Summary

Biostatistics

Biostatistics is the application of statistical techniques to scientific research in health-related fields, including medicine, biology, and public health.

Page 7: Data Display and Summary

Descriptive Statistics

The term descriptive statistics refers to statistics that summarize and describe the data actually collected, without generalizing beyond them. A good example of descriptive statistics is the census, in which all members of a population are counted.

Page 8: Data Display and Summary

Inferential or Analytical Statistics

Inferential statistics are used to draw conclusions and make predictions about a population based on the analysis of data from a sample.

Page 9: Data Display and Summary

Primary & Secondary Data

• Raw or primary data: data as collected, still containing a lot of unnecessary, irrelevant, and unwanted information.

• Treated or secondary data: data from which this unnecessary, irrelevant, and unwanted information has been removed.

• Cooked data: data that were not genuinely collected and are false or fictitious.

Page 10: Data Display and Summary

Ungrouped & Grouped Data

• Ungrouped data: data presented or observed individually. For example, the number of children observed in 6 families:

2, 4, 6, 4, 6, 4

• Grouped data: identical values grouped by frequency. For example, the above data on children in 6 families can be grouped as:

No. of children    Families
2                  1
4                  3
6                  2

or alternatively we can make classes:

No. of children    Frequency
2 - 4              4
5 - 7              2
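The grouping above can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names and the class boundaries passed in are assumptions taken from the slide's example, not part of the original deck.

```python
from collections import Counter

# Ungrouped data: number of children observed in 6 families (from the slide).
children = [2, 4, 6, 4, 6, 4]

# Grouped data: how many families have each number of children.
frequency = Counter(children)

# Alternative grouping into the class intervals used on the slide.
classes = [(2, 4), (5, 7)]
class_frequency = {
    f"{low} - {high}": sum(low <= c <= high for c in children)
    for low, high in classes
}

for value in sorted(frequency):
    print(value, frequency[value])
for label, count in class_frequency.items():
    print(label, count)
```

Each class interval is treated as inclusive at both ends, which matches the way the slide counts the three families with 4 children in the 2 - 4 class.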

Page 11: Data Display and Summary

Variable

A variable is a characteristic or value that can vary from one individual or observation to another, for example age, height, weight, or blood pressure.

Page 12: Data Display and Summary

Types of Variable

Independent variable: typically the variable representing the value being manipulated or changed. For example, smoking.

Dependent variable: the observed result of the independent variable being manipulated. For example, carcinoma of the lung.

Confounding variable: a variable associated with both the exposure and the disease. For example, age, which is a risk factor for many diseases.

Page 13: Data Display and Summary

Categories of DATA

Page 14: Data Display and Summary

Quantitative or Numerical data

Quantitative data are information that can be counted or measured and expressed numerically (as numbers), for example:

2, 4, 6, 8.5, 10.5

Page 15: Data Display and Summary

Quantitative or Numerical data (cont.)

This data is of two types:

1. Discrete Data: takes whole-number values only, with no fractions. For example

Number of children in a family = 4

Number of patients in hospital = 320

2. Continuous Data: measured on a continuous scale with infinitely many possible values; it can take fractional values. For example

Height of a person = 5 feet 6 inches (5′ 6″)

Temperature = 92.3 °F

Page 16: Data Display and Summary

Qualitative or Categorical data

This is non-numerical data, such as Male/Female or Short/Tall.

It is of two types:

1. Nominal Data: has a series of unordered categories (an observation can fall into only one category at a time). For example

Sex = Male/Female

Blood group = O/A/B/AB

2. Ordinal or Ranked Data: has distinct ordered/ranked categories. For example

Height can be recorded as = Short / Medium / Tall

Degree of pain can be = None / Mild / Moderate / Severe

Page 17: Data Display and Summary

Measures of Central Tendency & Variation (Dispersion)

Page 18: Data Display and Summary

Measures of Central Tendency

These are quantitative indices that describe the center of a distribution of data. They are the three Ms:

• Mean

• Median

• Mode

Page 19: Data Display and Summary

Mean

The mean, or arithmetic mean, is also called the average and is calculated only for numerical data. For example:

• What is the average age of the children in years?

Child    1  2  3  4  5  6  7
Age      6  4  4  3  2  4  5

Formula: $\bar{X} = \frac{\sum X}{n}$

Mean $= \frac{6 + 4 + 4 + 3 + 2 + 4 + 5}{7} = \frac{28}{7} = 4$ years
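As a quick check of the arithmetic, the same mean can be computed in Python. This is a minimal sketch using the seven ages from the worked calculation (6, 4, 4, 3, 2, 4, 5); the variable names are illustrative only.

```python
import statistics

# Ages (in years) of the 7 children, as used in the worked calculation.
ages = [6, 4, 4, 3, 2, 4, 5]

# Mean = sum of the values divided by the number of values.
mean_by_formula = sum(ages) / len(ages)

# The standard library gives the same result.
mean_by_library = statistics.mean(ages)

print(mean_by_formula, mean_by_library)
```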

Page 20: Data Display and Summary

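Median

• It is the central-most value of the ordered data. For example, what is the central value in the data 2, 3, 4, 4, 4, 5, 6?

• If we divide the ordered data into two equal groups, 2, 3, 4 | 4 | 4, 5, 6, then 4 is the central-most value.

• The formula for the position of the central value is:

Position of median $= \frac{n + 1}{2}$ (here $n$ is the total number of values)

Position of median $= \frac{7 + 1}{2} = \frac{8}{2} = 4$, so the median is the 4th value in the ordered data, which is 4.

• When $n$ is even, the median is the average of the two middle values.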
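The position rule can be sketched in Python as follows. The helper name `median_by_position` is hypothetical, and the handling of an even number of values (averaging the two middle values) is the usual convention rather than something stated on the slide.

```python
import statistics

def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)           # the rule only works on ordered data
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2        # 1-based position of the middle value
        return ordered[position - 1]
    # Even n: (n + 1) / 2 falls between two values, so average them.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

data = [2, 3, 4, 4, 4, 5, 6]
median_value = median_by_position(data)
print(median_value, statistics.median(data))
```

Note that (n + 1)/2 gives the position of the median, not its value; in the slide's example the two happen to be equal.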

Page 21: Data Display and Summary

Mode

• The mode is the most frequently occurring (most repeated) value in a set of observations. Examples:

• No mode

Raw data: 10.3 4.9 8.9 11.7 6.3 7.7

• One mode

Raw data: 2 3 4 4 4 5 6

• More than 1 mode

Raw data: 21 28 28 41 43 43
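The three cases above can be checked with a short Python sketch. The helper name `modes` is hypothetical, and treating a data set in which every value occurs once as having no mode follows the slide's convention.

```python
from collections import Counter

def modes(values):
    """Return all modes in increasing order, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

no_mode = modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
one_mode = modes([2, 3, 4, 4, 4, 5, 6])
two_modes = modes([21, 28, 28, 41, 43, 43])
print(no_mode, one_mode, two_modes)
```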

Page 22: Data Display and Summary

Comparison of the Mode, the Median, and the Mean

• In a normal distribution, the mode, the median, and the mean have the same value.

• The mean is the most widely reported index of central tendency for variables measured on an interval or ratio scale.

• The mean takes each and every score into account.

• It is also the most stable index of central tendency and thus yields the most reliable estimate of the central tendency of the population.

Page 23: Data Display and Summary

Measures of Dispersion

Quantitative indices that describe the spread of a data set. These are

• Range

• Mean deviation

• Variance

• Standard deviation

• Coefficient of variation

• Percentile

Page 24: Data Display and Summary

Range

It is the difference between the highest and lowest values in a data series. For example:

The ages (in years) of 10 children are

2, 6, 8, 10, 11, 14, 1, 6, 9, 15

Here the range of age is 15 − 1 = 14 years.
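In Python the same calculation is a one-liner; the variable names below are illustrative only.

```python
# Ages (in years) of the 10 children from the slide.
ages = [2, 6, 8, 10, 11, 14, 1, 6, 9, 15]

# Range = highest value minus lowest value.
age_range = max(ages) - min(ages)
print(age_range)
```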

Page 25: Data Display and Summary

Mean Deviation

This is the average absolute deviation of all observations from the mean.

Mean Deviation $= \frac{\sum |X - \bar{X}|}{n}$

Here $X$ = value, $\bar{X}$ = mean, and $n$ = total number of values.

Page 26: Data Display and Summary

Mean Deviation Example

A student took 5 exams in a class and had scores of 92, 75, 95, 90, and 98. Find the mean deviation for her test scores.

• First step: find the mean.

$\bar{x} = \frac{\sum x}{n} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$

Page 27: Data Display and Summary

Dr. Riaz A. Bhutto, 9/3/2012

• Second step: find the mean deviation.

Value X    Mean    Deviation from mean    Absolute deviation (sign ignored)
92         90      2                      2
75         90      -15                    15
95         90      5                      5
90         90      0                      0
98         90      8                      8
Total = 450                               Total = 30

n = 5

Mean Deviation $= \frac{\sum |X - \bar{X}|}{n} = \frac{30}{5} = 6$

The average deviation from the mean is 6.
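The two steps of the example (find the mean, then average the absolute deviations) can be sketched in Python; the variable names are assumptions made for illustration.

```python
# Exam scores from the worked example.
scores = [92, 75, 95, 90, 98]
n = len(scores)

# Step 1: find the mean.
mean = sum(scores) / n

# Step 2: average the absolute deviations from the mean.
absolute_deviations = [abs(x - mean) for x in scores]
mean_deviation = sum(absolute_deviations) / n

print(mean, absolute_deviations, mean_deviation)
```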

Page 28: Data Display and Summary

Variance

• It is a measure of variability that takes into account the difference between each observation and the mean.

• The sample variance is the sum of the squared deviations from the mean divided by the number of values in the series minus 1.

• Sample variance is denoted s² and population variance σ².

Page 29: Data Display and Summary

Variance (cont.)

• The variance is defined as the average of the squared differences from the mean.

• To calculate the variance, follow these steps:

• Work out the mean (the simple average of the numbers).

• Then, for each number, subtract the mean and square the result (the squared difference).

• Then work out the average of those squared differences (divide by n for a population, or by n − 1 for a sample).

Page 30: Data Display and Summary

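Example: The household size of 5 families was recorded as follows: 2, 5, 4, 6, 3. Calculate the variance for these data.

Step 1      Step 2    Step 3                 Step 4
Value X     Mean      Deviation from mean    Squared deviation
2           4         -2                     4
5           4         1                      1
4           4         0                      0
6           4         2                      4
3           4         -1                     1

Step 5: $\sum (X - \bar{X})^2 = 10$

Step 6: $\sigma^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{10}{5} = 2$ persons²

Dividing by $n$ treats the 5 families as the whole population. If they are regarded as a sample, the sample variance is $s^2 = \frac{10}{5 - 1} = 2.5$ persons².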
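The six steps can be followed line by line in Python. This is a sketch with illustrative variable names; it shows both divisors, since dividing by n (as in the worked example) gives 2, while dividing by n − 1 (the sample form defined earlier) gives 2.5.

```python
# Step 1: household sizes of the 5 families.
household_sizes = [2, 5, 4, 6, 3]
n = len(household_sizes)

# Step 2: the mean.
mean = sum(household_sizes) / n

# Steps 3 and 4: deviation from the mean, then squared.
squared_deviations = [(x - mean) ** 2 for x in household_sizes]

# Step 5: sum of the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 6: divide by n (population form) or by n - 1 (sample form).
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

print(sum_of_squares, population_variance, sample_variance)
```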

Page 31: Data Display and Summary

Standard Deviation

• The standard deviation is a measure of how spread out numbers are.

• Its symbol is σ (the Greek letter sigma) for a population and s for a sample.

• The formula is easy: it is the square root of the variance, i.e. $s = \sqrt{s^2}$

• SD is the most useful measure of dispersion.

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ (population, or if n > 30)

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (sample, if n < 30)
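Python's standard `statistics` module provides both forms: `pstdev` divides by n and `stdev` divides by n − 1. The sketch below applies them to the household-size data from the variance example; reusing that data set here is an assumption made for illustration.

```python
import math
import statistics

# Household sizes from the variance example.
household_sizes = [2, 5, 4, 6, 3]

# Population form: square root of (sum of squared deviations / n).
population_sd = statistics.pstdev(household_sizes)

# Sample form: square root of (sum of squared deviations / (n - 1)).
sample_sd = statistics.stdev(household_sizes)

# Each standard deviation is the square root of the matching variance.
print(population_sd, math.sqrt(statistics.pvariance(household_sizes)))
print(sample_sd, math.sqrt(statistics.variance(household_sizes)))
```

For small data sets the two results differ noticeably; as n grows the difference between dividing by n and by n − 1 becomes negligible.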

Page 32: Data Display and Summary

Standard Deviation and Standard Error

• SD describes the variability of the individual observations; the sample SD is an estimate of the population SD.

• SE is a measure of the precision of an estimate of a population parameter.
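To make the distinction concrete, the sketch below computes both quantities for the exam scores used in the mean deviation example. The slide gives no formula for SE; the one used here, SE of the mean = s / √n, is the standard textbook formula, and the function name `standard_error_of_mean` is hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Standard error of the sample mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Exam scores reused from the mean deviation example (an assumption).
scores = [92, 75, 95, 90, 98]

sd = statistics.stdev(scores)          # spread of the individual scores
se = standard_error_of_mean(scores)    # precision of the mean as an estimate

print(sd, se)
```

The SE is always smaller than the SD for n > 1 and shrinks as the sample size grows, whereas the SD does not systematically shrink with n.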

Page 33: Data Display and Summary

Graphs and their use

• Histograms & box plots are used for continuous or scale variables such as temperature, bone density, etc.

• Bar charts & pie charts are used for categorical or nominal variables such as gender, name, etc.

• Scatterplots are used to show the relationship between two continuous variables.

Page 34: Data Display and Summary

BAR GRAPHS

• Bar graphs are frequently used with categorical data to compare the sizes of categories.

Page 35: Data Display and Summary

Page 36: Data Display and Summary

PIE CHARTS

• Like bar graphs, pie charts are best used with categorical data to help us see what percentage of the whole each category constitutes. Pie charts require all categories to be included in a graph. Each graph always represents the whole. 

• One of the reasons why bar graphs are more flexible than pie charts is the fact that bar graphs compare selected categories, whereas pie charts must either compare all categories or none.

Page 37: Data Display and Summary

Page 38: Data Display and Summary

QUANTITATIVE VARIABLES

• STEM PLOTS.

• Stemplots (sometimes called stem-and-leaf plots) are used with quantitative data to display shapes of distributions, to organize numbers and make them more comprehensible.

• It is a descriptive technique which gives a good overall impression of the data. Stemplots include the actual numerical values of the observations, where each value is separated into two parts, a stem and a leaf.

• A stem is usually the first digit, or the leftmost digit(s), and a leaf is the final rightmost digit. We write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of the column. Finally, we write the leaves in the row to the right of the corresponding stem, starting with the smallest one.

Page 39: Data Display and Summary

STEM PLOTS.

• Grades. The average test grades of 19 students are as follows (on a scale from 0 to 100, with 100 being the highest score): 92 95 96 81 95 75 91 79 92 100 89 94 92 86 93 73 74 94 91

• Sorted in increasing order:

• 73, 74, 75, 79, 81, 86, 89, 91, 91, 92, 92, 92, 93, 94, 94, 95, 95, 96, 100

Page 40: Data Display and Summary

STEMPLOT#1:

stem | leaf
   7 | 3 4
   7 | 5 9
   8 | 1
   8 | 6 9
   9 | 1 1 2 2 2 3 4 4
   9 | 5 5 6
  10 | 0
  10 |

STEMPLOT#2:

stem | leaf
   7 | 3 4 5 9
   8 | 1 6 9
   9 | 1 1 2 2 2 3 4 4 5 5 6
  10 | 0

Depending on the number of stems, different conclusions can be drawn about a given data set. In this example, even though both stemplots show a slight left skewness of the data set, stemplot #1 shows it more clearly than stemplot #2.
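A stemplot with one row per stem (the second form above) can be generated with a short Python sketch. The function name `stemplot` is hypothetical, and the code assumes whole-number data where the last digit is the leaf and the remaining digits are the stem.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a one-row-per-stem stemplot for whole numbers."""
    leaves = defaultdict(list)
    for value in sorted(values):           # sorted, so leaves come out in order
        stem, leaf = divmod(value, 10)     # e.g. 92 -> stem 9, leaf 2
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

rows = stemplot(grades)
print("\n".join(rows))
```

Stems with no observations are still printed as empty rows, so gaps in the distribution stay visible.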

Page 41: Data Display and Summary

Stem and Leaf Plots

• A simple way to order and display a data set.

• Abbreviate the observed data to two significant digits.

Data: 0.6 2.6 0.1 1.1 0.4 1.3 1.5 2.2 2.0 3.2

Stem | Leaf
   0 | 6 1 4
   1 | 1 3 5
   2 | 6 2 0
   3 | 2

Page 42: Data Display and Summary

HISTOGRAMS

• Histograms are yet another graphic way of presenting data to show the distribution of the observations. It is one of the most common forms of graphical presentation of a frequency distribution

Page 43: Data Display and Summary

Page 44: Data Display and Summary

BOXPLOTS

• Boxplots reveal the main features of a batch of data, i.e. how the data are spread out.

• Any boxplot is a graph of the five-number summary: the minimum score, first quartile (Q1-the median of the lower half of all scores), the median, third quartile (Q3-the median of the upper half of all scores), and the maximum score, with suspected outliers plotted individually.

Page 45: Data Display and Summary

BOXPLOTS (continued)

• The boxplot consists of a rectangular box, which represents the middle half of all scores (between Q1 and Q3). Approximately one-fourth of the values should fall between the minimum and Q1, and approximately one-fourth should fall between Q3 and the maximum. A line in the box marks the median. Lines called whiskers extend from the box out to the smallest and largest scores that are not possible outliers. If an observation falls more than 1.5 × IQR below Q1 or above Q3, it is plotted individually as an outlier.

Page 46: Data Display and Summary

BOXPLOTS

• FIVE-NUMBER SUMMARY:

•   MINIMUM

• 1ST QUARTILE

• MEDIAN

• 3RD QUARTILE

• MAXIMUM

Page 47: Data Display and Summary

IQR, or the interquartile range, is the distance between the first and third quartiles.

IQR = Q3 − Q1
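The five-number summary, the IQR, and the 1.5 × IQR outlier rule can be put together in a short Python sketch. It uses the quartile definition given in these slides (Q1 and Q3 as the medians of the lower and upper halves, leaving out the middle value when n is odd); other software may use slightly different quartile rules. The function name and the reuse of the 19 test grades are assumptions made for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

minimum, q1, median, q3, maximum = five_number_summary(grades)
iqr = q3 - q1

# Observations more than 1.5 x IQR below Q1 or above Q3 are suspected outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [g for g in grades if g < low_fence or g > high_fence]

print((minimum, q1, median, q3, maximum), iqr, outliers)
```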


Page 49: Data Display and Summary

ANY QUESTIONS

•THANK YOU

3/3/2012