Section 2.1 – What Are the Types of Data?

Variables
• A variable is any characteristic that is recorded for subjects in a study. The word “variable” highlights that data values vary.
  ◦ Note that we studied several characteristics when we completed the class study. Some examples are:
    • Gender
    • Class year
    • GPA
  ◦ In general, you can think of a variable as being a survey question.

Categorical vs. Quantitative
• Categorical variables are such that each observation belongs to one of a set of categories.
  ◦ Gender?
  ◦ Others?
• Quantitative variables are such that each observation takes on numerical values that represent different magnitudes of the variable.
  ◦ What is your height?
  ◦ Others?

Descriptive Statistics
• How do we numerically summarize survey data? First ask yourself if the data comes from a categorical or quantitative variable.
  ◦ If the data comes from a categorical variable, we’ll want to describe the relative number of observations in each category (e.g., percentages).
  ◦ If the data comes from a quantitative variable, key features to describe are the center (e.g., mean/median) and spread (e.g., quartiles/standard deviation).
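As a rough illustration of the two kinds of summary described above, the sketch below uses only the Python standard library. The survey responses and the function names are made up for illustration; they are not the class data. Quartile conventions differ between textbooks and software, so the quartiles returned here may not match a hand calculation exactly.

```python
from collections import Counter
import statistics

# Hypothetical survey responses, made up for illustration only.
class_year = ["Freshman", "Sophomore", "Freshman", "Junior",
              "Senior", "Freshman", "Junior", "Sophomore"]
sleep_hours = [6, 7, 7, 8, 5, 6, 7, 9]

def summarize_categorical(values):
    """Return the percentage of observations in each category."""
    counts = Counter(values)
    n = len(values)
    return {category: 100 * count / n for category, count in counts.items()}

def summarize_quantitative(values):
    """Return measures of center (mean, median) and spread (quartiles, stdev)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),
    }

print(summarize_categorical(class_year))
print(summarize_quantitative(sleep_hours))
```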

Quantitative Variables Are Discrete or Continuous
• A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, etc.
  ◦ How many siblings do you have?
  ◦ Others?
• A quantitative variable is continuous if its possible values form an interval.
  ◦ What is your height?
  ◦ Others?

Frequency Tables
• A frequency table is a listing of the possible values of a variable, together with the number of observations for each value.
  ◦ Count how often each value occurs.
  ◦ Use percentages or proportions (relative frequencies) to summarize the data.
• Make frequency tables for some categorical variables.

Frequency Table

Gender  | Count | Proportion | Percent
Female  |       |            |
Male    |       |            |
Total   |       |            |
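One possible way to fill in a table like this by machine is sketched below. The responses are hypothetical, and `frequency_table` is an illustrative helper name, not something from the course materials.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, proportion, percent), ending with a Total row."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, c / n, 100 * c / n) for v, c in sorted(counts.items())]
    rows.append(("Total", n, 1.0, 100.0))
    return rows

# Hypothetical responses, made up for illustration only.
gender = ["Female", "Male", "Female", "Female", "Male"]

print(f"{'Gender':<8}{'Count':>6}{'Proportion':>12}{'Percent':>9}")
for value, count, proportion, percent in frequency_table(gender):
    print(f"{value:<8}{count:>6}{proportion:>12.2f}{percent:>9.1f}")
```

The proportion column is the count divided by the total number of observations, and the percent column is that proportion times 100.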

Contingency Table

            | Male | Female | TOTALS
Democrat    |      |        |
Republican  |      |        |
Independent |      |        |
Other       |      |        |
TOTALS      |      |        |
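A contingency table counts observations for each combination of two categorical variables. The sketch below shows one way to tally such a table; the (party, gender) pairs are hypothetical, and `contingency_table` is an illustrative name.

```python
from collections import Counter

def contingency_table(pairs):
    """Count observations for each (row category, column category) pair.

    Returns a nested dict: table[row][column] -> count, with TOTALS margins.
    """
    cell_counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    table = {}
    for r in row_labels:
        table[r] = {c: cell_counts[(r, c)] for c in col_labels}
        table[r]["TOTALS"] = sum(table[r][c] for c in col_labels)
    # Column totals, plus the grand total in the bottom-right corner.
    table["TOTALS"] = {
        c: sum(table[r][c] for r in row_labels)
        for c in col_labels + ["TOTALS"]
    }
    return table

# Hypothetical (party, gender) responses, made up for illustration only.
responses = [
    ("Democrat", "Female"), ("Republican", "Male"), ("Democrat", "Male"),
    ("Independent", "Female"), ("Democrat", "Female"), ("Other", "Male"),
]

for row_label, cells in contingency_table(responses).items():
    print(row_label, cells)
```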

Section 2.2 – How Can We Describe Data Using Graphical Summaries?

The type of graph that you should choose depends on whether the variable is quantitative or categorical.


Graphs for Categorical Data
• Pie Chart: A circle having a “slice of pie” for each category.
  ◦ The size of the slices is determined by the proportional size of each category.
  ◦ Labeling wedges with percents helps to make the information clearer.
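To make "proportional size" concrete: a category that holds a given proportion of the observations gets that same proportion of the 360 degrees in the circle. The sketch below computes the percent label and the slice angle for each category, using hypothetical data and an illustrative function name.

```python
from collections import Counter

def pie_slices(values):
    """Return {category: (percent, angle in degrees)} for a pie chart."""
    counts = Counter(values)
    n = len(values)
    return {cat: (100 * c / n, 360 * c / n) for cat, c in counts.items()}

# Hypothetical responses, made up for illustration only.
class_year = ["Freshman", "Freshman", "Sophomore", "Junior"]

for category, (percent, angle) in pie_slices(class_year).items():
    print(f"{category}: {percent:.1f}% of the pie, {angle:.0f} degrees")
```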

[Figure: Pie Chart]

Graphs for Categorical Data
• Bar Graph: A graph that displays a vertical bar for each category.
  ◦ The heights of the bars are determined by the proportional size of each category.
  ◦ The bars do not touch.
  ◦ If you order the bars from highest to lowest frequency, the graph is known as a Pareto chart.
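The Pareto ordering described above amounts to sorting the categories by count, largest first. A minimal sketch follows, with hypothetical data; the bars are drawn horizontally with `#` characters only because that is easier in plain text than the vertical bars described on the slide.

```python
from collections import Counter

def pareto_order(values):
    """Return (category, count) pairs sorted from highest to lowest count."""
    return Counter(values).most_common()

def text_bar_chart(ordered_counts):
    """Return one text line per category, with one '#' per observation."""
    return [f"{cat:<12} {'#' * count} ({count})" for cat, count in ordered_counts]

# Hypothetical responses, made up for illustration only.
party = ["Democrat", "Republican", "Democrat",
         "Independent", "Democrat", "Republican"]

for line in text_bar_chart(pareto_order(party)):
    print(line)
```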

[Figure: Bar Chart]

[Figure: Pareto Bar Chart]

Pie Chart vs. Bar Chart
• A pie chart gives a quick picture of parts of the whole.
• It is difficult to compare small differences in a pie chart.
• Bar graphs are good for comparing two groups on a particular variable.


Graphs for Quantitative Data
• Dot Plot: Shows a dot for each observation. To construct:
  ◦ Draw a horizontal line for the variable that you are measuring, label it, and mark regular values of the variable on it.
  ◦ For each observation, draw a dot above the line at the value of the observation.

[Figure: Hours of Sleep]
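The dot plot construction steps can be sketched in plain text as follows. The sleep data here is hypothetical (it is not the data behind the class figure), and this simple version assumes whole-number observations of one or two digits.

```python
from collections import Counter

def dot_plot(values):
    """Return the lines of a text dot plot for integer-valued observations."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    height = max(counts.values())
    lines = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        row = "".join(" * " if counts[v] >= level else "   " for v in axis)
        lines.append(row.rstrip())
    # The last line is the labeled horizontal axis.
    lines.append("".join(f"{v:^3}" for v in axis).rstrip())
    return lines

# Hypothetical hours of sleep, made up for illustration only.
sleep_hours = [6, 7, 7, 8, 5, 6, 7, 9]

print("\n".join(dot_plot(sleep_hours)))
```

Each `*` stands for one observation, stacked above its value on the axis.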

Pros and Cons of a Dotplot
• Gives a quick picture of the data.
• Difficult to graph precisely; works better with discrete than continuous data.


Graphs for Quantitative Data
• Stem-and-Leaf Plot: Portrays the individual observations on branches.
  ◦ Each observation is represented by a stem and a leaf. Typically all of the digits except the last are the stem and the last one is the leaf.
  ◦ Example: GPA
    • What is the range of answers?
    • What values should the stems be?
    • How about the leaves?

GPA Stem-and-Leaf Plot

Variable: GPA
Decimal point is at the colon. Leaf unit = 0.1
2 : 00001444
2 : 556677788899
3 : 0111222344444
3 : 6888
4 : 00
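The GPA plot above can be rebuilt programmatically. In the sketch below, the values are read back off the plot itself and stored in tenths (so 3.4 becomes 34) to avoid rounding issues; the function name and the `split` option are illustrative choices. Each stem is split into two rows, leaves 0 to 4 and leaves 5 to 9, as in the plot. Stems with no observations are simply omitted in this simple version.

```python
def stem_and_leaf(tenths, split=True):
    """Build stem-and-leaf rows from values given in tenths (e.g. 3.4 -> 34).

    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    rows = {}
    for value in sorted(tenths):
        stem, leaf = divmod(value, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows.setdefault((stem, half), []).append(str(leaf))
    return [f"{stem} : {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]

# GPA values read off the stem-and-leaf plot above, expressed in tenths.
gpa_tenths = (
    [20] * 4 + [21] + [24] * 3
    + [25] * 2 + [26] * 2 + [27] * 3 + [28] * 3 + [29] * 2
    + [30] + [31] * 3 + [32] * 3 + [33] + [34] * 5
    + [36] + [38] * 3
    + [40] * 2
)

print("\n".join(stem_and_leaf(gpa_tenths)))
```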

Pros and Cons of a Stem-and-Leaf Plot
• The leaves are ordered, so the plot is difficult to sketch for large data sets.
• You see a value for every single observation.
• You decide how to distribute the digits and possibly split the stems.
• Side-by-side stem-and-leaf plots are a nice way to compare two groups.


Graphs for Quantitative Data
• Histogram: Uses bars to portray the frequencies or relative frequencies of the data.
  ◦ Divide the range of values into equally sized sections.
  ◦ For each observation, place a tally mark in the corresponding section.
  ◦ Use the tally marks to set the heights of the bars in your histogram.
    • Use either counts or proportions.
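The tallying steps above can be sketched as follows. The data is hypothetical, the names `histogram_counts`, `bin_width`, and `start` are illustrative, and each interval is assumed to include its left endpoint but not its right one.

```python
def histogram_counts(values, bin_width, start=None):
    """Tally observations into equal-width intervals [left, left + bin_width).

    Returns a list of (left, right, count) tuples, one per interval.
    """
    if start is None:
        start = min(values)
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + i * bin_width, start + (i + 1) * bin_width, c)
            for i, c in enumerate(counts)]

# Hypothetical weekly extracurricular hours, made up for illustration only.
hours = [0, 2, 3, 5, 5, 6, 8, 10, 12, 19]

for left, right, count in histogram_counts(hours, bin_width=5, start=0):
    print(f"[{left:>2}, {right:>2}) {'#' * count}")
```

Dividing each count by `len(hours)` would give proportions instead of counts. Calling the function again with `bin_width=10` collapses the same data into two bars, which shows how much the picture depends on the interval width.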

[Figure: Extracurricular Hours]

Pros and Cons of a Histogram
• Changing the width of the bars can give you dramatically different graphs.
• Obscures individual data values.
• Helps to clarify the shape of the data.


Which Graph Should You Choose to Graph Quantitative Data?
• Dot plot and stem-and-leaf plot:
  ◦ More useful for small data sets
  ◦ Data values are retained
• Histogram:
  ◦ More useful for large data sets
  ◦ Most compact display
  ◦ More flexibility in defining intervals


Recap of Types of Graphs
• Categorical
  ◦ Pie Chart
  ◦ Bar Graph
• Quantitative
  ◦ Dot Plot
  ◦ Stem-and-Leaf Plot
  ◦ Histogram


Shape of a Distribution of Quantitative Variables
• Overall Pattern of the Graph:
  ◦ Clusters?
  ◦ Outliers?
  ◦ Symmetric?
  ◦ Skewed?
  ◦ Unimodal?
  ◦ Bimodal?

Symmetric or Skewed?

[Figure: Study Hours]

[Figure: GPA]