Probability & Statistics Chapter 02 Summarizing and Graphing Data.
-
Upload
rudolf-owen -
Category
Documents
-
view
217 -
download
0
Transcript of Probability & Statistics Chapter 02 Summarizing and Graphing Data.
Probability & Statistics
Chapter 02Summarizing and Graphing Data
Describing Data- 1. Frequency distribution
• Frequency distribution: A grouping of data into categories showing the number of observations in each mutually exclusive category.
The steps for organizing data into a frequency distribution
1. Set up grouping called classes. Decide on the size of the class interval.
2. Tally the raw data or ungrouped data into classes.
3. Count the number of tallies in each class. The number of observation in the is called the class frequency.
• Class: A class is one of the categories into which data can be classified.
• Class interval for a frequency distribution is obtained by subtracting the lower limit of a class from the lower limit of the next class or subtracting the higher limit of a class from the higher limit of the next class
• Class limits: represent the smallest and largest data values that can be included in a class
• The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution
• Class mark (midpoint): A point that divides a class into two equal parts. This is the average between the upper and lower class limits.
EXAMPLE 1
• Prof. Munasinghe is the head of the Department of Industrial Management (IM) and wishes to determine the amount of studying IM students do. He selects a random sample of 30 students and determines the number of hours each student studies per week: 15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6,
• Organize the data into a frequency distribution.
Guidelines for Constructing a Frequency Distribution
• To determine the number of classes “2 to the k rule” can be usedselect the smallest integer (k) such that
2k n
n is the number of observations• There should be between 5 and 20 classes.
Guidelines for Constructing a Frequency Distribution
classes of number
value lowest - value highest i
• Determine the class interval based on the number of suggested classes by using the formula:
• The class intervals used in the frequency distribution should be equal.
• Use the computed suggested class interval to construct the frequency distribution. Note: this is a suggested class interval; if the computed class interval is 97, it may be better to use 100.
Guidelines for Constructing a Frequency Distribution
• Avoid open ended classes• Do not have overlapping classes• Count the number of values in each class.
EXAMPLE 1
• 15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6
• Number of possible classes: 2k nneed to find a value for k where 2k 30 then k = 5number of possible classes = 5
• Class interval:i = (highest value – lowest value) / number of classesi = (33.8 – 10.3) / 5 = 4.7better to use i = 5
EXAMPLE 1
Classes Tally Frequency
10.3 – 15.2 |||| ||| 8
15.3 – 20.2 |||| |||| | 11
20.3 – 25.2 |||| || 7
25.3 – 30.2 ||| 3
30.3 – 35.2 | 1
Relative Frequency Distribution
• Shows the percentage of the observation in each class
• The relative frequency of a class is obtained by dividing the class frequency by the total frequency.
• This allows us to make statements regarding the number of observations in a single class relative to the entire sample
Relative Frequency Distribution for Exercise 1
Classes Frequency Relative Frequency
10.3 – 15.2 8 8/30 = 26.7%
15.3 – 20.2 11 11/30 = 36.7%
20.3 – 25.2 7 7/30 = 23.3%
25.3 – 30.2 3 3/30 = 10%
30.3 – 35.2 1 1/30 = 3.3%
Cumulative Relative Frequency Distribution
• The cumulative relative frequency distribution express the cumulative frequency of each class relative to the entire sample
Cumulative Relative Frequency Distribution for Exercise 1
Classes Frequency Cumulative Frequency
Cumulative Relative Frequency
10.3 – 15.2 8 8 8/30 = 26.7%
15.3 – 20.2 11 19 19/30 = 63.3%
20.3 – 25.2 7 26 26/30 = 86.7%
25.3 – 30.2 3 29 29/30 = 96.7%
30.3 – 35.2 1 30 30/30 = 100%
Contingency Table
Complaint Origin Reason for complaint
Electrical Mechanical Appearance
During guarantee period
18% 13% 32%
After guarantee period
12% 22% 3%
• Contingency table indicates the number of observations for both variables that fall jointly in each category.
• Example: Distribution of product complaint
Graphic Presentation
• There are three commonly used graphic forms of a frequency distribution – histograms, – frequency polygons, – cumulative frequency distribution (ogive).
Histogram
• A graph in which – the classes are marked on the horizontal axis – the class frequencies on the vertical axis.– the class frequencies are represented by the
heights of the bars– the bars are drawn adjacent to each other.
Histogram for Exercise 1
Class Midpoint
Frequency
12.75 8
17.75 11
22.75 7
27.75 3
32.75 1
Histrogram for study hours of IM student
0
4
8
12
12.75 17.75 22.75 27.75 32.75
Number of hours
Fre
qu
en
cy
Frequency Polygon
• A frequency polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.
Frequency Polygon for Exercise 1
Class Midpoint
Frequency
7.75 0
12.75 8
17.75 11
22.75 7
27.75 3
32.75 1
37.75 0
Frequency Polygon for study hours of IM students
0
2
4
6
8
10
12
7.75 12.75 17.75 22.75 27.75 32.75 37.75
Number of hours
Fre
qu
ency
Cumulative Frequency distribution
• A cumulative frequency distribution (ogive) is used to determine how many or what proportion of the data values are below or above a certain value.
Ogive for Exercise 1
Class Midpoint
FrequencyLess than cumulative frequency
More than cumulative frequency
7.75 0 0 30
12.75 8 8 30
17.75 11 19 22
22.75 7 26 11
27.75 3 29 4
32.75 1 30 1
37.75 0 30 0
Ogive for Exercise 1
Ogive for study hours if IM students
05
1015
20253035
7.75 12.75 17.75 22.75 27.75 32.75 37.75
Number of hours
Fre
qu
ency
Less than cumulative frequency More than cumulative frequency
Bar Chart
CityNumber of
unemployed (in ‘000)
Gampaha 730
Matale 540
Chilaw 670
Colombo 890
Galle 820
Kandy 890
• A bar chart can be used to depict any of the levels of measurement (nominal, ordinal, interval, or ratio).
2-17
EXAMPLE : Construct a bar chart for the number of unemployed people per 100,000 population for selected cities of 1995.
EXAMPLE continued
Unemployment data
0
200
400
600
800
1000
Gampaha Matale Chilaw Colombo Galle Kandy
City
nu
mb
er o
f p
erso
ns
2-18
a bar chart for the number of unemployed people per 100,000 population for selected cities of 1995
Pie Chart
Type of shoe
# of runners
Nike 92
Adidas 49
Reebok 37
Asics 13
• A pie chart is especially useful in displaying a relative frequency distribution. A circle is divided proportionally to the relative frequency and portions of the circle are allocated for the different groups.
• Example: Draw a pie chart based on the following information.
EXAMPLE continued
Pie chart for running shoes
Nike45%
Asics7%
Other5%
Reebok19%
Adidas24%
Type of shoe
# of runners
Nike 92
Adidas 49
Reebok 37
Asics 13
Line Chart
• A graph of data that is mapped by a series of lines. Line charts show changes in data or categories of data over time and can be used to document trends
CityNumber of
unemployed (in ‘000)
Gampaha 730
Matale 540
Chilaw 670
Colombo 890
Galle 820
Kandy 890
EXAMPLE : Construct a bar chart for the number of unemployed people per 100,000 population for selected cities of 1995.
EXAMPLE continued
Unemployment data
0
200
400
600
800
1000
Gampa
ha
Mata
le
Chilaw
Colom
boGall
e
Kandy
City
Nu
mb
er
of
pe
rso
n
Stem-and-Leaf Displays
• Stem-and-Leaf Display: A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits become the leaf.
Note: An advantage of the stem-and-leaf display over a frequency distribution is we do not lose the identity of each observation.
EXAMPLE
stem leaf
6 9
7 8 9
8 2 3 4 5 6 8
9 1 2 6
• Saman achieved the following scores on his twelve accounting quizzes this semester: 86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85. Construct a stem-and-leaf chart for the data.