STATISTIKA CHATPER 2 (Summarizing and Graphing Data Summarizing and Graphing Data) SULIDAR FITRI,...

Post on 26-Dec-2015

221 views 0 download

Tags:

Transcript of STATISTIKA CHATPER 2 (Summarizing and Graphing Data Summarizing and Graphing Data) SULIDAR FITRI,...

STATISTIKA CHATPER 2 (Summarizing and Graphing

Data Summarizing and Graphing Data)

SULIDAR FITRI, M.Sc

FEBRUARY 20,2013

2-2 Frequency Distributions

2-3 Histograms

2-4 Statistical Graphics

Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

Introduction To Statistics

Math 13

Essentials of Statistics 3rd edition

by Mario F. Triola

1. Center: An average value that indicates where the middle of the data set is located.

2. Variation: A measure of spread - the amount by which the values vary among themselves.

3. Distribution: The nature (shape) of the distribution of data: bell-shaped, uniform, or skewed.

4. Outliers: Data values that lie very far away from the vast majority of other values.

5. Time: Changing characteristics of the data values over time.

0

10

20

30

40

50

60

70

80

90

1st Qtr 2nd Qtr 3rd Qtr 4th Qtr

East

West

North

OverviewImportant Characteristics of Data

Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

Section 2-2 Frequency Distributions

Definition Frequency Distribution Table

lists data values (individually or by groups) in one column, and their

corresponding frequencies (counts) in the second column.

Frequencies are denoted: F (for a population) or f (for a sample).

The total sum of the frequencies in all classes must add up to the population size (or the sample size):

NF nf

Frequency Distribution: Ages of Best Actresses

Frequency DistributionOriginal Data

Frequency Distributions

Definitions

Lower Class Limit is the smallest number that can actually belong to a class.

Lower ClassLimits

Upper Class Limit is the largest number that can actually belong to a class.

Upper ClassLimits

Class Widthis the difference between two consecutive lower class limits (or two consecutive upper class limits).

Editor: Substitute Table 2-2

Class Width

10

10

10

10

10

10

are the numbers used to separate classes:

class boundary falls in the middle of the gap created by class limits.

Class Boundaries

Editor: Substitute Table 2-2

ClassBoundaries

20.5

30.5

40.5

50.5

60.5

70.5

80.5

Class Midpointsare the numbers that fall in the middle of each class:

class midpoint can be found as (lower class limit + upper class limit) 2

ClassMidpoints

25.5

35.5

45.5

55.5

65.5

75.5

1. Large data sets can be summarized.

2. We can gain some insight into the nature of data.

3. We have a basis for constructing important graphs.

Reasons for Constructing Frequency Distributions

3. Choose the starting point, which will be the lower limit of the first class.

4. List the lower class limits by adding the calculated class width to the lower limit of the first class.

5. List the upper class limits, which should be one less then the next lower class limit.

6. Enter a count (frequency) of data values in each class.

Constructing A Frequency Distribution1. Decide on the number of classes (best: between 5 and 20).

2. Calculate the class width (round up):

class width (maximum data value) – (minimum data value)

number of classes

Ex. 1Given the ages of the male Oscar award recipients, construct a frequency

distribution table with 6 classes:

Grouping with Equal Class widths

(1) Smallest data value = 29

86

2976

29

Largest data value = 76

(2) Class width =

(3) Let’s choose the starting point to be 29

(4) then the lower class limits are:

37829 45837 53845 61853 69861 77869

Age of Actor

29 – 36

37 – 44

45 – 52

53 – 60

61 – 68

69 – 76

(5) then the upper class limits are:76,68,60,52,44,36

(6) Now, tally the values in eachclass and fill in the 2nd column:

Age of Actor

29 – 36

37 – 44

45 – 52

53 – 60

61 – 68

69 – 76

Solution:

Frequency Distribution: Ages of Best Actors

Age of Actor Frequency, f

29 – 36 15

37 – 44 32

45 – 52 17

53 – 60 8

61 – 68 3

69 – 76 1

n = 76 (total)

Definition Relative Frequency Table

lists the same classes as the Frequency Distribution table in one column, and their corresponding relative frequencies (percentages) in the second column.

Relative Frequency is denoted sometimes p-hat, sometimes p:

and is equal to the percent of the subjects in a class out of the whole sample:

%100 p%100ˆ p

n

fp

Frequency and Relative Frequencydistribution tables:

76 nf

76

28%37

76

30%39

76

12%16

Ex. 2Given the Frequency Distribution table constructed in Example 1, create the

corresponding Relative Frequency Distribution table:

Relative Frequency Distribution:

(1) Calculate the relative frequency in

each class by finding the ratio f/n :

%7.1976

15

Frequency Distribution: Ages of Best Actors

Age of Actor Frequency

29 – 36 15

37 – 44 32

45 – 52 17

53 – 60 8

61 – 68 3

69 – 76 1

n = 76

%1.4276

32 %4.22

76

17

Relative Frequency: Ages of Best Actors

Age of Actor Relative Frequency

29 – 36 19.7%

37 – 44 42.1%

45 – 52 22.4%

53 – 60 10.5%

61 – 68 3.9%

69 – 76 1.3%

99.9% (total)

(2) Write the relative frequency values in the second column:

Definition

Cumulative Frequency Table replaces both columns of the Frequency

Distribution table with cumulative groups and cumulative frequencies.

The cumulative groups of values are found as all the values that are less than each of the lower class limits of the original groups;

Cumulative frequencies in each such cumulative group are found by adding the frequency in the original group to the cumulative frequency in the previous group.

Cumulative Frequency Distribution

CumulativeFrequencies

Frequency Tables

Critical Thinking Interpreting Frequency Distributions

In later chapters, there will be frequent references to data with normal distribution. One key characteristic of a normal distribution is that it has a “bell” shape:– The frequencies start low, then increase to some

maximum frequency, then decrease to a low frequency.

– The frequencies are distributed approximately symmetric, nearly evenly distributed on both sides of the maximum frequency.

Ex. 3Given their corresponding Relative Frequency Distribution tables, make a

comparison of the ages of Oscar-winning actresses and actors:

Comparing Two Samples

(1) actresses tend to be somewhat younger than actors. 37% of the actresses are in the youngest age group while only 4% of the actors fall into that age.

Ages of Best Actresses and Actors

Age Relative Frequency for Actresses

Relative Frequency for Actors

21 – 30 37% 4%

31 – 40 39% 33%

41 – 50 16% 39%

51 – 60 3% 18%

61 – 70 3% 4%

71 – 80 3% 1%

101% 99%

(2) The highest relative frequency for the actresses (39%) corresponds to the age group from 31 to 40; the highest relative frequency for actors (39%) corresponds to the age group from 41 to 50 years old, ten years older that the most frequent age group for actresses.

(3) Neither distribution appears to be normal – data values are not distributed symmetrically on each side of the maximum frequencies.

Recap

In this Section we have discussed

Important characteristics of data: C V D O T

Frequency distributions;

Procedures for constructing frequency distributions;

Relative frequency distributions;

Cumulative frequency distributions;

Comparing samples using their frequency distributions.

CARA MENENTUKAN BANYAK KELASRUMUS STURGES

K = 1 + 3,3 LOG n K: Banyaknya Kelas

N: Jumlah data yang kita miliki

Kasus: Sampel yang berupa penjualan produk suatu

perusahaan terhadap 80 pelanggan. Maka, bagaimana menentukan jumlah kelas yang sesuai???

Answer:

Jumlah data yang dimiliki (n)= 80

Maka:

K = 1 + 3,3 log n = 1 + 3,3 log 80 = 1 + 3,3 (1,9031) = 7,280 … dibulatkan menjadi 7

kelas

Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

Section 2-3 Histograms

Key ConceptA histogram is an important type of graph that portrays the nature of the distribution:

Relative Frequency Histogram Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies

A histogram makes visible the nature of the distribution, as well as where its center is and whether there are any outliers:

The shape of the distribution of the ages of Best Actresses is skewed, heavier on the left indicating that actresses who win Oscars tend to be disproportionally younger.

Key Concept

One key characteristic of a normal distribution is that it is “bell-shaped”:

Normal Distribution:

Recap

In this Section we have discussed

Histograms

Relative Frequency Histograms

Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

Section 2-4 Statistical Graphics

Key Concept

This section presents other graphs beyond histograms commonly used in statistical analysis.

The main objective is to understand a data set by using a suitable graph that is effective in revealing some important characteristic.

Frequency Polygon

Uses line segments connected to points directly above class midpoint values

Ogive

A line graph that depicts cumulative frequencies

Insert figure 2-6 from page 58

Dot Plot

Consists of a graph in which each data value is plotted as a point (or dot) along a scale of values

Stemplot (or Stem-and-Leaf Plot)

Represents data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit)

Pareto ChartA bar graph for qualitative data, with the bars arranged in order according to frequencies:

complaints against phone carriers:

Pie Chart

A graph depicting qualitative data as slices of a pie

Ex. 1

What percent of total complaints corresponds to complaints due to Access Charges?

Pie Chart analysis

(1) Percent is the ratio of the Access Charges complaints out of the total number of complaints, so we need to find the total number of all complaints first: 21086

(2) Now the proportion of the Access Charges complaints is:

%9.2029.021086

614p

(3) Complaints due to Access Charges make up 2.9% of all complaints against phone carriers.

Scatter Plot (or Scatter Diagram)

A plot of paired (x,y) data with a horizontal x-axis and a vertical y-axis: Number of cricket chirps per minute related to the temperature:

Time-Series Graph

Data that have been collected at different points in time

Other Graphs

Recap

In this section we have discussed graphs that are pictures of distributions.

Keep in mind that a graph is a tool for describing, exploring and comparing data.

1. A sample value that lies very far away from the majority of the other sample values is

A. The center.

B. A distribution.

C. An outlier.

D. A variance.

2. A table that lists data values along with their counts is

A. An ogive.

B. A frequency distribution.

C. A cumulative table.

D. A histogram.

3. The smallest numbers that can actually belong to different classes are

A. Upper class limits.

B. Class boundaries.

C. Midpoints.

D. Lower class limits.

4. A bar graph where the horizontal scale represents the classes of data values and the vertical scale represents the frequencies is called

A. A frequency distribution.

B. A histogram.

C. A dot plot.

D. A pie chart.

5. The pie chart below shows the percent of the total population of 12,200 of Springfield inhabitants living in the given types of housing. Find the number of people who live in single family housing (to nearest whole number.)

A. 4758 people

B. 39 people .

C. 5368 people

D. 7442 people

Single family 39%

Duplex 2%Townhouse 6%Condo 18%

Apartments 35%

6. Berdasarkan table distribusi frekuensi yang telah kalian buat sebelumnya (sertakan table tersebut dalam lembar jawaban apa adanya !)a. Tentukan jumlah kelas yang sesuai dengan

menggunakan rumus Sturges!b. Buatlah graphic berdasarkan table distribusi

frekuensi anda:Polygon untuk frekuensi !Histogram untuk frekuensi relatif !Ogive untuk frekuensi kumulatif !

ANSWER

A sample value that lies very far away from the majority of the other sample values is

A. The center.

B. A distribution.

C. An outlier.

D. A variance.

A sample value that lies very far away from the majority of the other sample values is

A. The center.

B. A distribution.

C. An outlier.

D. A variance.

A table that lists data values along with their counts is

A. An ogive.

B. A frequency distribution.

C. A cumulative table.

D. A histogram.

A table that lists data values along with their counts is

A. An ogive.

B. A frequency distribution.

C. A cumulative table.

D. A histogram.

The smallest numbers that can actually belong to different classes are

A. Upper class limits.

B. Class boundaries.

C. Midpoints.

D. Lower class limits.

The smallest numbers that can actually belong to different classes are

A. Upper class limits.

B. Class boundaries.

C. Midpoints.

D. Lower class limits.

A bar graph where the horizontal scale represents the classes of data values and the vertical scale represents the frequencies is called

A. A frequency distribution.

B. A histogram.

C. A dot plot.

D. A pie chart.

A bar graph where the horizontal scale represents the classes of data values and the vertical scale represents the frequencies is called

A. A frequency distribution.

B. A histogram.

C. A dot plot.

D. A pie chart.

The pie chart below shows the percent of the total population of 12,200 of Springfield inhabitants living in the given types of housing. Find the number of people who live in single family housing (to nearest whole number.)

A. 4758 people

B. 39 people .

C. 5368 people

D. 7442 people

Single family 39%

Duplex 2%Townhouse 6%

Condo 18%

Apartments 35%

The pie chart below shows the percent of the total population of 12,200 of Springfield living in the given types of housing. Find the number of people who live in single family housing (round to nearest whole number.)

A. 4758 people

B. 39 people .

C. 5368 people

D. 7442 people

Single family 39%

Duplex 2%Townhouse 6%

Condo 18%

Apartments 35%

Any Queries ?