QT1 - 02 - Frequency Distribution


Tables and Graphs

Frequency Distributions


Contents

Basics of Data

Samples and Populations

Data Array

Frequency Distributions

Relative Frequency Distributions

Classes

Qualitative versus Quantitative

Discrete versus Continuous

Illustrating Data

Histograms

Polygons

Data Basics

Data are collections of any number of related observations

Number of telephones installed by all workers in one day

Number of telephones installed by one worker in one day

Number of tourists in Finland on every Diwali day ??

Data are useful when they

Reveal some kind of pattern

Temperatures in December are lower than in June

Lead to some logical conclusion

Senior citizens avoid investing in equity markets

Data Collection & Sanity Check

Source of Data

Actual observation in the field

Physical records available with source organisation

Third party data sources

Commercial data sellers

Free data sources available in the web

Basic Sanity Check

Is the source trustworthy?

Is something missing from the data?

Do we have enough observations?

Is the conclusion logical? (Garbage In, Garbage Out)

Is there double counting?
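Some of these checks can be partly automated. The sketch below is a minimal illustration, assuming each record carries an ID and that `None` marks a missing value; the records, the function name `sanity_report` and the threshold of 30 observations are all hypothetical choices, not rules from this material. Whether the source is trustworthy or the conclusion logical still needs human judgement.

```python
from collections import Counter

# Hypothetical records: (record_id, worker, phones_installed); None marks a missing value.
records = [
    (1, "A", 5),
    (2, "B", None),
    (3, "C", 7),
    (3, "C", 7),  # the same record entered twice (double counting)
]

def sanity_report(rows, min_observations=30):
    """Flag missing values, repeated record IDs and too-small data sets.

    min_observations is an arbitrary, illustrative threshold.
    """
    missing = [r[0] for r in rows if any(field is None for field in r)]
    id_counts = Counter(r[0] for r in rows)
    duplicated = sorted(rid for rid, n in id_counts.items() if n > 1)
    return {
        "missing": missing,
        "duplicated_ids": duplicated,
        "enough_observations": len(rows) >= min_observations,
    }

print(sanity_report(records))
```

Here record 2 is flagged as incomplete, record 3 as double counted, and four rows fall short of the assumed minimum.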

Samples and Populations

Population: the collection of all elements about which we are trying to draw conclusions

Women in Calcutta with age > 18

Sample: a collection of some, but not all, elements of the population, about which we are in a position to gather data

Statisticians gather data from a sample and then use this data to draw inferences about the population

Representative Sample: a sample that reflects the characteristics of the underlying population

A sample of women drawn from the Calcutta Club may not be representative of all women in Calcutta!
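One common way to aim for a representative sample is a simple random sample, in which every element of the population is equally likely to be chosen. The sketch below assumes the population can be listed as hypothetical numeric IDs and uses Python's `random.Random.sample`; the contrast with simply taking the first 100 IDs mirrors the Calcutta Club example of a convenient but unrepresentative subset.

```python
import random

# Hypothetical population: IDs standing in for every element we want to study.
population = list(range(1, 10_001))

def simple_random_sample(pop, n, seed=None):
    """Draw n distinct elements, each element equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(pop, n)

sample = simple_random_sample(population, 100, seed=42)

# A convenience sample: easy to collect, but only covers one corner of the population.
convenience = population[:100]

print(len(sample), min(sample), max(sample))
print(len(convenience), min(convenience), max(convenience))
```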

Organising Data

Organising data enables us to quickly spot some of the characteristics of the data

Range: What are the highest and lowest values?

Clustering: Are the values grouped around a specific value?

Popularity: Which value occurs most frequently?

Ways of organising data

Simple ascending or descending order

Group by certain characteristic

Age? Income? Education level?

Colour? Material?
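A small sketch of these ideas, assuming hypothetical exam marks tagged with a class section as the grouping characteristic: it sorts the values, reads off the range, finds the most frequent value with `statistics.multimode` (which returns every value tied for the top frequency), and groups the marks by section.

```python
from collections import defaultdict
from statistics import multimode

# Hypothetical exam marks, each tagged with a grouping characteristic (section).
marks = [("A", 62), ("B", 75), ("A", 62), ("B", 48), ("A", 81), ("B", 62)]

values = [m for _, m in marks]
ascending = sorted(values)                 # simple ascending order
lowest, highest = ascending[0], ascending[-1]
most_common = multimode(values)            # all values tied for highest frequency

# Group by a characteristic: here, the section each mark belongs to.
by_section = defaultdict(list)
for section, mark in marks:
    by_section[section].append(mark)

print("ascending:", ascending)
print("range:", lowest, "to", highest)
print("most frequent:", most_common)
print("grouped:", dict(by_section))
```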

Examples of Raw Data

Retail Sales Figures

Forbes 500 Company Data

US Cereals Data

Stock Market Price Data

Examination Marks

Engine Pollution Data

Data Array

The Data Array arranges values in ascending or descending order

Why Create a Data Array?

We can quickly get the highest and lowest value

In the Hydrocarbon data: 0.34 to 1.1

We can divide the data into sections

First third: between 0.34 and 0.46

Second third: between 0.47 and 0.56

Last third: between 0.56 and 1.1

We can see whether any value appears multiple times

We can observe the differences between successive values of the data
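The uses listed above can be sketched in a few lines. The readings below are made-up values for illustration only (they are not the Hydrocarbon figures from the slide), and the helper names `data_array` and `split_into_parts` are hypothetical.

```python
# Hypothetical hydrocarbon-style readings (illustrative, NOT the slide's data set).
readings = [0.52, 0.40, 0.71, 0.45, 0.47, 0.58, 0.90, 0.45, 0.60]

def data_array(values, descending=False):
    """Return the values arranged in ascending (default) or descending order."""
    return sorted(values, reverse=descending)

def split_into_parts(arr, parts=3):
    """Split an ordered array into `parts` consecutive, nearly equal sections."""
    size, extra = divmod(len(arr), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # earlier sections absorb the remainder
        chunks.append(arr[start:end])
        start = end
    return chunks

arr = data_array(readings)
print("lowest:", arr[0], "highest:", arr[-1])
for i, chunk in enumerate(split_into_parts(arr), start=1):
    print(f"section {i}: {chunk[0]} to {chunk[-1]}")

repeats = sorted({v for v in arr if arr.count(v) > 1})   # values appearing more than once
gaps = [round(b - a, 2) for a, b in zip(arr, arr[1:])]   # differences between successive values
print("repeated values:", repeats)
print("successive differences:", gaps)
```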

Limitations of Data Array

Cumbersome to use when the volume of data is very large

Its usefulness drops because the human mind cannot take in that much data at once

There is a need to compress this data and make it more accessible

Frequency Distribution

A frequency distribution is a table that organises data into classes

A class is a group of values describing ONE characteristic of the data

It shows the number of observations from the data that fall into each class

A frequency distribution is constructed by counting how often ('with what frequency') values fall inside each class of a data set

Fewer classes mean more data compression, at the cost of detail
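A minimal sketch of building a frequency distribution, assuming hypothetical exam marks and classes that include their lower boundary and exclude their upper one (for example, 40 to under 50). The function name `frequency_distribution` is illustrative; it uses `bisect.bisect_right` to find each value's class. Running it with two different class widths shows the compression trade-off: two wide classes summarise the data more tightly but hide the shape visible with seven narrow ones.

```python
from bisect import bisect_right

# Hypothetical marks out of 100 (illustrative values only).
marks = [35, 42, 47, 51, 55, 58, 61, 64, 64, 68, 72, 77, 81, 88, 93]

def frequency_distribution(values, edges):
    """Count how many values fall into each class [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the class whose lower boundary <= v
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} lies outside the classes")
        counts[i] += 1
    return counts

narrow = [30, 40, 50, 60, 70, 80, 90, 100]   # seven classes, width 10
wide = [0, 50, 100]                          # two classes, width 50

for edges in (narrow, wide):
    counts = frequency_distribution(marks, edges)
    for lower, upper, c in zip(edges, edges[1:], counts):
        print(f"{lower} to under {upper}: {c}")
    print()
```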


Relative Frequency Distribution

The frequency of each value can be expressed as a fraction or percentage of the total number of observations

This helps us compare data from samples of different sizes
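The conversion is a single division per class. In the sketch below, the two sets of class counts are hypothetical; raw counts of 5 and 110 look very different, but as shares of 10 and 200 observations they can be compared directly.

```python
def relative_frequencies(counts):
    """Express each class count as a fraction of the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical class counts from two samples of different sizes.
small_sample = [2, 5, 3]        # 10 observations
large_sample = [40, 110, 50]    # 200 observations

for name, counts in [("small", small_sample), ("large", large_sample)]:
    shares = relative_frequencies(counts)
    print(name, [f"{s:.0%}" for s in shares])   # formatted as percentages
```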

Discrete & Continuous Classes

DISCRETE: each class holds exactly ONE discrete value, e.g.

0, 1, 2, ...

CONTINUOUS: each class holds any value within a range

[Figure: continuous classes on a number line (0, 1, 2, ...), labelled "Lower Class Boundary" and "Less Than OR Equal to"]
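The difference between the two kinds of class can be sketched as follows. For discrete data, each distinct value is its own class, so a `collections.Counter` is enough. For continuous data, the sketch assumes classes of width 1 in which a value belongs to a class when lower boundary <= value < upper boundary; that boundary rule is an assumption, since the original figure did not survive. The data and the helper `continuous_class` are hypothetical.

```python
import math
from collections import Counter

# Discrete classes: each class is one exact value (e.g. phones installed per day).
phones_per_day = [0, 2, 1, 2, 3, 2, 0, 1]   # hypothetical data
discrete = Counter(phones_per_day)
print(sorted(discrete.items()))             # [(0, 2), (1, 2), (2, 3), (3, 1)]

# Continuous classes: each class covers a range of values.
# Assumption: lower class boundary <= x < upper class boundary.
def continuous_class(x, width=1.0):
    """Return the (lower, upper) boundaries of the class containing x."""
    lower = math.floor(x / width) * width
    return (lower, lower + width)

durations = [0.4, 1.0, 1.7, 2.2, 0.99]      # hypothetical measurements, e.g. hours
continuous = Counter(continuous_class(x) for x in durations)
print(sorted(continuous.items()))
```

Note that 1.0 lands in the class from 1 to 2, not 0 to 1, because under the assumed rule each value sits with its lower class boundary.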