Sta301 lec01

Virtual University of Pakistan Lecture No. 1 Statistics and Probability Miss Saleha Naghmi Habibullah


Transcript of Sta301 lec01

Page 1: Sta301 lec01

Virtual University of Pakistan

Lecture No. 1 Statistics and Probability

Miss Saleha Naghmi Habibullah

Page 2: Sta301 lec01

Objective

• To inculcate in you an attitude of Statistical and Probabilistic thinking.

• To give you some very basic techniques in order to apply Statistical analysis to real-world situations/problems.

Page 3: Sta301 lec01

• That science which enables us to draw conclusions about various phenomena on the basis of real data collected on a sample basis.
• A tool for data-based research.
• Also known as Quantitative Analysis.
• Any scientific enquiry in which you would like to base your conclusions and decisions on real-life data requires statistical techniques.
• Nowadays, in the developed countries of the world, there is an active movement for Statistical Literacy.

WHAT IS STATISTICS?

Page 4: Sta301 lec01

Application Areas

A lot of applications in a wide variety of disciplines … Agriculture, Anthropology, Astronomy, Biology, Economics, Engineering, Environment, Geology, Genetics, Medicine, Physics, Psychology, Sociology, Zoology …. Virtually every single subject from Anthropology to Zoology …. A to Z!

Page 5: Sta301 lec01

DESCRIPTIVE STATISTICS

STATISTICS

INFERENTIAL STATISTICS

THE NATURE OF DISCIPLINE

Page 6: Sta301 lec01

The primary text-book for the course is Introduction to Statistical Theory (Sixth Edition) by Sher Muhammad Chaudhry and Shahid Kamal, published by Ilmi Kitab Khana, Lahore.

Reference books for the course are:
1. “ “ by Afzal Beg & Miraj Din Mirza.
2. “ “ by Mohammad Rauf Chaudhry (Polymer Publications, Urdu Bazar, Lahore).
3. “Statistics” by James T. McClave & Frank H. Dietrich, II (Dellen Publishing Company, California, U.S.A.).
4. “Introducing Statistics” by K.A. Yeomans (Penguin Books Ltd., England).
5. “Applied Statistics” by K.A. Yeomans (Penguin Books Ltd., England).
6. “Business Statistics for Management & Economics” by Wayne W. Daniel and James C. Terrell (Houghton Mifflin Company, U.S.A.).
7. “Basic Business Statistics” by Berenson & Levine ( )

Text and Reference Material

Page 7: Sta301 lec01

IN ACCORDANCE WITH THE ABOVE-MENTIONED STRUCTURE, THE ORGANIZATION OF THIS COURSE IS AS FOLLOWS:

WEEKS | LECTURES | AREA TO BE COVERED | HOME-WORK ASSIGNMENTS | EXAMS
1 TO 5 | 1 TO 15 | DESCRIPTIVE STATISTICS | 1 TO 5 | MID-TERM-I
6 TO 10 | 16 TO 30 | PROBABILITY | 6 TO 10 | MID-TERM-II
11 TO 15 | 31 TO 45 | INFERENTIAL STATISTICS | 11 TO 15 | FINAL EXAM

ORGANIZATION OF THIS COURSE

Page 8: Sta301 lec01

• Appreciate the nature of statistical data.
• Understand various methods of collecting statistical data.
• Appreciate the importance of a proper sampling procedure.
• Utilize various methods of summarizing and describing collected data.
• Employ statistical techniques to understand the nature of the relationship between two quantitative variables.

Upon completion of the first segment, you will be able to:

Page 9: Sta301 lec01

• Understand the basic concepts of probability theory (which is the foundation of statistical inference).
• Understand the concept of discrete probability distributions and their mathematical properties.
• Understand the concept of continuous probability distributions and their mathematical properties.
• Get acquainted with some of the most commonly encountered and important discrete and continuous probability distributions, such as the binomial and the normal distribution.

Upon completion of the second segment, you will be able to:

Page 10: Sta301 lec01

Understand and employ various techniques of estimation and hypothesis-testing in order to draw reliable conclusions necessary for decision-making in various fields of human activity.

Through this segment, you will be able to appreciate the purpose and the goal of the subject of Statistics.

Upon completion of the third segment, you will be able to:

Page 11: Sta301 lec01

There will be two term exams and one final exam. In addition, there will be 15 homework assignments. The final examination will be comprehensive in nature. (Approximately 25-30% of the final exam paper will be on the course covered up to the Mid-Term-II Exam.) These will contribute the following percentages to the final grade:

Mid-Term-I: 20%
Mid-Term-II: 20%
Final Exam: 30%
Homework Assignments: 30%

GRADING
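
As a rough illustration of how the percentages above combine, the following sketch computes a weighted final grade. The function name, the 0-100 score scale and the student scores are assumptions made for the example; only the four weights come from the slide.

```python
# Weights taken from the grading scheme on the slide (fractions of the final grade).
WEIGHTS = {
    "mid_term_1": 0.20,
    "mid_term_2": 0.20,
    "final_exam": 0.30,
    "homework": 0.30,
}


def final_grade(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into one weighted grade."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, invented purely for illustration.
example_scores = {
    "mid_term_1": 70,
    "mid_term_2": 80,
    "final_exam": 60,
    "homework": 90,
}

print(round(final_grade(example_scores), 2))
```

For these made-up scores the contributions are 14 + 16 + 18 + 27, i.e. a final grade of 75 out of 100.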

Page 12: Sta301 lec01

Meaning of Statistics

Statistics

Meanings

STATUS

Political State

Information useful for the State

Page 13: Sta301 lec01
Page 14: Sta301 lec01
Page 15: Sta301 lec01
Page 16: Sta301 lec01
Page 17: Sta301 lec01
Page 18: Sta301 lec01

The word “data” appears in many contexts and frequently is used in ordinary conversation. Although the word carries something of an aura of scientific mystique, its meaning is quite simple and mundane.

It is Latin for “those that are given” (the singular form is “datum”). Data may therefore be thought of as the results of observation.

The meaning of Data

Page 19: Sta301 lec01

Data are collected in many aspects of everyday life.
• Statements given to a police officer, physician or psychologist during an interview are data.
• So are the correct and incorrect answers given by a student on a final examination.
• Almost any athletic event produces data:
• The time required by a runner to complete a marathon,
• The number of errors committed by a baseball team in nine innings of play.

EXAMPLES OF DATA

Page 20: Sta301 lec01
Page 21: Sta301 lec01
Page 22: Sta301 lec01
Page 23: Sta301 lec01

EXAMPLES OF DATA

• And, of course, data are obtained in the course of scientific inquiry:

• The positions of artifacts and fossils in an archaeological site,

• The number of interactions between two members of an animal colony during a period of observation,

• The spectral composition of light emitted by a star.

Page 24: Sta301 lec01

Types of Data

Data

Quantitative(Numeric)

Qualitative(Non - Numeric)

Page 25: Sta301 lec01

Variable

A quantity that varies from individual to individual.

Variable

Quantitative(Numeric)

Qualitative(Non - Numeric)

Page 26: Sta301 lec01

In statistics, an observation often means any sort of numerical recording of information, whether it is a physical measurement such as height or weight, a classification such as heads or tails, or an answer to a question such as yes or no.

Variable:

A characteristic that varies from one individual or object to another is called a variable. For example, age is a variable, as it varies from person to person. A variable can assume a number of values. The set of all possible values from which the variable takes on a value is called its Domain. If, for a given problem, the domain of a variable contains only one value, then the variable is referred to as a constant.

OBSERVATIONS AND VARIABLES

Page 27: Sta301 lec01

Variables may be classified into quantitative and qualitative according to the form of the characteristic of interest.

A variable is called a quantitative variable when the characteristic can be expressed numerically, such as age, weight, income or number of children.

On the other hand, if the characteristic is non-numerical, such as education, sex, eye-colour, quality, intelligence, poverty, satisfaction, etc., the variable is referred to as a qualitative variable. A qualitative characteristic is also called an attribute.

An individual or an object with such a characteristic can be counted or enumerated after having been assigned to one of the several mutually exclusive classes or categories.

QUANTITATIVE & QUALITATIVE VARIABLES

Page 28: Sta301 lec01

Variable

Variable

Quantitative(Numeric)

Qualitative(Non - Numeric)

Continuous Discrete

Page 29: Sta301 lec01

Continuous Variable

Continuous Variable

Measurement: Height, Weight, etc.

Page 30: Sta301 lec01

Discrete Variable

Discrete Variable

Counting, e.g. No. of sisters

Gaps, Jumps

Page 31: Sta301 lec01

A quantitative variable may be classified as discrete or continuous. A discrete variable is one that can take only a discrete set of integers or whole numbers, that is, the values are taken by jumps or breaks. A discrete variable represents count data such as the number of persons in a family, the number of rooms in a house, the number of deaths in an accident, the income of an individual, etc.

A variable is called a continuous variable if it can take on any value, fractional or integral, within a given interval, i.e. its domain is an interval with all possible values without gaps. A continuous variable represents measurement data such as the age of a person, the height of a plant, the weight of a commodity, the temperature at a place, etc.

A variable, whether countable or measurable, is generally denoted by some symbol such as X or Y, and X_i or X_j represents the ith or jth value of the variable. The subscript i or j is replaced by a number such as 1, 2, 3, … when referring to a particular value.

DISCRETE AND CONTINUOUS VARIABLES:

Page 32: Sta301 lec01

Measurement Scales

Measurement Scales

Nominal Scale

Ordinal Scale

Interval Scale

Ratio Scale

Page 33: Sta301 lec01

By measurement, we usually mean the assigning of numbers to observations or objects, and scaling is a process of measuring. The four scales of measurement are briefly described below:

NOMINAL SCALE

The classification or grouping of the observations into mutually exclusive qualitative categories or classes is said to constitute a nominal scale. For example, students are classified as male and female. The numbers 1 and 2 may also be used to identify these two categories. Similarly, rainfall may be classified as heavy, moderate and light. We may use the numbers 1, 2 and 3 to denote the three classes of rainfall. The numbers, when they are used only to identify the categories of the given scale, carry no numerical significance and there is no particular order for the grouping.

MEASUREMENT SCALES

Page 34: Sta301 lec01

MEASUREMENT SCALES (Cont.)

ORDINAL OR RANKING SCALE

It includes the characteristics of a nominal scale and in addition has the property of ordering or ranking of measurements. For example, the performance of students (or players) is rated as excellent, good, fair or poor, etc. The numbers 1, 2, 3, 4, etc. are also used to indicate ranks. The only relation that holds between any pair of categories is that of “greater than” (or more preferred).

Page 35: Sta301 lec01

INTERVAL SCALE

A measurement scale possessing a constant interval size (distance) but not a true zero point is called an interval scale. Temperature measured on either the Celsius or the Fahrenheit scale is an outstanding example of an interval scale, because the same difference exists between 20°C (68°F) and 30°C (86°F) as between 5°C (41°F) and 15°C (59°F). It cannot be said that a temperature of 40 degrees is twice as hot as a temperature of 20 degrees, i.e. the ratio 40/20 has no meaning. The arithmetic operations of addition, subtraction, etc. are meaningful.

RATIO SCALE

It is a special kind of interval scale where the scale of measurement has a true zero point as its origin. The ratio scale is used to measure weight, volume, distance, money, etc. The key to differentiating the interval and ratio scales is that the zero point is meaningful for the ratio scale.

MEASUREMENT SCALES (Cont.)
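
The difference between the interval and ratio scales can be checked numerically. The sketch below is only an illustration: the function and variable names are invented for the example, and the weights of 20 kg and 40 kg are hypothetical values. It shows that temperature differences stay equal when the unit changes while temperature ratios do not, whereas a ratio of weights is the same in any unit.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion between the two temperature scales."""
    return celsius * 9 / 5 + 32


# Interval scale: equal differences remain equal after a change of unit.
diff_a = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)  # 86 - 68
diff_b = celsius_to_fahrenheit(15) - celsius_to_fahrenheit(5)   # 59 - 41

# Interval scale: the ratio depends on the unit, so "twice as hot" has no meaning.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)  # 104 / 68

# Ratio scale: weight has a true zero, so the ratio is the same in kg and in grams.
weight_kg_a, weight_kg_b = 20, 40  # hypothetical weights
ratio_kg = weight_kg_b / weight_kg_a
ratio_g = (weight_kg_b * 1000) / (weight_kg_a * 1000)

print(diff_a, diff_b)
print(ratio_celsius, round(ratio_fahrenheit, 2))
print(ratio_kg, ratio_g)
```

The two Fahrenheit differences are both 18 degrees, but the ratio of 2 in Celsius becomes roughly 1.53 in Fahrenheit, which is why ratios are not meaningful on an interval scale.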

Page 36: Sta301 lec01

Example

Chemical and manufacturing plants sometimes discharge toxic-waste materials such as DDT into nearby rivers and streams.

These toxins can adversely affect the plants and animals inhabiting the river and the river bank.

Page 37: Sta301 lec01

A study of fish was conducted in the Tennessee River in Alabama and its three tributary creeks: Flint Creek, Limestone Creek and Spring Creek.

A total of 144 fish were captured, and the following variables were measured for each one:

Page 38: Sta301 lec01

1. River/Creek from where fish was captured

2. Species of fish (channel fish, largemouth bass or smallmouth buffalo fish)

3. Length of fish (centimeters)

4. Weight of fish (grams)

5. DDT concentration in the bodily system of the fish (parts per million)

Page 39: Sta301 lec01

Classify each of the five variables measured

as quantitative or qualitative.

Also, identify the types of measurement

scales for each of the five variables.

Page 40: Sta301 lec01

Solution

The variables length, weight and DDT concentration are quantitative variables because each is measured on a numerical scale (length in centimeters, weight in grams and DDT in parts per million).

All three of these variables are being measured on the Ratio Scale.

Page 41: Sta301 lec01

Rationale

Whenever we speak about the weight of an

object, obviously, if our measuring instrument

reads ‘zero’, this means that the object being

measured has zero weight --- and, in this sense,

the ‘zero’ would be a true zero.

An exactly similar argument holds for the length of

an object.

Page 42: Sta301 lec01

As far as DDT concentration in the bodily

system of the fish is concerned, obviously, if

there is absolutely no DDT in the fish, then

the DDT concentration reads zero --- and,

this particular ‘zero’ reading will be true zero.

Page 43: Sta301 lec01

As explained above, the three variables length of fish, weight of fish and DDT concentration in the bodily system of the fish are quantitative variables measured on the ratio scale.

In contrast:

Page 44: Sta301 lec01

Data on River/Creek from which the fish

were captured, and the species of fish are

qualitative data.

Both of these variables are measured on

Nominal Scale.

Page 45: Sta301 lec01

Rationale

The river/creek from which the fish were captured and the species of fish are qualitative data because these cannot be measured quantitatively; they can only be classified into categories (i.e. Channel fish, Largemouth bass or smallmouth buffalo fish for the species, and Tennessee River, Flint Creek, Limestone Creek and Spring Creek for the river/creek).
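
The classification worked out in this example can be written down as a small lookup table. This is only a sketch: the dictionary keys and the helper function are hypothetical names chosen for the illustration, while the type and scale of each variable follow the solution given in the slides.

```python
# Each variable maps to (type of variable, measurement scale), as in the solution.
FISH_VARIABLES = {
    "river_or_creek": ("qualitative", "nominal"),
    "species": ("qualitative", "nominal"),
    "length_cm": ("quantitative", "ratio"),
    "weight_g": ("quantitative", "ratio"),
    "ddt_ppm": ("quantitative", "ratio"),
}


def variables_of_type(kind):
    """Return the names of all variables of the given type, in table order."""
    return [name for name, (var_type, _scale) in FISH_VARIABLES.items() if var_type == kind]


quantitative = variables_of_type("quantitative")
qualitative = variables_of_type("qualitative")

print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```

Keeping the type next to each variable makes it easy to decide later which variables can be averaged and which can only be counted by category.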

Page 46: Sta301 lec01

The Statistical methods for describing, reporting and analyzing data depend on the type of data measured (i.e. whether data are quantitative or qualitative).

Page 47: Sta301 lec01

Experience has shown that a continuous variable can never be measured with perfect fineness because of certain habits and practices, methods of measurement, instruments used, etc. The measurements are thus always recorded correct to the nearest unit and hence are of limited accuracy. The actual or true values are, however, assumed to exist. For example, if a student’s weight is recorded as 60 kg (correct to the nearest kilogram), his true weight in fact lies between 59.5 kg and 60.5 kg, whereas a weight recorded as 60.00 kg means the true weight is known to lie between 59.995 kg and 60.005 kg. Thus there is a difference, however small it may be, between the measured value and the true value. This sort of departure from the true value is technically known as the error of measurement.

In other words, if the observed value and the true value of a variable are denoted by x and x + ε respectively, then the difference (x + ε) – x, i.e. ε, is the error. This error involves the unit of measurement of x and is therefore called an absolute error. An absolute error divided by the true value is called the relative error. Thus the relative error is ε/(x + ε), which, when multiplied by 100, is the percentage error. The relative and percentage errors are independent of the units of measurement of x. It ought to be noted that an error has both magnitude and direction, and that the word error in statistics does not mean a mistake, which is a chance inaccuracy.

ERRORS OF MEASUREMENT
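
A short numerical sketch may help here. Writing the observed value as x and the true value as x + ε, the code below computes the absolute, relative and percentage errors. The function name is invented for the example, and the true weight of 60.4 kg is a hypothetical value (in practice the true value is unknown) chosen to lie inside the 59.5 kg to 60.5 kg range mentioned in the text.

```python
def measurement_errors(observed, true_value):
    """Return (absolute, relative, percentage) error for one measurement."""
    absolute = true_value - observed  # epsilon = (x + epsilon) - x, carries the unit of x
    relative = absolute / true_value  # unit-free
    percentage = relative * 100
    return absolute, relative, percentage


# Weight recorded as 60 kg; the true weight of 60.4 kg is assumed for illustration.
abs_kg, rel_kg, pct_kg = measurement_errors(observed=60.0, true_value=60.4)

# The same measurement expressed in grams.
abs_g, rel_g, pct_g = measurement_errors(observed=60000.0, true_value=60400.0)

print(round(abs_kg, 3), round(abs_g, 1))  # absolute error depends on the unit
print(round(pct_kg, 3), round(pct_g, 3))  # percentage error does not
```

The absolute error changes from about 0.4 (kg) to 400 (g) when the unit changes, while the relative and percentage errors stay the same.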

Page 48: Sta301 lec01

Errors of Measurements

Errors of Measurements

Biased Errors

Cumulative Errors, Systematic Errors

Random Errors

Compensating Errors, Accidental Errors

Page 49: Sta301 lec01

An error is said to be biased when the observed value is consistently and constantly higher or lower than the true value. Biased errors arise from the personal limitations of the observer, the imperfection in the instruments used or some other conditions which control the measurements. These errors are not revealed by repeating the measurements. They are cumulative in nature, that is, the greater the number of measurements, the greater would be the magnitude of error. They are thus more troublesome. These errors are also called cumulative or systematic errors.

An error, on the other hand, is said to be unbiased when the deviations, i.e. the excesses and defects, from the true value tend to occur equally often. Unbiased errors are revealed when measurements are repeated, and they tend to cancel out in the long run. These errors are therefore compensating and are also known as random errors or accidental errors.

BIASED AND RANDOM ERRORS
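
The contrast between the two kinds of error can be seen in a small simulation. This is a sketch under stated assumptions: the errors are drawn from a normal distribution, and the bias of 0.2 units, the spread of 0.5 units, the number of measurements and the seed are all arbitrary values chosen for the illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

N = 10_000      # number of repeated measurements (assumed)
BIAS = 0.2      # hypothetical constant over-reading of a faulty instrument
NOISE_SD = 0.5  # hypothetical spread of the accidental errors

# Random (compensating) errors: excesses and defects occur equally often.
random_errors = [rng.gauss(0, NOISE_SD) for _ in range(N)]

# Biased (cumulative) errors: every reading is shifted in the same direction.
biased_errors = [BIAS + rng.gauss(0, NOISE_SD) for _ in range(N)]

mean_random = sum(random_errors) / N
mean_biased = sum(biased_errors) / N
total_random = sum(random_errors)
total_biased = sum(biased_errors)

print("mean random error:", round(mean_random, 3))
print("mean biased error:", round(mean_biased, 3))
print("total random error:", round(total_random, 1))
print("total biased error:", round(total_biased, 1))
```

The random errors average out close to zero, while the biased errors average close to the assumed bias and their total keeps growing with the number of measurements.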

Page 50: Sta301 lec01

Statistical Inference

A Statistical Inference is an estimate or prediction or some other generalization about a population based on information contained in a sample.

That is, we use information contained in a sample to learn about the larger population.

Page 51: Sta301 lec01

Population and Sample

Population:

The collection of all individuals, items or

data under consideration in a statistical

study.

Sample:

That part of the population from which

information is collected.

Page 52: Sta301 lec01

Population and Sample

Population

Sample

Page 53: Sta301 lec01

Five Elements of an Inferential Statistical Problem:

• A population

• One or more variables of interest

• A sample

• An Inference

• A measure of Reliability

Page 54: Sta301 lec01

In order to understand the concept of Reliability, a very important point to be understood is that making an inference about the population from the sample is only part of the story.

We also need to know its reliability --- that is, how good our inference is.

Page 55: Sta301 lec01

Measure of Reliability

A measure of reliability is a statement

(usually quantified) about the degree of

uncertainty associated with a statistical

inference.

Page 56: Sta301 lec01

The point to be noted is that the only way we can be certain that an inference about a population is correct is to include the entire population in our sample.

However, because of resource constraints (i.e. insufficient time and/or money), we usually cannot work with the whole population, so we base our inference on just a portion of the population (i.e. a sample).

Page 57: Sta301 lec01

Consequently, whenever possible, it is

important to determine and report the

reliability of each inference made.

As such, reliability is the fifth element of inferential statistical problems.

Page 58: Sta301 lec01

Example

A large paint retailer has had numerous complaints from customers about under-filled paint cans.

As a result, the retailer has begun inspecting incoming shipments of paint from suppliers.

Shipments with under-filling problems will be sent back to the supplier.

Page 59: Sta301 lec01

A recent shipment contained 2,440 gallon-size cans.

The retailer sampled 50 cans and weighed each on a scale capable of measuring weight to four decimal places.

Properly filled cans weigh 10 pounds.

Page 60: Sta301 lec01

a) Describe a population

b) Describe a variable of interest

c) Describe a sample

d) Describe the Inference

e) Describe a measure of uncertainty of our inference.

Page 61: Sta301 lec01

Solution

a) The population is the set of units of interest to the retailer, which is the shipment of 2,440 cans of paint.

b) The weight of the paint cans is the variable the retailer wishes to evaluate.

Page 62: Sta301 lec01

c) The sample is a subset of the population. In this case, it is the 50 cans of paint selected by the retailer.

Page 63: Sta301 lec01

d) The inference of interest involves the

generalization of the information contained in

the sample of paint cans to the population of

paint cans.

Page 64: Sta301 lec01

In particular, the retailer wants to learn about the extent of the under-filling problem (if any) in the population.

This might be accomplished by finding the average weight of the cans in the sample, and using it to estimate the average weight of the cans in the population.

Page 65: Sta301 lec01

e) As far as the measure of reliability of our

inference is concerned, the point to be

noted is that, using statistical methods,

we can determine a bound on the

estimation error.

Page 66: Sta301 lec01

Bound on the Estimation Error

This bound is simply a number that our estimation error (i.e. the difference between the average weight of the sample and the average weight of the population of cans) is not likely to exceed.

Page 67: Sta301 lec01

This bound is a measure of the uncertainty

of our inference, or, in other words, the

reliability of statistical inference.

The crux of the matter is that an inference is

incomplete without a measure of its reliability

Page 68: Sta301 lec01

When the weights of 50 paint cans are used

to estimate the average weight of all the

cans, the estimate will not exactly mirror the

entire population.

For Example:

Page 69: Sta301 lec01

If the sample of 50 cans yields a mean weight of 9 pounds, it does not follow (nor is it likely) that the mean weight of the population of cans is also exactly 9 pounds.

Page 70: Sta301 lec01

Nevertheless, we can use sound statistical reasoning to ensure that our sampling procedure will generate an estimate that is almost certainly within a specified limit of the true mean weight of all the cans.

Page 71: Sta301 lec01

For example, such reasoning might assure us that the estimate of the population mean from the sample is almost certainly within 1 pound of the actual population mean. The implication is that the actual mean weight of the entire population of cans is between 9 – 1 = 8 pounds and 9 + 1 = 10 pounds --- that is, (9 ± 1) pounds. This interval represents a measure of reliability for the inference.
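
The five elements of the paint-can example can be put together in a short sketch. Everything numerical here is an assumption for illustration: the can weights are simulated rather than real, and the bound is taken as roughly two standard errors of the sample mean, which is one common rule of thumb and not necessarily the method the lecture has in mind.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

POPULATION_SIZE = 2440  # cans in the shipment
SAMPLE_SIZE = 50        # cans weighed by the retailer

# Population: simulated can weights in pounds (hypothetical, slightly under-filled).
population = [rng.gauss(9.9, 0.2) for _ in range(POPULATION_SIZE)]

# Sample: 50 cans selected at random from the shipment.
sample = rng.sample(population, SAMPLE_SIZE)

# Inference: the sample mean estimates the population mean.
sample_mean = statistics.mean(sample)

# Measure of reliability: an approximate bound on the estimation error,
# assumed here to be two standard errors of the sample mean.
bound = 2 * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
lower, upper = sample_mean - bound, sample_mean + bound

# Only possible in a simulation: compare the estimate with the true population mean.
population_mean = statistics.mean(population)
estimation_error = abs(sample_mean - population_mean)

print(f"estimate: {sample_mean:.3f} ± {bound:.3f} pounds")
print(f"interval: ({lower:.3f}, {upper:.3f})")
print(f"actual estimation error: {estimation_error:.3f}")
```

In a real inspection the population mean is unknown, so the last comparison is not available; the reported bound is what tells the retailer how far the estimate is likely to be from the truth.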

Page 72: Sta301 lec01

IN TODAY’S LECTURE, YOU LEARNT:

• The nature of the science of Statistics

• The importance of Statistics in various fields

• Some technical concepts such as
– The meaning of “data”
– Various types of variables
– Various types of measurement scales
– The concept of errors of measurement

Page 73: Sta301 lec01

IN THE NEXT LECTURE, YOU WILL LEARN:

• Concept of sampling
– Random versus non-random sampling
– Simple random sampling
– A brief introduction to other types of random sampling

• Methods of data collection

In other words, you will begin your journey in a subject with reference to which it has been said that “statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write”.