Quantitative Applications in Management Research E-book
Quantitative Applications in Management and Research
Amity Directorate of Distance & Online Education
Preface
It gives me immense pleasure to bring out the Students' Study Material for the subject Quantitative Applications in Management and Research. The matter is presented in an easy way and covers the particular needs of the course. The purpose of the course is to help students acquire the mathematical skills that are required in the field of management, and the material is arranged so as to allow the progressive learning of quantitative techniques.
Index
S. No.  Chapter     Subject
1       Chapter 1   Introduction to Quantitative Analysis
2       Chapter 2   Data Analysis
3       Chapter 3   Correlation Analysis
4       Chapter 4   Regression Analysis
5       Chapter 5   Probability & Probability Distribution
6       Chapter 6   Time Series
7                   Key to End Chapter Quizzes
8                   Bibliography
Chapter-I
Introduction to Quantitative Analysis
Contents:
1.1 Introduction
1.2 Decision - Making and Quantitative Techniques.
1.2.1 Elements of any decision are
1.3 Quantitative Applications in Management- an overview
1.4 Application of Quantitative methods in business & Management
1.4.1 Finance -Budgeting and Investments
1.4.2 Purchasing, Procurement and Exploration
1.4.3 Production Management
1.4.4 Marketing
1.4.5 Personnel Management
1.4.6 Research and Development
Chapter-I Introduction to Quantitative Analysis
1.1 Introduction
Decision-making is an essential and dominating part of the management process. Although authorities sometimes differ in their definitions of the basic functions of management, everybody agrees that one is not a manager unless he has some authority to plan, organise and control the activities of an enterprise and the behaviour of others. Within this context, decision-making may be viewed as the power to determine what plans will be made and how activities will be organised and controlled. The right to make decisions is an integral part of the right of authority upon which the entire concept of management rests. Essentially, then, decision-making pervades the activities of every business manager. Further, since the management is engaged in a continuous process of decision-making in carrying out the key managerial functions of planning, organising, directing and controlling, we can go so far as to say that management may be regarded as equivalent to decision-making.
Traditionally, decision-making has been considered purely an art, a talent acquired over a period of time through experience. It has been considered so because a variety of individual styles can be traced in the handling and successful solution of similar managerial problems in actual business. However, the environment in which management has to operate nowadays is complex and fast-changing. There is a greater need to supplement the art of decision-making with systematic and scientific methods. A systematic approach to decision-making is necessary because today's business and the environment in which it functions are far more complex than in the past, and the cost of making errors is becoming graver with time. Most business decisions cannot be made simply on the basis of rules of thumb, common sense and/or snap judgment. Common sense may be misleading, and snap judgments may have painful implications. For a large business, a single wrong decision may not only be ruinous but may also have ramifications for national or even international economies. As such, present-day managements cannot rely solely on a trial-and-error approach, and managers have to be more sophisticated: they should employ scientific methods to help them make proper choices. Thus, decision-makers in the business world of today must understand the scientific methodology for making decisions.
1.2 Decision - Making and Quantitative Techniques
Managerial decision-making is a process by which management, when faced with a problem,
chooses a specific course of action from a set of possible options. In making a decision, a
business manager attempts to choose that course of action which is most effective in the given
circumstances in attaining the goals of the organization. The various types of decision-making
situations that a manager might encounter can be listed as follows.
1. Decisions under certainty, where all facts are known fully and for sure, or under uncertainty, where the event that will actually occur is not known but probabilities can be assigned to the various possible occurrences.
2. Decisions for one time period only, called static decisions, or a sequence of interrelated decisions made either simultaneously or over several time periods, called dynamic decisions.
3. Decisions where the opponent is nature (digging an oil well, for example) or a rational opponent (for instance, setting the advertising strategy when the actions of competitors have to be considered).
These classes of decision-making situations are not mutually exclusive, and a given situation may exhibit characteristics from each class. Stocking an item for sale at a certain trade fair, for instance, illustrates a static decision-making situation where uncertainty exists and nature is the opponent.
1.2.1 Elements of any decision are:
i. a decision-maker who could be an individual, group, organization, or society;
ii. a set of possible actions that may be taken to solve the decision problem;
iii. a set of possible states that might occur;
iv. a set of consequences (pay-offs) associated with various combinations of courses of action and
the states that may occur; and
v. the relationship between the pay-offs and the values of the decision maker;
In an actual decision-making situation, the definition and identification of the alternatives, the states and the consequences are the most difficult, albeit not the most crucial, aspects of the decision problem.
In real life, some decision-making situations are simple while others are not. Complexities in
decision situations arise due to several factors. These include the complicated manner of interaction of the
economic, political, technological, environmental and competitive forces in society, the limited resources of
an organization; the values, risk attitudes and knowledge of the decision-makers and the like. For example, a
company's decision to introduce a new product will be influenced by such considerations as market
conditions, labour rates and availability, and investment requirements and availability of funds. The decision
will involve a multidimensional response, including the production methodology, cost and quality of the product,
price, package design, and marketing and advertising strategy. The results of the decision would conceivably
affect every segment of the organisation. The essential idea of the quantitative approach to decision-making
is that if the factors that influence the decisions can be identified and quantified then it becomes easier to
resolve the complexity of the decision-making situations. Thus, in dealing with complex problems, we may
use the tools of quantitative analysis. In fact, a large number of business problems have been given a
quantitative representation with varying degrees of success, and this has led to a general approach variously designated as operations research (or operational research), management science, systems analysis, decision analysis, decision science, etc. Quantitative analysis has now been extended to several areas of business operations and probably represents the most effective approach to handling some types of
decision problems.
A significant benefit of attaining some degree of proficiency with quantitative methods is exhibited in the way
the problems are perceived and formulated. A problem has to be well defined before it can be formulated
into a well-structured framework for solution. This requires an orderly and organised way of thinking.
Two observations may be made here. First, it should be understood clearly that a decision by itself
does not become a good and right decision for adoption merely because it is made within an orderly and
mathematically precise framework. Quantification at best is an aid to business judgment and not its
substitute. A certain degree of constructive skepticism is as desirable in considering a quantitative analysis
of business decisions as it is in any other process of decision-making. Further, some allowances should be
made for qualitative factors involving morale, motivation, leadership, etc. which cannot be ignored. But they
should not be allowed to dominate to such an extent that the quantitative analysis may look to be an
interesting academic exercise, but worthless. In fact, the manager should seek some balance between
quantitative and qualitative factors. Second, it may be noted that the various names for quantitative analysis (operations research, management science, etc.) connote more or less the same general approach. We shall not attempt to discuss the differences among the various labels, as doing so is prone to create more heat than light, and only state that the basic reason for so many titles is that the field is relatively new and there is no consensus regarding which fields of knowledge it includes.
1.3 Quantitative Applications in Management- an overview
The objective of quantitative research is to develop and employ mathematical models, theories and/or
hypotheses pertaining to natural phenomena. The process of measurement is central to quantitative
research because it provides the fundamental connection between empirical observation and mathematical
expression of quantitative relationships.
Quantitative research is generally approached using scientific methods, which include:
i. The generation of models, theories and hypotheses
ii. The development of instruments and methods for measurement
iii. Experimental control and manipulation of variables
iv. Collection of empirical data
v. Modeling and analysis of data
vi. Evaluation of results
Quantitative methods are research techniques that are used to gather quantitative data - information
dealing with numbers and anything that is measurable. Statistics, tables and graphs, are often used to
present the results of these methods.
1.4 Application of Quantitative methods in business & Management
The tools and techniques of quantitative analysis used in areas of management decision-making can be outlined as follows:
1.4.1 Finance -Budgeting and Investments
i. Cash-flow analysis, long range capital requirement, dividend policies, investments portfolios.
ii. Credit policies, credit risks and delinquent account procedures.
iii. Claim and complaint procedures.
1.4.2 Purchasing, Procurement and Exploration
i. Rules for buying supplies under stable or varying prices.
ii. Determination of quantities and timing of purchases.
iii. Bidding policies.
iv. Strategies for exploration and exploitation of raw material sources.
v. Replacements policies.
1.4.3 Production Management
i. Physical distribution
a) Location and size of warehouses, distribution centers and retail outlets.
b) Distribution policy.
ii. Facilities Planning
a) Numbers and location of factories, warehouses, hospitals, etc.
b) Loading and unloading facilities for railroads and trucks; determining the transport schedule.
iii. Manufacturing
a) Production scheduling and sequencing.
b) Stabilisation of production and employment, training, layoffs and optimum product mix.
iv. Maintenance and Project Scheduling
a) Maintenance policies and preventive maintenance.
b) Maintenance crew sizes.
c) Project scheduling and allocation of resources.
1.4.4 Marketing
i. Product selection, timing, competitive actions.
ii. Number of salesmen, frequency of calling on accounts, per cent of time spent on prospects.
iii. Advertising media with respect to cost and time.
1.4.5 Personnel Management
i. Selection of suitable personnel on minimum salary.
ii. Mixes of age and skills.
iii. Recruitment policies and assignment of jobs.
1.4.6 Research and Development
i. Determination of the areas of concentration of research and development.
ii. Project selection.
iii. Determination of time cost trade-off and control of development projects.
iv. Reliability and alternative design.
Chapter-I Introduction to Quantitative Analysis
End Chapter quizzes: I
Ques 1. Traditionally, decision-making has been considered purely as an
a. Art b. Science c. Social Science d. Mathematics
Ques 2. Managerial decision-making is a process by which management chooses a specific course of action from a set of
a. Restricted options b. Possible options. c. No options d. None
Ques 3. Decisions for one time-period only called
a. dynamic decisions b. static decisions c. Both d. None
Ques 4. Decision Making can be done under
a. Certainty b. Uncertainty c. Both d. None
Ques 5. Decision-maker could be
a. an individual b. group c. society d. All the above
Ques 6. Quantitative research is generally approached using scientific methods, which include:
a. The generation of models, theories and hypotheses b. Experimental control and manipulation of variables c. Modeling and analysis of data d. All the above
Ques 7. Quantitative research provides the fundamental connection between
a. empirical observation and mathematical expression b. empirical observation and qualitative expression c. empirical observation and social expression d. empirical observation and all expression
Ques 8. Numbers and location of factories, warehouses, hospitals, etc comes under
a. Maintenance and Project scheduling b. Purchasing, Procurement and Exploration c. Facilities Planning d. Physical distribution
Ques 9. Selection of suitable personnel on minimum salary comes under
a. Production Management b. Personnel management c. Research and Development d. Finance -Budgeting and Investments
Ques 10. Most of the business decisions can be made on the basis of
a. Rule of thumb b. Commonsense c. Snap judgment. d. Quantitative Techniques
Chapter-II
Data Analysis
Contents:
2.1 Introduction
2.1.1 Types of Data
2.2 Some Definitions
2.3 Frequency Distribution:
2.3.1 Graphical presentation of Frequency distribution
2.4 Measure of Central tendency
2.4.1 Arithmetic Mean
2.4.2 Median
2.4.3 Mode
2.5 Measure of Dispersion
2.5.1 Range
2.5.2 Mean Deviation
2.5.3 Variance and standard deviation
2.5.4 The Coefficient of Variation
Chapter-II Data Analysis
2.1 Introduction
Statistics is a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.
Data: A collection of values to be used for statistical analysis.
A dictionary defines data as facts or figures from which conclusions may be drawn. Data may consist of
numbers, words, or images, particularly as measurements or observations of a set of variables. Data are often
viewed as the lowest level of abstraction from which information and knowledge are derived. Thus, technically, it is a
collective or plural noun.
Datum is the singular form of the noun data. Data can be classified as either numeric or nonnumeric. Specific terms
are used as follows:
2.1.1 Types of Data
I. Qualitative data are nonnumeric.
1. {Poor, Fair, Good, Better, Best}, colors (ignoring any physical causes), and types of material {straw, sticks, bricks} are examples of qualitative data.
2. Qualitative data are often termed categorical data. Some books use the terms individual and variable to
reference the objects and characteristics described by a set of data. They also stress the importance of exact
definitions of these variables, including what units they are recorded in. The reason the data were collected
is also important.
II. Quantitative data are numeric.
Quantitative data are further classified as either discrete or continuous.
Discrete data are numeric data that have a finite number of possible values.
A classic example of discrete data is a finite subset of the counting numbers, {1, 2, 3, 4, 5}, perhaps corresponding to a scale from Strongly Disagree to Strongly Agree.
When data represent counts, they are discrete. An example might be how many students were absent on a given day. Counts are usually considered exact and integer.
Continuous data have infinite possibilities: 1.4, 1.41, 1.414, 1.4142, 1.41421...
The real numbers are continuous with no gaps or interruptions. Physically measureable quantities of length,
volume, time, mass, etc. are generally considered continuous. At the physical level (microscopically), especially
for mass, this may not be true, but for normal life situations it is a valid assumption.
Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful
information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and
approaches, encompassing diverse techniques under a variety of names, in different business, science, and social
science domains.
2.2 Some Definitions
Raw Data: Data collected in original form.
Frequency: The number of times a certain value or class of values occurs.
Frequency Distribution: The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution: A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution: A frequency distribution of numerical data in which the raw data is not grouped.
Grouped Frequency Distribution: A frequency distribution where several numbers are grouped into one class.
Class Limits: Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
Class Boundaries: Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit, and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width: The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint): The number in the middle of the class, found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
Cumulative Frequency: The number of values less than the upper class boundary for the current class; a running total of the frequencies.
Relative Frequency: The frequency divided by the total frequency. This gives the percent of values falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency): The running total of the relative frequencies or the cumulative frequency divided by the total frequency, gives the percent of the values which are less than the upper class boundary.
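To make the definitions above concrete, here is a small sketch in Python that tallies a grouped frequency distribution; the sample scores, class width and lower limit are illustrative and not from the text.

```python
# Build a grouped frequency distribution: count how many values fall
# into each class [lower, lower + width).
from collections import Counter

def grouped_frequency(data, class_width, lower=None):
    """Return {(class_lower, class_upper): frequency} for the data."""
    if lower is None:
        lower = (min(data) // class_width) * class_width
    counts = Counter((x - lower) // class_width for x in data)
    table = {}
    for k in sorted(counts):
        lo = lower + k * class_width
        table[(lo, lo + class_width)] = counts[k]
    return table

scores = [12, 15, 20, 22, 30, 18, 25, 27, 14, 21]
dist = grouped_frequency(scores, class_width=5, lower=10)
for (lo, hi), f in dist.items():
    print(f"{lo}-{hi}: {f}")
```

Relative and cumulative frequencies then follow by dividing each count by the total and by keeping a running sum, respectively.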
2.3 Frequency Distribution
The distribution of empirical data is called a frequency distribution and consists of a count of the number of
occurrences of each value. If the data are continuous, then a grouped frequency distribution is used. Typically, a
distribution is portrayed using a frequency polygon or a histogram. Mathematical distributions are often used to
define distributions. The normal distribution is, perhaps, the best known example. Many empirical distributions are
approximated well by mathematical distributions such as the normal distribution.
Grouped Frequency Distribution A grouped frequency distribution is a frequency distribution in which
frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights
might be calculated by defining one-inch ranges. The frequency of individuals with various heights rounded off to the nearest inch would then be tabulated.
2.3.1 Graphical presentation of Frequency distribution:
Histogram
A histogram is a graphical display of tabulated frequencies. A histogram is the graphical version of a table that shows
what proportion of cases fall into each of several or many specified categories.
Figure 2.1: Histogram
Example of a histogram of 100 values
Advantages
Visually strong
Can compare to normal curve
The vertical axis is usually a frequency count of items falling into each category
Disadvantages
Cannot read exact values because data is grouped into categories
More difficult to compare two data sets
Use only with continuous data
Frequency Polygons
Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same
purpose as histograms, but are especially helpful in comparing sets of data. Frequency polygons are also a good
choice for displaying cumulative frequency distributions.
To create a frequency polygon, start just as for histograms, by choosing a class interval. Then draw an X-axis
representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it
with the middle value represented by the class. Draw the Y-axis to indicate the frequency of each class. Place a point
in the middle of each class interval at the height corresponding to its frequency. Finally, connect the points. You
should include one class interval below the lowest value in your data and one above the highest value. The graph will
then touch the X-axis on both sides.
Figure 2.2: Histogram/Frequency Polygons
Advantages
Visually appealing
Can compare to normal curve
Can compare two data sets
Disadvantages
Anchors at both ends may imply zero as data points
Use only with continuous data
Frequency Curve
A smooth curve which corresponds to the limiting case of a histogram computed for a frequency distribution
of a continuous distribution as the number of data points becomes very large.
Figure 2.3 : Histogram/Frequency Polygons/Frequency Curve
Advantages
Visually appealing
Disadvantages
Anchors at both ends may imply zero as data points
Use only with continuous data
2.4 Measure of Central tendency
Central Tendency is the center or middle of a distribution. There are many measures of central tendency. The most common are the mean, median and mode. The center of a distribution could be defined three ways:
1. the point on which a distribution would balance, 2. the value whose average absolute deviation from all the other values is minimized, and
3. the value whose squared difference from all the other values is minimized.
It can be shown that the mean is the point on which a distribution would balance, that the median is the value that minimizes the sum of absolute deviations, and that the mean is also the value that minimizes the sum of squared deviations.
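These three characterizations of the center can be checked numerically. The sketch below scans a grid of candidate centers for a small hypothetical data set and confirms that the median minimizes the sum of absolute deviations while the mean minimizes the sum of squared deviations.

```python
# Verify the minimization properties of the median and the mean
# on a small data set.
data = [2, 2, 3, 4, 14]

def sum_abs_dev(c):
    return sum(abs(x - c) for x in data)

def sum_sq_dev(c):
    return sum((x - c) ** 2 for x in data)

mean = sum(data) / len(data)            # 5.0
median = sorted(data)[len(data) // 2]   # 3

# Scan candidate centers on a fine grid and find the minimizers.
grid = [i / 100 for i in range(0, 1501)]
best_abs = min(grid, key=sum_abs_dev)   # minimizer of absolute deviations
best_sq = min(grid, key=sum_sq_dev)     # minimizer of squared deviations
print(best_abs, best_sq)   # close to the median (3) and the mean (5)
```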
2.4.1 Arithmetic Mean
The arithmetic mean is the most common measure of central tendency. For a data set, the mean is the sum of the
observations divided by the number of observations. Basically, the mean describes the central location of the data.
For a given set of data with observations x1, x2, ..., xn, the Arithmetic Mean is defined as:
Mean = (x1 + x2 + ... + xn) / n = (Σ xi) / n
The weighted arithmetic mean is used if one wants to combine average values from samples of the same population with different sample sizes:
Weighted Mean = (Σ wi xi) / (Σ wi)
Example 1:
Observations: 12, 15, 20, 22, 30
Weights: 2, 5, 7, 6, 1
Find the weighted mean.

Observations  Weights  xi wi
12            2        24
15            5        75
20            7        140
22            6        132
30            1        30
Total         21       401

Weighted Mean = 401 / 21 = 19.10
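The weighted mean from Example 1 can be computed directly from the formula above; a minimal sketch:

```python
# Weighted arithmetic mean: sum of (weight * observation) divided
# by the sum of the weights, using the data from Example 1.
observations = [12, 15, 20, 22, 30]
weights = [2, 5, 7, 6, 1]

total_weight = sum(weights)                                       # 21
weighted_sum = sum(w * x for x, w in zip(observations, weights))  # 401
weighted_mean = weighted_sum / total_weight
print(round(weighted_mean, 2))   # 19.1
```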
Advantages
can be specified using an equation, and therefore can be manipulated algebraically
is the most sufficient of the three estimators
is the most efficient of the three estimators
is unbiased
Disadvantages
is very sensitive to extreme scores (i.e., low resistance)
its value is unlikely to be one of the actual data points
requires an interval scale
2.4.2 Median
The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest
value and picking the middle one. If there is an even number of observations, the median is not unique, so one often
takes the mean of the two middle values.
For an odd number of observations:
Median = the ((n + 1)/2)-th observation.
For an even number of observations:
Median = the average of the (n/2)-th and ((n/2) + 1)-th observations.
Here are the sample test scores you have seen so often:
100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45
The "middle" score of this group could easily be seen as 87. Why? Exactly half of the scores lie above 87 and half lie
below it. Thus, 87 is in the middle of this set of scores. This score is known as the median.
In this example, there are 21 scores. The eleventh score in the ordered set is the median score (87), because ten
scores are on either side of it.
If there were an even number of scores, say 20, the median would fall halfway between the tenth and eleventh
scores in the ordered set. We would find it by adding the two scores (the tenth and eleventh scores) together and
dividing by two.
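The textbook rule can be checked against Python's statistics module on the test scores above:

```python
# Median of the 21 test scores: sort, then take the middle
# ((n + 1)/2-th) observation, and compare with statistics.median.
from statistics import median

scores = [100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
          85, 85, 85, 80, 79, 76, 72, 67, 66, 45]

ordered = sorted(scores)
n = len(ordered)                     # 21, an odd count
by_rule = ordered[(n + 1) // 2 - 1]  # the (n+1)/2-th ordered observation
print(by_rule, median(scores))       # 87 87
```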
Advantages
is unbiased
is unaffected by extreme scores (i.e., high resistance)
doesn't require the use of an interval scale; as long as you can order the scores along some continuum, you can find the median
Disadvantages
cannot be specified using an equation, so it cannot be manipulated algebraically
is the least sufficient of the three estimators
is less efficient than the mean
2.4.3 Mode
The mode is the most frequently occurring value. It is the most common value in a distribution: The mode of 3, 4,
4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median.
With continuous data such as response time measured to many decimals, the frequency of each value is one
since no two scores will be exactly the same. Therefore the mode of continuous data is normally computed from
a grouped frequency distribution. The grouped frequency distribution table shows a grouped frequency
distribution for the target response time data. Since the interval with the highest frequency is 600-700, the mode
is the middle of that interval (650).
Table 2.1: Grouped frequency distribution
Range Frequency
500-600 3
600-700 6
700-800 5
800-900 5
900-1000 0
1000-1100 1
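The modal class of Table 2.1 can be found programmatically; a small sketch using the table's frequencies, with the mode taken as the midpoint of the modal class as in the text:

```python
# Mode of grouped data (Table 2.1): the class with the highest
# frequency, with the mode reported as that class's midpoint.
table = {(500, 600): 3, (600, 700): 6, (700, 800): 5,
         (800, 900): 5, (900, 1000): 0, (1000, 1100): 1}

modal_class = max(table, key=table.get)   # class with highest frequency
mode = sum(modal_class) / 2               # midpoint of that class
print(modal_class, mode)   # (600, 700) 650.0
```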
Advantages
represents a number that actually occurred in the data
represents the largest number of scores, so the probability of getting that score is greater than the probability of getting any other score if an observation is chosen at random
is unaffected by extreme scores (i.e., high resistance)
is unbiased
doesn't require an interval scale
Disadvantages
depends on how we group the data
cannot be specified using an equation, so it cannot be manipulated algebraically
is less sufficient than the mean
is less efficient than the mean
2.5 Measure of Dispersion
Measures of Dispersion provide us with a summary of how much the points in our data set vary, e.g. how spread out
they are or how volatile they are.
In measuring dispersion, it is necessary to know both the amount of variation and the degree of variation. The former is given by absolute measures of dispersion, expressed in the units of the original variable, while the latter is given by relative measures of dispersion.
Absolute measures can be divided into positional measures based on some items of the series, such as (i) the range and (ii) the quartile deviation or semi-interquartile range, and measures based on all items in the series, such as (i) the mean deviation and (ii) the standard deviation. The relative measures in each of the above cases are called the
coefficients of the respective measures. For purposes of comparison between two or more series with varying sizes or numbers of items, varying central values or units of calculation, only relative measures can be used.
The following are the important methods of studying variation:
1. Range
2. Mean deviation
3. Standard deviation and Variance (which is closely related to standard deviation)
4. The Coefficient of Variation
2.5.1 Range
Range is the simplest of the summary measures of variation. It is also the crudest and the most prone to error. It is computed as the difference between the largest value (H) and the smallest value (L) in a data set:
Absolute range: Range = H - L
Relative range: Coefficient of range = (H - L) / (H + L)
For example, for the data set {2, 2, 3, 4, 14}:
Range = 14 - 2 = 12
Coefficient of range = (14 - 2) / (14 + 2) = 12 / 16 = 0.75
Example:
You are given the following data:
3 6 9 11
Compute the sample range
Solution:
H = 11, L = 3
range = H - L = 11 - 3 = 8
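Both worked examples above can be reproduced with a couple of one-line helpers:

```python
# Range and coefficient of range, applied to the two example
# data sets in this section.
def value_range(data):
    return max(data) - min(data)

def coefficient_of_range(data):
    h, l = max(data), min(data)
    return (h - l) / (h + l)

print(value_range([2, 2, 3, 4, 14]))           # 12
print(coefficient_of_range([2, 2, 3, 4, 14]))  # 0.75
print(value_range([3, 6, 9, 11]))              # 8
```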
2.5.2 Mean Deviation
Mean Deviation can be calculated from any value of Central Tendency, viz. Mean, Median, Mode. Accordingly, Mean
Deviation can be of the following types:
Mean Deviation about Mean
Mean Deviation about Median
Mean Deviation about Mode
Mean Deviation about Mean = ( Σ |xi - Mean| ) / n
Properties of Mean Deviation about Mean:
The average absolute deviation from the mean is less than or equal to the standard deviation.
The sum of the signed deviations of any data set from its mean is always zero.
The mean absolute deviation is the average absolute deviation from the mean and is a common measure of forecast error in time series analysis.
For example, for the data set {2, 2, 3, 4, 14}, with Mean = 5:
Mean Deviation about Mean = ( |2 - 5| + |2 - 5| + |3 - 5| + |4 - 5| + |14 - 5| ) / 5 = 18 / 5 = 3.6
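The same calculation as a short function, checked on the data set {2, 2, 3, 4, 14}:

```python
# Mean deviation about the mean: average of the absolute
# deviations of each value from the arithmetic mean.
def mean_abs_deviation(data):
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

print(mean_abs_deviation([2, 2, 3, 4, 14]))   # 3.6
```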
2.5.3 Variance and standard deviation
Variance and standard deviation are the most common of all the measures of variation.
Variance is a measure of statistical dispersion, indicating how the possible values are spread around the mean. Thus, variance indicates the variability of the values: a smaller value implies a smaller variation from the mean.
The positive square root of the variance is called the Standard Deviation.
Let us consider an example with the values 4, 6, 5, 5 (Total = 20, Mean = 20/4 = 5):

Values   Xi - Mean   (Xi - Mean)^2
4        -1          1
6        1           1
5        0           0
5        0           0

Σ(Xi - Mean)^2 = 2

Variance = 2/4 = 1/2 = 0.5
S.D. = √0.5 ≈ 0.707
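The worked example divides the sum of squared deviations by n, i.e. it uses the population form of the variance; a sketch of that calculation, checked against the statistics module:

```python
# Population variance and standard deviation for the values
# 4, 6, 5, 5, dividing by n as in the table above.
from statistics import pvariance, pstdev

def variance(data):
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

data = [4, 6, 5, 5]
var = variance(data)
sd = var ** 0.5
print(var, round(sd, 3))   # 0.5 0.707
```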
2.5.4 The Coefficient of Variation
The coefficient of variation is a measure of variation expressed as a percentage of the sample mean:
CV = (S / Mean) × 100
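A sketch of the CV formula on the same illustrative values 4, 6, 5, 5. Note one assumption: statistics.stdev uses the sample (n - 1) standard deviation, whereas the variance example above divides by n.

```python
# Coefficient of variation: sample standard deviation expressed
# as a percentage of the mean.
from statistics import mean, stdev

def coefficient_of_variation(data):
    return stdev(data) / mean(data) * 100

cv = coefficient_of_variation([4, 6, 5, 5])
print(round(cv, 2))   # 16.33
```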
Chapter-II Data Analysis
End Chapter quizzes: II
Ques 1. Singular form of the data is
a. Datum b. Stratum c. Date d. Data
Ques 2. Graphical presentation of Frequency distribution can be done by
a. Histogram b. Frequency polygons c. Frequency Curve d. All the three
Ques 3. Which one is unaffected by extreme scores
a. Mean b. Median c. Mode d. Range
Ques 4.Which one is not the Measure of Dispersion
a. Range b. Mean deviation c. Histogram d. Standard deviation
Ques 5.Chaya took 7 math tests in one marking period. What is the range of her test scores?
89, 73, 84, 91, 87, 77, 94
a. 25 b. 21 c. 13 d. 15
Ques 6.In a crash test, 11 cars were tested to determine what impact speed was required to obtain minimal bumper damage. Find the mode of the speeds given in miles per hour below.
24, 15, 18, 20, 18, 22, 20, 26, 18, 26, 24
a. 18 b. 20 c. 18.6 d. 15
Ques 7. A survey conducted by an automobile company showed the number of cars per household and the corresponding probabilities. Find the standard deviation.
Number of cars X 1 2 3 4
Probability P(X) 0.32 0.51 0.12 0.05
a. 4.24 b. 0.63 c. 0.79 d. 1.9
Ques 8. The given data shows the number of burgers sold at a bakery in the last 14 weeks. 17, 13, 18, 17, 13, 16, 18, 19, 17, 13, 16, 18, 20, 19 Find the median number of burgers sold.
a. 18.5 b. 17 c. 18 d. 17.5
Ques 9. Histograms can be constructed for
a. Discrete data b. Continuous data c. Both d. none
Ques 10. Which is called the positional average?
a. Mean b. Median c. Mode d. None
Chapter-III
Correlation Analysis
Contents:
3.1 Introduction
3.2 Types of Correlation
3.2.1 Positive and Negative
3.2.2 Simple, partial and multiple
3.2.3 Linear and non-linear
3.3 Degrees of Correlation
3.3.1 Perfect correlation
3.3.2 Limited degrees of correlation
3.3.3 Absence of correlation
3.4 Methods of Determining Correlation
3.4.1 Scatter Plot
3.4.2 Karl Pearson's coefficient of correlation
3.4.3 Spearman's Rank-correlation coefficient
Chapter-III Correlation Analysis
3.1 Introduction
Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For
example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't
perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter
one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of
people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of
the variation in people's weights is related to their heights.
Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect
there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater
understanding of your data.
3.2 Types of Correlation
I. Positive and Negative
II. Simple, partial and multiple
III. Linear and non-linear
3.2.1 Positive and Negative Correlation
Positive Correlation
If the higher scores on X are generally paired with the higher scores on Y, and the lower scores on X are
generally paired with the lower scores on Y, then the direction of the correlation between two variables is
positive.
Negative Correlation
If the higher scores on X are generally paired with the lower scores on Y, and the lower scores on X are
generally paired with the higher scores on Y, then the direction of the correlation between two variables is
negative.
Figure: 3.1 Positive, Negative and No Correlation
3.2.2 Simple, partial and multiple
The distinction between simple, partial and multiple correlation is based upon the number of variables studied.
Simple Correlation
Correlation between only two variables, e.g. correlation between age and height, or correlation between yield of
rice and amount of rainfall in a given area, is called simple correlation.
Multiple Correlation
When the correlation between three or more variables is studied simultaneously, it is called multiple
correlation.
Partial Correlation
Here we recognise more than two variables but consider only two of them to be influencing each other, the
effect of the other influencing variables being kept constant. The correlation between the two variables keeping the
other variables constant is called partial correlation.
1 X1-Yield of rice
2 X2-Amount of Rainfall
3 X3-Amount of fertilizers
4 X4-Type of soil
5 X5-Advanced technologies used.
Correlation analysis of X1, X2, X3, X4 and X5 is an example of Multiple Correlation whereas if we only
study the relation between X1 and X2 keeping other variables constant it would be an example of Partial
Correlation between yield of rice and amount of rainfall.
3.2.3 Linear and non-linear
The nature of the graph gives us an idea of the type of correlation between two variables. If the graph is
a straight line, the correlation is called a "linear correlation"; if the graph is not a straight line, the correlation
is non-linear or curvilinear.
3.3 Degrees of Correlation
3.3.1 Perfect correlation
If two variables change in the same direction and in the same proportion, the correlation between the two is
perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if
the variables change in opposite directions and in the same proportion, the correlation is perfect negative and its
coefficient of correlation is -1. In practice we rarely come across these types of correlation.
3.3.2 Limited degrees of correlation
If two variables are neither perfectly correlated nor completely uncorrelated, we term the
correlation limited correlation. It may be positive, negative or zero, but lies within the limits ±1.
3.3.3 Absence of correlation
If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change
in the other variable, then we can firmly say that there is no correlation between the two
variables. In such a case the coefficient of correlation is 0.
Table: 3.1 Meaning of (r) in the Correlation Coefficient

r value | Relationship between X and Y
r = +1.0 | Perfect positive: as X goes up, Y always also goes up
r = +0.5 | Weak positive: as X goes up, Y tends to usually also go up
r = 0 | No correlation: X and Y are not correlated
r = -0.5 | Weak negative: as X goes up, Y tends to usually go down
r = -1.0 | Perfect negative: as X goes up, Y always goes down
3.4 Methods of Determining Correlation
1 Scatter Plot
2 Karl Pearson's coefficient of correlation
3 Spearman's Rank-correlation coefficient.
3.4.1 Scatter Plot (Scatter diagram or dot diagram)
In this method the values of the two variables are plotted on a graph paper. One variable is taken along the horizontal
(x-axis) and the other along the vertical (y-axis). By plotting the data, we get points (dots) on the graph which are
generally scattered, and hence the name Scatter Plot.
The manner in which these points are scattered, suggest the degree and the direction of correlation. The
degree of correlation is denoted by r and its direction is given by the signs positive and negative.
Figure: 3.2 Positive, Negative and No Correlation
positive correlation negative correlation no correlation
3.4.2 Karl Pearson's coefficient of correlation
It gives a numerical expression for the measure of correlation. It is denoted by r. The value of r gives the
magnitude of correlation and its sign denotes the direction. It is defined as
r = Cov(x, y) / √(Var x · Var y)
Table: 3.2 Correlation coefficient between advertisement expenditure (X) and sales (Y)

X (Rs. lakhs) | Y (Rs. crores) | (X - X̄)² | (Y - Ȳ)² | (X - X̄)(Y - Ȳ)
4 | 16 | 0.1849 | 1.6641 | 0.5547
6 | 29 | 2.4649 | 137.1241 | 18.3847
10 | 43 | 31.0249 | 661.0041 | 143.2047
5 | 20 | 0.3249 | 7.3441 | 1.5447
1 | 3 | 11.7649 | 204.2041 | 49.0147
2 | 4 | 5.9049 | 176.6241 | 32.2947
3 | 6 | 2.0449 | 127.4641 | 16.1447
ΣX = 31 | ΣY = 121 | Σ(X - X̄)² = 53.7143 | Σ(Y - Ȳ)² = 1315.4287 | Σ(X - X̄)(Y - Ȳ) = 261.1429
X̄ = 31/7 = 4.43 and Ȳ = 121/7 = 17.29
Sum of squared deviations in advertisement expenditure = 53.71
Sum of squared deviations in sales = 1315.43
Sum of cross products (SP) = 261.14
Calculation of the Pearson r
r = 261.14 / √((53.71)(1315.43)) = 261.14 / 265.80 = +0.982
Interpretation
The magnitude of the correlation between advertisement expenditure and sales is 0.982 and the direction of the relationship is positive: as advertisement expenditure increases, so do the sales of the commodity.
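The Pearson r for the advertisement/sales data can be verified with a short Python sketch (illustrative only; any small differences from the hand calculation come from rounding the means to two decimals there):

```python
from math import sqrt

# Pearson r for the advertisement-expenditure (X) and sales (Y) data above.
X = [4, 6, 10, 5, 1, 2, 3]
Y = [16, 29, 43, 20, 3, 4, 6]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # sum of cross products
sxx = sum((x - mx) ** 2 for x in X)                    # sum of squared x-deviations
syy = sum((y - my) ** 2 for y in Y)                    # sum of squared y-deviations
r = sxy / sqrt(sxx * syy)
print(round(r, 3))   # ≈ 0.982
```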
3.4.3 Spearman's Rank-correlation coefficient
The most precise way to compare several pairs of data is to use a statistical test - this establishes whether the
correlation is really significant or if it could have been the result of chance alone.
Spearman's Rank correlation coefficient is a technique which can be used to summarise the strength and
direction (negative or positive) of a relationship between two variables.
The result will always be between +1 and -1.
Method - calculating the coefficient
Create a table from your data.
Rank the two data sets. Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2'
to the second biggest value and so on. The smallest value in the column will get the lowest ranking. This
should be done for both sets of measurements.
Tied scores are given the mean (average) rank. For example, the three tied scores of 1 euro in the example
below are ranked fifth in order of price, but occupy three positions (fifth, sixth and seventh) in a ranking
hierarchy of ten. The mean rank in this case is calculated as (5 + 6 + 7) ÷ 3 = 6.
Find the difference in the ranks (d): This is the difference between the ranks of the two values on each row
of the table. The rank of the second value (price) is subtracted from the rank of the first (distance from the
museum).
Square the differences (d²) to remove negative values, and then sum them (Σd²).
Table: 3.3 Spearman's Rank Correlation

Convenience store | Distance from CAM (m) | Rank | Price of 50cl bottle (€) | Rank | Difference between the ranks (d) | d²
1 | 50 | 10 | 1.80 | 2 | 8 | 64
2 | 175 | 9 | 1.20 | 3.5 | 5.5 | 30.25
3 | 270 | 8 | 2.00 | 1 | 7 | 49
4 | 375 | 7 | 1.00 | 6 | 1 | 1
5 | 425 | 6 | 1.00 | 6 | 0 | 0
6 | 580 | 5 | 1.20 | 3.5 | 1.5 | 2.25
7 | 710 | 4 | 0.80 | 9 | -5 | 25
8 | 790 | 3 | 0.60 | 10 | -7 | 49
9 | 890 | 2 | 1.00 | 6 | -4 | 16
10 | 980 | 1 | 0.85 | 8 | -7 | 49
Σd² = 285.5
Calculate the coefficient (R) using the formula below. The answer will always be between +1.0 (a perfect
positive correlation) and -1.0 (a perfect negative correlation).
When written in mathematical notation, the Spearman Rank formula looks like this:
R = 1 - (6 Σd²) / (n³ - n)
Now to put all these values into the formula.
Find Σd² by adding up all the values in the d² column. In our example this is 285.5.
Multiplying this by 6 gives 1713.
Now for the bottom line of the equation. The value n is the number of sites at which you took measurements, which in our example is 10. Substituting into n³ - n we get 1000 - 10 = 990.
We now have the formula R = 1 - (1713/990), which gives a value for R of 1 - 1.73 = -0.73.
What does this R value of -0.73 mean?
The R value of -0.73 suggests a fairly strong negative relationship.
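The whole Spearman procedure, including the tie-averaged ranks, can be sketched in Python (illustrative only; the `average_ranks` helper is a hypothetical name, not from the text):

```python
def average_ranks(values):
    """Rank with '1' for the largest value; tied values share the mean rank."""
    order = sorted(values, reverse=True)
    return [sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
            for x in values]

# Convenience-store data from Table 3.3: distance from CAM (m) and bottle price (euros).
distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]

rd = average_ranks(distance)
rp = average_ranks(price)
d2 = sum((a - b) ** 2 for a, b in zip(rd, rp))   # sum of squared rank differences
n = len(distance)
R = 1 - 6 * d2 / (n ** 3 - n)                    # Spearman formula
print(d2, round(R, 2))   # 285.5 and -0.73
```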
Chapter-III Correlation
End Chapter quizzes: III
Ques.1. If the higher scores on X are paired with the lower scores on Y, then the correlation between the two variables is
a. Positive b. Negative. c. No correlation d. Unknown
Ques.2. The value of r gives the magnitude of correlation and sign denotes its
a. Value b. Direction c. Both d. None
Ques.3. When the correlation between three or more variables is studied simultaneously, it is called
a. Simple Correlation b. Partial Correlation c. multiple Correlation d. All of the above
Ques.4. If the graph between two variables gives a straight line, the correlation is called a
a. linear correlation b. Curvi linear correlation c. Absence of correlation d. Simple correlation
Ques.5. If two variables change in the same direction and in the same proportion, the correlation between the two is
a. Perfect negative b. Perfect positive c. Limited positive d. Limited Negative
Ques.6.The correlation coefficient, r = 0, implies
a. Perfect negative b. Perfect positive c. No correlation d. Limited correlation
Ques.7 Which of the following is a stronger correlation than -.54? a. 0 b. -.45 c. .45 d. -.67
Ques.8 If the correlation between body weight and annual income were high and positive, we could conclude that:
a. High incomes cause people to eat more food. b. Low incomes cause people to eat less food. c. High income people tend to spend a greater proportion of their income on food than low income people, on average. d. High income people tend to be heavier than low income people, on average.
Ques.9 Men tend to marry women who are slightly younger than themselves. Suppose that every man married a woman who was exactly 0.5 of a year younger than himself. Which of the following is CORRECT? a. The correlation is -.5. b. The correlation is .5. c. The correlation is 1. d. The correlation is -1.
Ques.10. A national consumer magazine reported the following correlations. The correlation between car weight and car reliability is -0.30. The correlation between car weight and annual maintenance cost is 0.20.
Which of the following statements are true? I. Heavier cars tend to be less reliable. II. Heavier cars tend to cost more to maintain. III. Car weight is related more strongly to reliability than to maintenance cost.
a. I only b. II only c. III only d. I, II, and III
Chapter-IV Regression Analysis
Contents:
4.1 Introduction
4.2 Regression Equations
4.3 How to Find the Regression Equation
4.4 Properties of the Regression Coefficients
4.5 Difference between Correlation and Regression
Chapter-IV Regression Analysis
4.1 Introduction
Regression analysis is a technique used for the modeling and analysis of numerical data consisting of values of a
dependent variable (response variable) and of one or more independent variables (explanatory variables). The
dependent variable in the regression equation is modeled as a function of the independent variables,
corresponding parameters ("constants"), and an error term. The error term is treated as a random variable. It
represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best
fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have
also been used.
There are two types of variables in Regression Analysis.
1 Dependent variable
2 Independent variable
The dependent variable is also known as the regressed, predicted or explained variable. The independent variable is also
known as the regressor, predictor or explainer.
Simple regression is used to examine the relationship between one dependent and one independent variable. After
performing an analysis, the regression statistics can be used to predict the dependent variable when the independent
variable is known. Regression goes beyond correlation by adding prediction capabilities.
The regression line (known as the least squares line) is a plot of the expected value of the dependent variable for
all values of the independent variable. Technically, it is the line that "minimizes the squared residuals". The
regression line is the one that best fits the data on a scatterplot.
In the regression equation, y is the dependent variable and x is the independent variable. Here are three
equivalent ways to mathematically describe a linear regression model:
1 y = intercept + (slope × x) + error
2 y = constant + (coefficient × x) + error
3 y = a + bx + e
The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. It is expressed in
the units of the Y-axis divided by the units of the X-axis. If the slope is positive, Y increases as X increases. If the
slope is negative, Y decreases as X increases.
Figure: 4.1 Regression line
The Y intercept is the Y value of the line when X equals zero. It defines the elevation of the line.
For two variables X and Y, we will have two regression lines, and they show the mutual relationship between the two
variables. The regression line of Y on X gives the most probable estimate of the values of Y for given values of X,
whereas the regression line of X on Y gives the most probable estimate of the values of X for given values of Y. Only one
regression line is obtained in the case of perfect correlation (r = ±1): both lines of regression coincide and we get only one line.
4.2 Regression Equations
Regression Equations are algebraic expressions of the regression lines.
Regression Equation of Y on X
Y = a + bX
According to the principle of least squares, the normal equations for estimating a and b are
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²
Regression Equation of X on Y
X = a + bY
According to the principle of least squares, the normal equations for estimating a and b are
ΣX = Na + bΣY
ΣXY = aΣY + bΣY²
Regression Equation from Deviations taken from the Arithmetic Means of X and Y
Y - Ȳ = byx (X - X̄)
where byx, the regression coefficient of Y on X, is
byx = Σxy / Σx²   (with x = X - X̄ and y = Y - Ȳ)
4.3 How to Find the Regression Equation
Five randomly selected students took a math aptitude test before they began their statistics course. The Statistics
Department has three questions.
i. What linear regression equation best predicts statistics performance, based on math aptitude scores?
ii. If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
iii. How well does the regression equation fit the data?
In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades.
The last two rows show sums and mean scores that we will use to conduct the regression analysis.
Table: 4.1

Student | xi | yi | (xi - x̄) | (yi - ȳ) | (xi - x̄)² | (yi - ȳ)² | (xi - x̄)(yi - ȳ)
1 | 95 | 85 | 17 | 8 | 289 | 64 | 136
2 | 85 | 95 | 7 | 18 | 49 | 324 | 126
3 | 80 | 70 | 2 | -7 | 4 | 49 | -14
4 | 70 | 65 | -8 | -12 | 64 | 144 | 96
5 | 60 | 70 | -18 | -7 | 324 | 49 | 126
Sum | 390 | 385 | | | 730 | 630 | 470
Mean | 78 | 77 | | | | |

The regression equation is a linear equation of the form
y - ȳ = byx (x - x̄)
where byx, the regression coefficient of y on x, is
byx = Σxy / Σx² = 470 / 730 = 0.643836
Hence
y - 77 = 0.643836 (x - 78)
y = 0.643836x + 26.78082
Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x), perform
the computation, and you have an estimated value (y) for the dependent variable.
In our example, the independent variable is the student's score on the aptitude test, and the dependent variable
is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade would be:
y = 0.643836x + 26.78082 = (0.643836 × 80) + 26.78082 = 51.51 + 26.78 = 78.29
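The least-squares calculation above can be reproduced in Python (an illustrative sketch using the same aptitude/grade data, not part of the original text):

```python
# Least-squares line for the aptitude (x) / statistics-grade (y) data above.
x = [95, 85, 80, 70, 60]
y = [85, 95, 70, 65, 70]
n = len(x)
mx, my = sum(x) / n, sum(y) / n                    # means: 78 and 77
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)                # slope byx = 470 / 730
a = my - b * mx                                    # intercept
predicted = a + b * 80                             # estimate for a score of 80
print(round(b, 6), round(a, 5), round(predicted, 2))
```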
4.4 Properties of the Regression coefficients
1. The correlation coefficient is the geometric mean of the regression coefficients:
r² = byx × bxy
2. If one of the regression coefficients is greater than unity, the other must be less than unity
(since byx × bxy = r² ≤ 1).
3. Both the regression coefficients will have the same sign.
4. The Correlation Coefficient will have the same sign as that of regression coefficients.
5. The arithmetic mean of the regression coefficients is greater than the Correlation Coefficient
4.5 Difference between Correlation and Regression
The difference between regression and correlation needs to be emphasised. Both methods attempt to describe the
association between two (or more) variables, and are often confused by students and professional scientists alike!
1 Correlation makes no a priori assumption as to whether one variable is dependent on the other(s) and is not
concerned with the causal relationship between the variables; instead it gives an estimate of the degree of association
between them. In fact, correlation analysis tests for interdependence of the variables.
2 Regression, by contrast, attempts to describe the dependence of a variable on one (or more) explanatory variables; it implicitly
assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless
of whether the path of effect is direct or indirect.
Chapter-IV Regression Analysis End Chapter quizzes: IV
Ques.1.In Regression Analysis the dependent variable is also known as
a. Regressed variable b. Regressor variable c. Random variable d. All of the above
Ques.2. Simple regression is used to examine the relationship between
a. two dependent variables b. two independent variables c. one dependent and one independent variable d. two dependent and one independent variable
Ques.3. In Regression Analysis, one regression line is obtained in case if
a. r = +1 b. r = -1 c. r = ±1 d. r = 0
Ques.4. byx is the regression coefficient of Y on X. Then
a. byx = Σxy / Σx² b. byx = Σxy / Σy² c. byx = Σy² / Σx² d. byx = Σx² / Σxy
Ques.5. If one of the regression coefficients is greater than unity, the other must be
a. greater than unity b. less than unity c. equals to unity d. Not known
Ques.6. Both the regression coefficients will have
a. same sign b. opposite sign c. Not known d. None
Ques.7 If y is the dependent variable and x is the independent variable. Then the linear regression model will
be
a. x = a +b y + e b. y = b x c. x = b y d. y = a + b x + e
Ques.8. The arithmetic mean of the regression coefficients is ----------- than the correlation coefficient
a. Smaller b. Greater c. Equals to d. None
Ques.9 A regression equation was computed to be Y = 35 + 6X. The value of 35 indicates that:
a. An increase in one unit of X will result in an increase of 35 in Y b. The coefficient of correlation is 35 c. The coefficient of determination is 35 d. The regression line crosses the Y-axis at 35
Ques.10. After performing an analysis, the regression statistics can be used to predict the dependent variable when the ------------ variable is known
a. Independent b. dependent c. correlation coefficient d. All of the above
Chapter-V
Probability & Probability distribution
Contents:
5.1 Introduction
5.1.1 Definition of Probability:
5.1.2. Axioms of Probability
5.1.3. How to Compute Probability:
5.2 Addition Law of Probability
5.3 Multiplication Law of Probability
5.4 Probability Distribution
5.5 Binomial Distribution
5.5.1 Mean of Binomial Distribution
5.6. Poisson Distribution
5.6.1 Mean and variance of Poisson distribution
5.7. Normal Distribution or Normal Curve
5.7.1. Characteristics of Normal Distribution
5.7.2. Empirical Rule
Chapter-V Probability & Probability distribution
5.1 Introduction
Mathematically, the probability that an event will occur is expressed as a number between 0 and 1. Notationally, the
probability of event A is represented by P (A).
If P (A) equals zero, there is no chance that the event A will occur.
If P (A) is close to zero, there is little likelihood that event A will occur.
If P(A) is close to one, there is a strong chance that event A will occur
If P (A) equals one, event A will definitely occur.
The sum of the probabilities of all possible outcomes in a statistical experiment is equal to one. This means, for example, that if an
experiment can have three possible outcomes (A, B, and C), then
P(A) + P(B) + P(C) = 1.
5.1.1 Definition of Probability
Let an event A happen in m ways and fail in n ways, where all the ways are equally likely to occur. Then the
probability of the happening of event A is defined as
P(A) = m / (m + n)
and the probability of its failing as
P(Ā) = n / (m + n)
From the above it may be noted that P(A) = p is such that 0 ≤ p ≤ 1. Ā is called the complementary event, with P(Ā) = q = 1 - p and 0 ≤ q ≤ 1.
Associated with each event A in the sample space S is the probability of A, P(A).
5.1.2. Axioms of Probability
Axioms:
1. P(A) ≥ 0
2. P(S) = 1, where S is the sample space
3. P(A ∪ B) = P(A) + P(B) if A and B are mutually exclusive
e.g., P(ace or king) = P(ace) + P(king) = 1/13 + 1/13 = 2/13.
Theorems about probability can be proved using these axioms, and these theorems can be used in probability calculations:
P(Ā) = 1 - P(A)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)   (for events that are not mutually exclusive)
E.g. P(ace or black) = P(ace) + P(black) - P(ace and black) = 4/52 + 26/52 - 2/52 = 28/52 = 7/13
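The ace-or-black example can be checked exactly with Python's `fractions` module (an illustrative sketch, not part of the original text):

```python
from fractions import Fraction as F

# P(ace or black) from a 52-card deck, using the general addition rule:
# P(A or B) = P(A) + P(B) - P(A and B).
p_ace, p_black, p_black_ace = F(4, 52), F(26, 52), F(2, 52)
p_union = p_ace + p_black - p_black_ace
print(p_union)   # 7/13
```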
Some More Definitions:
Here we define and explain certain terms which are used frequently.
(i) Trial and Event: Let an experiment be repeated under essentially the same conditions, and let it result in any one of several
possible outcomes. Then the experiment is called a trial and the possible outcomes are known as events or cases. In a throw of a
coin the turning up of head or tail is called an event, and the throwing of the coin is called a trial.
(ii) Exhaustive events: The total number of all possible outcomes in any trial is known as the exhaustive events or exhaustive cases.
In a throw of a coin, the possible outcomes are head and tail, i.e., there are two exhaustive cases. In the experiment of rolling a
die, the outcomes 1, 2, 3, 4, 5, 6 (six cases) are exhaustive.
(iii) Favourable events: The events which entail the required happening are said to be favourable events. For example, in a throw
of a die, to have an even number, 2, 4 and 6 are the favourable events.
(iv) Mutually exclusive events: Two events are known as mutually exclusive when the occurrence of one of them excludes the
occurrence of the other; e.g., while tossing a coin, we either get a head or a tail but not both.
(v) Independent events: Two events are independent when the actual happening of one does not influence in any way the
happening of the other. In throwing two coins at a time, the outcome of one is independent of the outcome of the second. But in case a card is
drawn from a pack of well shuffled cards and is not replaced, then the second draw of a card is dependent on the first draw.
The second draw is then a dependent event.
(vi) Equally likely events: Two events are said to be equally likely if one of them cannot be expected in preference to the
other. For example, in a throw of a coin the two cases, head and tail, are equally likely to come up.
(vii) Conditional Probability: The probability of the happening of an event A, given that event B has happened, is called the conditional
probability of the happening of A on the condition that B has already happened. It is usually denoted by P(A/B).
5.1.3. How to Compute Probability (Equally Likely Outcomes)
Sometimes, a statistical experiment can have n possible outcomes, each of which is equally likely. Suppose a subset of r
outcomes is classified as "successful" outcomes.
The probability that the experiment results in a successful outcome (S) is:
P(S) = (Number of successful outcomes) / (Total number of equally likely outcomes) = r / n
Consider the following experiment. An urn has 10 marbles. Two marbles are red, three are green, and five are blue. If an
experimenter randomly selects 1 marble from the urn, what is the probability that it will be green?
In this experiment, there are 10 equally likely outcomes, three of which are green marbles. Therefore, the probability of choosing
a green marble is 3/10 or 0.30.
The probability of an event refers to the likelihood that the event will occur
5.2. Addition Law of Probability
If P1, P2, P3, ..., Pn are the probabilities of n mutually exclusive events E1, E2, E3, ..., En respectively, then the probability p
that one of these events will happen is given by
p = P1 + P2 + P3 + ... + Pn
i.e., p = P(E1 + E2 + E3 + ... + En) = P(E1) + P(E2) + P(E3) + ... + P(En)
5.3 Multiplication Law of Probability
If there are two independent events E1 and E2, the respective probabilities of which are known, then the probability that both will
happen simultaneously is the product of the probability of one and the conditional probability of the other, given that
the first has occurred:
P(AB) = P(A) × P(B)
Note:
(i) If E1 and E2 are independent events, then P(E2/E1) is the same as P(E2), so P(E1E2) = P(E1)·P(E2).
(ii) If P1, P2, P3, ..., Pn are the probabilities of independent events E1, E2, E3, ..., En respectively, then the probability p that all the events happen simultaneously is given by
p = P1·P2·P3 ... Pn
(iii) If P is the probability that an event will happen in one trial, then the probability that it will happen in a succession of r independent trials
is
P·P·P ... P = P^r
(iv) If P1, P2, P3, ..., Pn are the probabilities that certain events E1, E2, E3, ..., En happen, then the probability that they
do not happen at all, i.e., that they all fail, is q1·q2·q3 ... qn = (1 - p1)(1 - p2) ... (1 - pn). Hence the probability that at least one of these events happens is given by 1 - q1·q2·q3 ... qn = 1 - {(1 - p1)(1 - p2) ... (1 - pn)}
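The "at least one" result in (iv) can be sketched in Python; the event probabilities used are hypothetical, chosen only to illustrate the formula:

```python
# Probability that at least one of several independent events occurs:
# 1 minus the probability that all of them fail.
p = [0.5, 0.2, 0.1]            # hypothetical event probabilities p1, p2, p3
prob_none = 1.0
for pi in p:
    prob_none *= (1 - pi)      # q1 * q2 * q3: all events fail
at_least_one = 1 - prob_none   # 1 - (0.5)(0.8)(0.9) = 0.64
print(round(at_least_one, 2))
```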
5.4 Probability Distribution
When a variable X takes the values xi with probabilities pi (i = 1, 2, 3, ..., n), then X is called a random variable or stochastic
variable. The values x1, x2, x3, ..., xn of the random variable X, with their respective probabilities p1, p2, p3, ...,
pn, constitute a probability distribution of the variable X.
Mean or Expected Value and Variance: Let a random variable X assume the values x1, x2, x3, ..., xn with respective
probabilities p1, p2, p3, ..., pn. Then the mean or expected value of X is defined as
E(X) = μ = p1x1 + p2x2 + p3x3 + ... + pnxn = Σpx
The variance of the random variable X is given by
Var(X) = E[(X - μ)²] = Σp(x - μ)²
This can be simplified to a more convenient form:
Var(X) = Σpx² - μ²
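These formulas can be checked against the cars-per-household distribution from the Chapter II quiz (question 7); this is an illustrative sketch, not part of the original text:

```python
from math import sqrt

# Mean and variance of a discrete random variable, using the
# cars-per-household data from the Chapter II quiz (question 7).
x = [1, 2, 3, 4]
p = [0.32, 0.51, 0.12, 0.05]
mean = sum(pi * xi for pi, xi in zip(p, x))                      # E(X) = sum of p*x
variance = sum(pi * xi ** 2 for pi, xi in zip(p, x)) - mean ** 2 # E(X^2) - mean^2
sd = sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```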
5.5 Binomial Distribution
A random variable X which takes the values 0, 1, 2, ..., n is said to follow a Binomial distribution
if its probability function is given by
P(X = r) = P(r) = nCr p^r q^(n-r),  r = 0, 1, 2, ..., n,
where p, q > 0 are such that p + q = 1.
Let the probability of the happening of an event A in one trial be p, and its probability of not
happening be 1 - p = q.
We assume that there are n trials, and that the event A happens r times and does not
happen n - r times.
This may be shown as follows:
A A A ... A  Ā Ā ... Ā      ...(1)
(r times)   (n - r times)
where A indicates the happening of the event and Ā its not happening, with P(A) = p and P(Ā) = q. We see that the sequence (1) has probability
p·p·...·p × q·q·...·q = p^r q^(n-r)
Clearly (1) is merely one order of arranging r A's. Therefore
Probability of (1) = p^r q^(n-r) × (number of different arrangements of r A's and (n - r) Ā's)
The number of different arrangements of r A's and (n - r) Ā's is nCr. Hence
Probability of the happening of the event r times = nCr p^r q^(n-r),  (r = 0, 1, 2, ..., n)
which is the (r + 1)th term in the expansion of (q + p)^n.
If r = 0, the probability of the event happening 0 times = nC0 q^n p^0 = q^n
If r = 1, the probability of the event happening 1 time = nC1 q^(n-1) p
If r = 2, the probability of the event happening 2 times = nC2 q^(n-2) p²
If r = 3, the probability of the event happening 3 times = nC3 q^(n-3) p³, and so on.
These are clearly the successive terms in the expansion of (q + p)^n.
Hence it is called the Binomial distribution.
Condition for the Applicability of Binomial Distribution:
While using the formula of the binomial distribution in solving any problem, the following conditions must be satisfied:
(a) There should be a finite number of trials.
(b) The trials should not depend on each other.
(c) Each trial should have only two possible outcomes, either a success or a failure.
(d) The probability of success or failure should be the same for all the trials.
5.5.1 Mean of Binomial Distribution
If X is a binomial variate with parameters n and p, then
P(X = r) = p(r) = nCr p^r q^(n-r),  r = 0, 1, 2, ..., n.
The mean of the binomial distribution is E(X) = np, and its variance is npq.
Example: The probability that a pen manufactured by a company will be defective is 1/10. If 12 such pens are manufactured, find
the probability that (i) exactly two will be defective, (ii) at least two will be defective, (iii) none will be defective.
Solution: The probability of a defective pen is 1/10 = 0.1.
The probability of a non-defective pen is 1 - 0.1 = 0.9. Here n = 12.
(i) The probability that exactly two will be defective
= 12C2 (0.1)² (0.9)^10 = 0.2301
(ii) The probability that at least two will be defective
= 1 - (probability that either none or one is defective)
= 1 - [12C0 (0.9)^12 + 12C1 (0.1)(0.9)^11] = 0.3410
(iii) The probability that none will be defective
= 12C0 (0.9)^12 = 0.2824
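The three pen probabilities can be recomputed with `math.comb` (an illustrative sketch; the four-decimal values come out to 0.2301, 0.3410 and 0.2824):

```python
from math import comb

# Binomial probabilities for the defective-pen example: p = 0.1, n = 12.
n, p, q = 12, 0.1, 0.9

def binom(r):
    """P(X = r) = nCr * p^r * q^(n-r)."""
    return comb(n, r) * p ** r * q ** (n - r)

exactly_two = binom(2)                       # (i)
at_least_two = 1 - (binom(0) + binom(1))     # (ii)
none_defective = binom(0)                    # (iii)
print(round(exactly_two, 4), round(at_least_two, 4), round(none_defective, 4))
```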
Example: A die is thrown 8 times and it is required to find the probability that a 3 will show (i) exactly 2 times, (ii) at least seven
times, (iii) at least once.
Solution: The probability of throwing a 3 in a single trial = p = 1/6
The probability of not throwing a 3 in a single trial = q = 5/6
(i) P(getting 3 exactly 2 times) = 8C2 q^6 p² = 28 (5/6)^6 (1/6)² ≈ 0.2605
(ii) P(getting 3 at least seven times) = P(getting 3 seven or eight times)
= P(7) + P(8) = 8C7 q^1 p^7 + 8C8 q^0 p^8 = 41/6^8 ≈ 0.0000244
(iii) P(getting 3 at least once)
= P(getting 3 one or two or three or four or five or six or seven or eight times)
= P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8)
= 1 - P(getting 3 zero times) = 1 - 8C0 q^8 p^0 = 1 - (5/6)^8 ≈ 0.7674
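Using exact fractions, the die probabilities can be verified in Python (illustrative only):

```python
from math import comb
from fractions import Fraction as F

# Binomial probabilities for the die example: p = 1/6, q = 5/6, n = 8.
p, q = F(1, 6), F(5, 6)

exactly_two = comb(8, 2) * q ** 6 * p ** 2            # (i)
at_least_seven = comb(8, 7) * q * p ** 7 + p ** 8     # (ii): P(7) + P(8)
at_least_once = 1 - q ** 8                            # (iii): 1 - P(0)
print(float(exactly_two), float(at_least_seven), float(at_least_once))
```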
5.6. Poisson Distribution
The Poisson distribution is generally used when measuring the number of occurrences of something (# of successes) over an
interval or time period.
The assumptions of a Poisson probability distribution are:
1. The probability of the occurrence of an event is constant for all subintervals.
2. There can be no more than one occurrence in each subinterval.
3. Occurrences are independent; that is, the numbers of occurrences in non-overlapping intervals are independent of one another.
The random variable X is said to follow the Poisson probability distribution if it has the probability function:
P (X = x) = (λ^x e^(-λ)) / x!, x = 0, 1, 2, ...
where λ is the mean number of occurrences per interval.
5.6.1 The mean and variance of the Poisson probability distribution are:
μx = E(X) = λ and σx² = E[(X - μx)²] = λ
i.e., for a Poisson distribution the mean and the variance are both equal to λ.
The Poisson probability distribution is an important discrete probability distribution for a number of applications, including:
1. The number of failures in a large computer system during a given day
2. The number of delivery trucks to arrive at a central warehouse in an hour
3. The number of customers to arrive for flights during each 15-minute time interval from 3:00 PM to 6:00 PM on weekdays
4. The number of customers to arrive at a checkout aisle in your local grocery store during a particular time interval
Example: On an average Friday, a waitress gets no tip from 5 customers. Find the probability that she will get no tip from 7
customers this Friday.
The waitress averages 5 customers that leave no tip on Fridays, so λ = 5.
Random variable: the number of customers that leave her no tip this Friday.
We are interested in P (X = 7) = (5^7 e^(-5)) / 7! ≈ 0.1044.
So, the probability that 7 customers will leave no tip this Friday is 0.1044.
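The waitress example can be checked in a few lines (a sketch using only the Python standard library; poisson_pmf is our own helper):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 5  # average number of customers leaving no tip on a Friday
print(round(poisson_pmf(7, lam), 4))  # 0.1044
```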
5.7. Normal Distribution or Normal Curve:
The normal distribution is probably the most important and most widely used continuous distribution. A continuous random variable that follows it is known as a normal random variable, and its probability distribution is called a normal distribution. The following are the characteristics of the normal distribution:
5.7.1. Characteristics of the Normal Distribution:
1. It is bell shaped and is symmetrical about its mean.
2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution.
Thus, the normal distribution is completely described by two parameters: mean and standard deviation.
5. Total area under the curve sums to 1, i.e., the area of the distribution on each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve
A normal distribution in a variate X with mean μ and variance σ² is a statistical distribution with probability density function
f(x) = (1 / (σ √(2π))) e^(-(x - μ)² / (2σ²))
on the domain -∞ < x < ∞.
The standard normal distribution is obtained by taking μ = 0 and σ² = 1 in the general normal distribution. An arbitrary normal distribution can be converted to a standard normal distribution by changing variables to z = (x - μ)/σ, yielding a variate with mean 0 and variance 1.
5.7.2. Empirical Rule
All normal density curves satisfy the following property, which is often referred to as the Empirical Rule.
68% of the observations fall within 1 standard deviation of the mean, that is, between μ - σ and μ + σ.
95% of the observations fall within 2 standard deviations of the mean, that is, between μ - 2σ and μ + 2σ.
99.7% of the observations fall within 3 standard deviations of the mean, that is, between μ - 3σ and μ + 3σ.
Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.
Figure: 5.1 Normal Distribution or Normal Curve
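The three percentages of the Empirical Rule can be verified numerically from the standard normal CDF. A minimal sketch, assuming Python 3.8+ for statistics.NormalDist:

```python
from statistics import NormalDist  # available from Python 3.8

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

for k in (1, 2, 3):
    inside = Z.cdf(k) - Z.cdf(-k)  # P(mu - k*sigma < X < mu + k*sigma)
    print(f"within {k} sd: {inside:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```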
Example
The total weight of 8 people chosen at random follows a normal distribution with a mean of 550kg and a standard deviation of
150kg.
What is the probability that the total weight of 8 people exceeds 600kg?
First sketch a diagram.
Figure: 5.2 Normal area curve
The mean is 550kg and we are interested in the area that is greater than 600kg.
z = (x - xmean) / σ
Here x = 600kg,
xmean, the mean = 550kg
σ, the standard deviation = 150kg
z = (600 - 550) / 150 = 50 / 150 = 0.33
Table: 5.1
Look in the table down the left-hand column for z = 0.3, and across under 0.03. The number in the table is the tail area for z = 0.33, which is 0.3707.
This is the probability that the weight will exceed 600kg.
Our answer is
"The probability that the total weight of 8 people exceeds 600kg is 0.37 correct to 2
figures."
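The same answer can be obtained without a printed table, using the normal CDF directly (a sketch assuming Python 3.8+ for statistics.NormalDist; the variable names are ours):

```python
from statistics import NormalDist  # available from Python 3.8

mean, sd = 550, 150  # total weight of the 8 people, in kg
x = 600

z = (x - mean) / sd                          # 50/150, about 0.33
p_exceeds = 1 - NormalDist(mean, sd).cdf(x)  # upper-tail area beyond 600kg
print(round(z, 2), round(p_exceeds, 4))      # 0.33 0.3694
```

The exact tail area 0.3694 differs slightly from the table value 0.3707 because the table lookup first rounds z = 1/3 to 0.33; both agree to 2 significant figures, 0.37.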
Chapter-V Probability & Probability distribution
End Chapter quizzes : V
Ques.1. A coin is tossed three times. What is the probability that it lands on heads exactly one time?
a. 0.125 b. 0.250 c. 0.333 d. 0.375
Ques.2.P(A U B) is the probability that __________ will occur
a. A b. B c. A and B d. A or B or both
Ques.3. The events in an experiment are _____________ if only one can occur at a time
a. mutually exclusive b. non-mutually exclusive c. mutually inclusive d. independent
Ques.4. A die is rolled; find the probability that an even number is obtained.
a. 1/2 b. 1/3 c. 1/4 d. 1/5
Ques.5. Which of these numbers cannot be a probability?
a. 0.00001 b. 0.5 c. 1.001 d. 0
Ques.6. For the normal distribution, the mean plus and minus 1.96 standard deviations will include what
percent of the observations?
a. 80%
b. 84%
c. 90%
d. 95%
Ques.7. Normal distribution is a
a. Discrete distribution b. Continuous distribution c. Both d. None
Ques.8. The mean of the binomial distribution is given by
a. p b. np c. npq d. n
Ques.9. The probability of happening an event A, such that event B has happened, is called
a. disjoint probability b. independent probability c. conditional probability d. dependent probability
Ques.10. If A and B are mutually exclusive, then P (A U B) =
a. P (A) b. P (A) + P (B) c. P (B) d. P (A) + P (B) - P (A ∩ B)
Chapter-VI Time Series
Contents:
6.1 Introduction
6.1.1 Role of Time Series
6.2 Components of a Time Series
6.2.1 Secular Trend
6.2.2 Seasonal Variation
6.2.3 Cyclical Variation
6.2.4 Irregular Variation
6.3 Measurement of Trends
6.3.1 Freehand Method
6.3.2 The Method of Semi-Averages
6.3.3 The Method of Moving Averages
6.3.4 The Method of Curve Fitting by the Principle of Least Squares
6.4 Mathematical Models
6.4.1 Additive Model
6.4.2 Multiplicative Model
6.4.3 Mixed Models
6.1 Introduction
Since "time is money" in business activities, the dynamic decision technologies presented here have become a necessary tool for a wide range of managerial decisions in which time and money are directly related. In making strategic decisions under uncertainty, we all make forecasts. We may not think that we are forecasting, but our choices will be directed by our anticipation of the results of our actions or inactions.
Indecision and delays are the parents of failure. This chapter is intended to help managers and administrators do a better job of anticipating, and hence a better job of managing uncertainty, by using effective forecasting and other predictive techniques.
A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken
at regular intervals (days, months, years), but the sampling could be irregular.
A time series analysis consists of two steps:
(1) building a model that represents a time series,
(2) using the model to predict (forecast) future values.
A time series can be represented as a curve that evolves over time. Forecasting a time series means extending the historical values into the future, where measurements are not yet available.
There are some subtleties in the definition of a time-series forecast. For example, the historical data might be daily sales, but you may need monthly forecasts; grouping the values according to a certain period (e.g., a month) is called time-series aggregation.
The following are few examples of time series data:
1. Profits earned by a company for each of the past five years.
2. Workers employed by a company for each of the past 15 years.
3. Number of students registered for the MBA programme of an institute for each of the past five years.
4. The weekly wholesale price index for each of the past 30 weeks.
5. Number of fatal road accidents in Delhi for each day for the past two months.
6.1.1. Role of time Series
1. A time series analysis enables one to study movements such as cycles that fluctuate around the trend. Knowledge of the cyclical pattern in certain series of data will be helpful in making generalisations about the concerned business or industry.
2. The analysis of a time series enables us to understand past behaviour or performance. We can know how the data have changed over time and find out the probable reasons responsible for such changes. If the past performance, say of a company, has been poor, it can take corrective measures to arrest the poor performance.
3. A time series analysis helps directly in business planning. A firm can know the long-term trend in the sale of its products. It can find out at what rate sales have been increasing over the years. This may help it in making projections of its sales for the next few years and in planning the procurement of raw material, equipment and manpower accordingly.
4. A time series analysis enables one to make meaningful comparisons between two or more series regarding the rate or type of growth. For example, growth in consumption at the national level can be compared with that in the national income over a specified period. Such comparisons are of considerable importance to business and industry.
5. A time series analysis helps in evaluating current accomplishments. The actual performance can be compared with the expected performance and the causes of variation analysed; e.g., if we know how much is the effect of seasonality on business, we may devise ways and means of ironing out the seasonal influence or decreasing it by producing commodities with complementary seasons.
6.2. Components of a time series
1 Secular Trend - the smooth long term direction of a time series
2 Seasonal Variation - patterns of change in a time series within a year which tend to repeat each year
3 Cyclical Variation - the rise and fall of a time series over periods longer than one year
4 Irregular Variation - classified into:
Episodic - unpredictable but identifiable
Residual - also called chance fluctuation and unidentifiable
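As a preview of the method of moving averages (Section 6.3.3 in the contents), the secular trend of a short quarterly series can be isolated with a centred moving average. A minimal sketch with invented toy data (the series and function name are ours, not from the text):

```python
# Three years of quarterly sales (toy data, invented for illustration).
sales = [4, 6, 9, 5, 5, 7, 10, 6, 6, 8, 11, 7]

def centred_moving_average(values, period=4):
    """Centred moving average for an even period (two-step average)."""
    first = [sum(values[i:i + period]) / period
             for i in range(len(values) - period + 1)]
    # Average adjacent moving averages to re-centre them on actual quarters.
    return [(a + b) / 2 for a, b in zip(first, first[1:])]

trend = centred_moving_average(sales)
print(trend)  # [6.125, 6.375, 6.625, 6.875, 7.125, 7.375, 7.625, 7.875]
```

Averaging over a full year smooths away the seasonal swings, leaving the steadily rising trend; the cyclical and irregular components would appear as what remains after the trend and seasonal effects are removed.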
6.2.1 Secular Trend
With the first type of change, secular trend, the value of the variable tends to increase or decrease over a long period
of time. The steady increase in the cost of living recorded by the Consumer Price Index is an example of secular
trend. From year to year, the cost of living varies a great deal, but if we examine a long-term period, we see that the trend is toward a steady increase. Figure 6.1 shows a secular trend in an increasing but fluctuating time series.
Figure: 6.1 Secular trend
6.2.2 Seasonal variation
The second kind of change in time-series data is seasonal variation. As we might expect from the name, seasonal variation involves patterns of change within a year that tend to be repeated from year to year. For example, a physician can expect a substantial increase in the number of flu cases every winter and of poison ivy cases every summer. Since these are regular patterns, they are useful in forecasting the future. In Figure 6.2, we see a seasonal variation. Notice how it peaks in the fourth quarter of each year.
1 Sales of ice cream will be higher in summer than in winter, and sales of overcoats will be higher in autumn
than in spring.
2 Shops might expect higher sales shortly before Christmas or in their winter and summer sales.
3 Sales might be higher on Friday and Saturday than on Monday.
4 The telephone network may be heavily used at certain times of the day (such as mid-morning and mid-afternoon) and much less used at other times (such as in the middle of the night).
Figure: 6.2 Seasonal variation
[Chart: quarterly sales of Wildcat sailboats (millions of dollars), July 2001 to July 2004, showing a repeating seasonal pattern around a linear trend.]
6.2.3 Cyclical variation
The third type of variation seen in a time series is cyclical fluctuation. The most common example of cyclical fluctuation is the business cycle. Over time, there are years when the business cycle hits a peak above the trend line. At other times, business activity is likely to slump, hitting a low point below the trend line. The time between hitting peaks or falling to low points is at least 1 year, and it can be as many as 15 or 20 years. Figure 6.3 illustrates a typical pattern of cyclical fluctuation above and below a secular trend line. Note that the cyclical movements do not follow any regular pattern but move in a somewhat unpredictable manner.
Figure: 6.3 cyclical variation
[Chart: cyclical activity plotted against time, with labelled phases P1 prosperity, Z1 decline, V1 depression and Z2 improvement, repeating over successive cycles.]
Figure: 6.4 Business Cycle
[Diagram: the four phases of the business cycle: prosperity, decline, depression and improvement.]
Figure: 6.5 Cyclical Components
[Chart: cyclical component Ct, ranging roughly from 0.90 to 1.15, plotted against time from 1997 to 2003.]
These are medium-term changes in results caused by circumstances which repeat in cycles. In business, cyclical variations are commonly associated with economic cycles: successive booms and slumps in the economy. Economic cycles may last a few years. Cyclical variations are longer term than seasonal variations.
6.2.4 Irregular variation
Irregular variation is the fourth type of change in time-series analysis. In many situations, the value of a variable may change in a completely unpredictable, random manner; irregular variations describe such movements. The effects of the Middle East conflict in 1973 and the Iraqi situation in 1990 on gasoline prices in the United States are examples of irregular variation. Figure 6.6 illustrates irregular variation.
Figure: 6.6 Irregular variation