Quantitative Applications in Management Research E-book
Quantitative Applications in Management and Research
Amity Directorate of Distance & Online Education
Preface
It gives me immense pleasure to bring out the Students' Study Material for the subject Quantitative Applications in Management and Research. The matter is presented in an easy way and covers the particular needs of the course. The purpose of the course is to help students acquire the mathematical skills that are required in the field of management, and the material is arranged so as to allow the progressive learning of quantitative techniques.
Index
S. No.  Chapter     Subject
1       Chapter 1   Introduction to Quantitative Analysis
2       Chapter 2   Data Analysis
3       Chapter 3   Correlation Analysis
4       Chapter 4   Regression Analysis
5       Chapter 5   Probability & Probability Distribution
6       Chapter 6   Time Series
7                   Key to End Chapter Quizzes
8                   Bibliography
Chapter-I
Introduction to Quantitative Analysis
Contents:
1.1 Introduction
1.2 Decision - Making and Quantitative Techniques.
1.2.1 Elements of any decision are
1.3 Quantitative Applications in Management- an overview
1.4 Application of Quantitative methods in business & Management
1.4.1 Finance -Budgeting and Investments
1.4.2 Purchasing, Procurement and Exploration
1.4.3 Production Management
1.4.4 Marketing
1.4.5 Personnel Management
1.4.6 Research and Development
Chapter-I Introduction to Quantitative Analysis
1.1 Introduction
Decision-making is an essential and dominating part of the management process. Although authorities sometimes differ in their definitions of the basic functions of management, everybody agrees that one is not a manager unless he has some authority to plan, organise and control the activities of an enterprise and the behaviour of others. Within this context, decision-making may be viewed as the power to determine what plans will be made and how activities will be organised and controlled. The right to make decisions is an integral part of the right of authority upon which the entire concept of management rests. Essentially, then, decision-making pervades the activities of every business manager. Further, since the management is engaged in a continuous process of decision-making in carrying out the key managerial functions of planning, organising, directing and controlling, we can go so far as to say that management may be regarded as equivalent to decision-making.
Traditionally, decision-making has been considered purely an art, a talent acquired over a period of time through experience. It has been considered so because a variety of individual styles can be traced in the handling and successful solution of similar managerial problems in actual business. However, the environment in which management has to operate nowadays is complex and fast-changing. There is a greater need to supplement the art of decision-making with systematic and scientific methods. A systematic approach to decision-making is necessary because today's business and the environment in which it functions are far more complex than in the past, and the cost of making errors is becoming graver with time. Most business decisions cannot be made simply on the basis of rules of thumb, common sense and/or snap judgment. Common sense may be misleading, and snap judgments may have painful implications. For a large business, a single wrong decision may not only be ruinous but may also have ramifications for national or even international economies. As such, present-day managements cannot rely solely on a trial-and-error approach, and managers have to be more sophisticated: they should employ scientific methods to help them make proper choices. Thus, decision-makers in the business world of today must understand the scientific methodology for making decisions.
1.2 Decision - Making and Quantitative Techniques
Managerial decision-making is a process by which management, when faced with a problem,
chooses a specific course of action from a set of possible options. In making a decision, a
business manager attempts to choose that course of action which is most effective in the given
circumstances in attaining the goals of the organization. The various types of decision-making
situations that a manager might encounter can be listed as follows.
1. Decisions under certainty, where all facts are known fully and for sure, or under uncertainty, where the event that will actually occur is not known but probabilities can be assigned to the various possible occurrences.
2. Decisions for one time period only, called static decisions, or a sequence of interrelated decisions made either simultaneously or over several time periods, called dynamic decisions.
3. Decisions where the opponent is nature (digging an oil well, for example) or a rational opponent (for instance, setting the advertising strategy when the actions of competitors have to be considered).
These classes of decision-making situations are not mutually exclusive, and a given situation may exhibit characteristics from each class. Stocking an item for sale at a certain trade fair, for instance, illustrates a static decision-making situation where uncertainty exists and nature is the opponent.
1.2.1 Elements of any decision are:
i. a decision-maker who could be an individual, group, organization, or society;
ii. a set of possible actions that may be taken to solve the decision problem;
iii. a set of possible states that might occur;
iv. a set of consequences (pay-offs) associated with various combinations of courses of action and
the states that may occur; and
v. the relationship between the pay-offs and the values of the decision maker;
In an actual decision-making situation, the definition and identification of the alternatives, the states and the consequences are the most difficult, albeit not the most crucial, aspects of the decision problem.
In real life, some decision-making situations are simple while others are not. Complexities in
decision situations arise due to several factors. These include the complicated manner of interaction of the
economic, political, technological, environmental and competitive forces in society, the limited resources of
an organization; the values, risk attitudes and knowledge of the decision-makers and the like. For example, a
company's decision to introduce a new product will be influenced by such considerations as market
conditions, labour rates and availability, and investment requirements and availability of funds. The decision
will involve a multidimensional response, including the production methodology, cost and quality of the product,
price, package design, and marketing and advertising strategy. The results of the decision would conceivably
affect every segment of the organisation. The essential idea of the quantitative approach to decision-making
is that if the factors that influence the decisions can be identified and quantified then it becomes easier to
resolve the complexity of the decision-making situations. Thus, in dealing with complex problems, we may
use the tools of quantitative analysis. In fact, a large number of business problems have been given a
quantitative representation with varying degrees of success, and this has led to a general approach variously designated as operations research (or operational research), management science, systems analysis, decision analysis, decision science, etc. Quantitative analysis has now been extended to several areas of business operations and probably represents the most effective approach to handling some types of
decision problems.
A significant benefit of attaining some degree of proficiency with quantitative methods is exhibited in the way
the problems are perceived and formulated. A problem has to be well defined before it can be formulated
into a well-structured framework for solution. This requires an orderly and organised way of thinking.
Two observations may be made here. First, it should be understood clearly that a decision by itself
does not become a good and right decision for adoption merely because it is made within an orderly and
mathematically precise framework. Quantification at best is an aid to business judgment and not its
substitute. A certain degree of constructive skepticism is as desirable in considering a quantitative analysis
of business decisions as it is in any other process of decision-making. Further, some allowances should be
made for qualitative factors involving morale, motivation, leadership, etc. which cannot be ignored. But they
should not be allowed to dominate to such an extent that the quantitative analysis may look to be an
interesting academic exercise, but worthless. In fact, the manager should seek some balance between
quantitative and qualitative factors. Second, it may be noted that the various names for quantitative analysis (operations research, management science, etc.) connote more or less the same general approach. We shall not attempt to discuss the differences among the various labels, as doing so is prone to create more heat than light, and only state that the basic reason for so many titles is that the field is relatively new and there is no consensus regarding which fields of knowledge it includes.
1.3 Quantitative Applications in Management- an overview
The objective of quantitative research is to develop and employ mathematical models, theories and/or
hypotheses pertaining to natural phenomena. The process of measurement is central to quantitative
research because it provides the fundamental connection between empirical observation and mathematical
expression of quantitative relationships.
Quantitative research is generally approached using scientific methods, which include:
i. The generation of models, theories and hypotheses
ii. The development of instruments and methods for measurement
iii. Experimental control and manipulation of variables
iv. Collection of empirical data
v. Modeling and analysis of data
vi. Evaluation of results
Quantitative methods are research techniques that are used to gather quantitative data - information
dealing with numbers and anything that is measurable. Statistics, tables and graphs, are often used to
present the results of these methods.
1.4 Application of Quantitative methods in business & Management
The tools and techniques of quantitative analysis used in areas of management decision-making can be outlined as follows:
1.4.1 Finance -Budgeting and Investments
i. Cash-flow analysis, long range capital requirement, dividend policies, investments portfolios.
ii. Credit policies, credit risks and delinquent account procedures.
iii. Claim and complaint procedures.
1.4.2 Purchasing, Procurement and Exploration
i. Rules for buying supplies under stable or varying prices.
ii. Determination of quantities and timing of purchases.
iii. Bidding policies.
iv. Strategies for exploration and exploitation of raw material sources.
v. Replacements policies.
1.4.3 Production Management
i. Physical distribution
a) Location and size of warehouses, distribution centers and retail outlets.
b) Distribution policy.
ii. Facilities Planning
a) Numbers and location of factories, warehouses, hospitals, etc.
b) Loading and unloading facilities for railroads and trucks; determining the transport schedule.
iii. Manufacturing
a) Production scheduling and sequencing.
b) Stabilisation of production and employment, training, layoffs and optimum product mix.
iv. Maintenance and Project Scheduling
a) Maintenance policies and preventive maintenance.
b) Maintenance crew sizes.
c) Project scheduling and allocation of resources.
1.4.4 Marketing
i. Product selection, timing, competitive actions.
ii. Number of salesmen, frequency of calling on accounts, per cent of time spent on prospects.
iii. Advertising media with respect to cost and time.
1.4.5 Personnel Management
i. Selection of suitable personnel on minimum salary.
ii. Mixes of age and skills.
iii. Recruitment policies and assignment of jobs.
1.4.6 Research and Development
i. Determination of the areas of concentration of research and development.
ii. Project selection.
iii. Determination of time cost trade-off and control of development projects.
iv. Reliability and alternative design.
Chapter-I Introduction to Quantitative Analysis
End Chapter quizzes: I
Ques 1. Traditionally, decision-making has been considered purely as an
a. Art b. Science c. Social Science d. Mathematics
Ques 2. Managerial decision-making is a process by which management chooses a specific course of action from a set of
a. Restricted options b. Possible options. c. No options d. None
Ques 3. Decisions for one time-period only called
a. dynamic decisions b. static decisions c. Both d. None
Ques 4. Decision Making can be done under
a. Certainty b. Uncertainty c. Both d. None
Ques 5. Decision-maker could be
a. an individual b. group c. society d. All the above
Ques 6. Quantitative research is generally approached using scientific methods, which include:
a. The generation of models, theories and hypotheses b. Experimental control and manipulation of variables c. Modeling and analysis of data d. All the above
Ques 7. Quantitative research provides the fundamental connection between
a. empirical observation and mathematical expression b. empirical observation and qualitative expression c. empirical observation and social expression d. empirical observation and all expression
Ques 8. Numbers and location of factories, warehouses, hospitals, etc comes under
a. Maintenance and Project scheduling b. Purchasing, Procurement and Exploration c. Facilities Planning d. Physical distribution
Ques 9. Selection of suitable personnel on minimum salary comes under
a. Production Management b. Personnel management c. Research and Development d. Finance -Budgeting and Investments
Ques 10. Most of the business decisions can be made on the basis of
a. Rule of thumb b. Commonsense c. Snap judgment. d. Quantitative Techniques
Chapter-II
Data Analysis
Contents:
2.1 Introduction
2.1.1 Types of Data
2.2 Some Definitions
2.3 Frequency Distribution:
2.3.1 Graphical presentation of Frequency distribution
2.4 Measure of Central tendency
2.4.1 Arithmetic Mean
2.4.2 Median
2.4.3 Mode
2.5 Measure of Dispersion
2.5.1 Range
2.5.2 Mean Deviation
2.5.3 Variance and standard deviation
2.5.4 The Coefficient of Variation
Chapter-II Data Analysis
2.1 Introduction
Statistics is a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.
Data: A collection of values to be used for statistical analysis.
A dictionary defines data as facts or figures from which conclusions may be drawn. Data may consist of
numbers, words, or images, particularly as measurements or observations of a set of variables. Data are often
viewed as the lowest level of abstraction from which information and knowledge are derived. Thus, technically, it is a
collective or plural noun.
Datum is the singular form of the noun data. Data can be classified as either numeric or nonnumeric. Specific terms
are used as follows:
2.1.1 Types of Data
I. Qualitative data are nonnumeric.
1. {Poor, Fair, Good, Better, Best}, colors (ignoring any physical causes), and types of material {straw, sticks, bricks} are examples of qualitative data.
2. Qualitative data are often termed categorical data. Some books use the terms individual and variable to
reference the objects and characteristics described by a set of data. They also stress the importance of exact
definitions of these variables, including what units they are recorded in. The reason the data were collected
is also important.
II. Quantitative data are numeric.
Quantitative data are further classified as either discrete or continuous.
Discrete data are numeric data that have a finite number of possible values.
A classic example of discrete data is a finite subset of the counting numbers, {1, 2, 3, 4, 5}, perhaps corresponding to a scale from Strongly Disagree to Strongly Agree.
When data represent counts, they are discrete. An example might be how many students were absent on a given day. Counts are usually considered exact and integer.
Continuous data have infinite possibilities: 1.4, 1.41, 1.414, 1.4142, 1.41421...
The real numbers are continuous with no gaps or interruptions. Physically measureable quantities of length,
volume, time, mass, etc. are generally considered continuous. At the physical level (microscopically), especially
for mass, this may not be true, but for normal life situations it is a valid assumption.
Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful
information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and
approaches, encompassing diverse techniques under a variety of names, in different business, science, and social
science domains.
2.2 Some Definitions
Raw Data: Data collected in original form.
Frequency: The number of times a certain value or class of values occurs.
Frequency Distribution: The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution: A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution: A frequency distribution of numerical data in which the raw data is not grouped.
Grouped Frequency Distribution: A frequency distribution where several numbers are grouped into one class.
Class Limits: Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
Class Boundaries: Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit, and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width: The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint): The number in the middle of the class, found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
Cumulative Frequency: The number of values less than the upper class boundary for the current class; a running total of the frequencies.
Relative Frequency: The frequency divided by the total frequency. This gives the percent of values falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency): The running total of the relative frequencies or the cumulative frequency divided by the total frequency, gives the percent of the values which are less than the upper class boundary.
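To make the definitions above concrete, here is a small sketch in Python that tallies a grouped frequency distribution; the sample scores, class width and lower limit are illustrative and not from the text.

```python
# Build a grouped frequency distribution: count how many values fall
# into each class [lower, lower + width).
from collections import Counter

def grouped_frequency(data, class_width, lower=None):
    """Return {(class_lower, class_upper): frequency} for the data."""
    if lower is None:
        lower = (min(data) // class_width) * class_width
    counts = Counter((x - lower) // class_width for x in data)
    table = {}
    for k in sorted(counts):
        lo = lower + k * class_width
        table[(lo, lo + class_width)] = counts[k]
    return table

scores = [12, 15, 20, 22, 30, 18, 25, 27, 14, 21]
dist = grouped_frequency(scores, class_width=5, lower=10)
for (lo, hi), f in dist.items():
    print(f"{lo}-{hi}: {f}")
```

Relative and cumulative frequencies then follow by dividing each count by the total and by keeping a running sum, respectively.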
2.3 Frequency Distribution
The distribution of empirical data is called a frequency distribution and consists of a count of the number of
occurrences of each value. If the data are continuous, then a grouped frequency distribution is used. Typically, a
distribution is portrayed using a frequency polygon or a histogram. Mathematical distributions are often used to
define distributions. The normal distribution is, perhaps, the best known example. Many empirical distributions are
approximated well by mathematical distributions such as the normal distribution.
Grouped Frequency Distribution A grouped frequency distribution is a frequency distribution in which
frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights
might be calculated by defining one-inch ranges. The frequency of individuals with various heights rounded off to the nearest inch would then be tabulated.
2.3.1 Graphical presentation of Frequency distribution:
Histogram
A histogram is a graphical display of tabulated frequencies. A histogram is the graphical version of a table that shows
what proportion of cases fall into each of several or many specified categories.
Figure 2.1: Histogram
Example of a histogram of 100 values
Advantages
Visually strong
Can compare to normal curve
The vertical axis is usually a frequency count of items falling into each category
Disadvantages
Cannot read exact values because data is grouped into categories
More difficult to compare two data sets
Use only with continuous data
Frequency Polygons
Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same
purpose as histograms, but are especially helpful in comparing sets of data. Frequency polygons are also a good
choice for displaying cumulative frequency distributions.
To create a frequency polygon, start just as for histograms, by choosing a class interval. Then draw an X-axis
representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it
with the middle value represented by the class. Draw the Y-axis to indicate the frequency of each class. Place a point
in the middle of each class interval at the height corresponding to its frequency. Finally, connect the points. You
should include one class interval below the lowest value in your data and one above the highest value. The graph will
then touch the X-axis on both sides.
Figure 2.2: Histogram/Frequency Polygons
Advantages
Visually appealing
Can compare to normal curve
Can compare two data sets
Disadvantages
Anchors at both ends may imply zero as data points
Use only with continuous data
Frequency Curve
A smooth curve which corresponds to the limiting case of a histogram computed for a frequency distribution
of a continuous distribution as the number of data points becomes very large.
Figure 2.3 : Histogram/Frequency Polygons/Frequency Curve
Advantages
Visually appealing
Disadvantages
Anchors at both ends may imply zero as data points
Use only with continuous data
2.4 Measure of Central tendency
Central Tendency is the center or middle of a distribution. There are many measures of central tendency. The most common are the mean, median and mode. The center of a distribution could be defined three ways:
1. the point on which a distribution would balance, 2. the value whose average absolute deviation from all the other values is minimized, and
3. the value whose squared difference from all the other values is minimized.
It can be shown that the mean is the point on which a distribution would balance, that the median is the value that minimizes the sum of absolute deviations, and that the mean is also the value that minimizes the sum of squared deviations.
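These three characterizations of the center can be checked numerically. The sketch below scans a grid of candidate centers for a small hypothetical data set and confirms that the median minimizes the sum of absolute deviations while the mean minimizes the sum of squared deviations.

```python
# Verify the minimization properties of the median and the mean
# on a small data set.
data = [2, 2, 3, 4, 14]

def sum_abs_dev(c):
    return sum(abs(x - c) for x in data)

def sum_sq_dev(c):
    return sum((x - c) ** 2 for x in data)

mean = sum(data) / len(data)            # 5.0
median = sorted(data)[len(data) // 2]   # 3

# Scan candidate centers on a fine grid and find the minimizers.
grid = [i / 100 for i in range(0, 1501)]
best_abs = min(grid, key=sum_abs_dev)   # minimizer of absolute deviations
best_sq = min(grid, key=sum_sq_dev)     # minimizer of squared deviations
print(best_abs, best_sq)   # close to the median (3) and the mean (5)
```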
2.4.1 Arithmetic Mean
The arithmetic mean is the most common measure of central tendency. For a data set, the mean is the sum of the
observations divided by the number of observations. Basically, the mean describes the central location of the data.
For a given set of data with observations x1, x2, ..., xn, the Arithmetic Mean is defined as:
Mean = (x1 + x2 + ... + xn) / n = (Σ xi) / n
The weighted arithmetic mean is used if one wants to combine average values from samples of the same population with different sample sizes:
Weighted Mean = (Σ wi xi) / (Σ wi)
Example 1:
Observations: 12, 15, 20, 22, 30
Weights: 2, 5, 7, 6, 1
Find the weighted mean.

Observations  Weights  xi wi
12            2        24
15            5        75
20            7        140
22            6        132
30            1        30
Total         21       401

Weighted Mean = 401 / 21 = 19.10
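The weighted mean from Example 1 can be computed directly from the formula above; a minimal sketch:

```python
# Weighted arithmetic mean: sum of (weight * observation) divided
# by the sum of the weights, using the data from Example 1.
observations = [12, 15, 20, 22, 30]
weights = [2, 5, 7, 6, 1]

total_weight = sum(weights)                                       # 21
weighted_sum = sum(w * x for x, w in zip(observations, weights))  # 401
weighted_mean = weighted_sum / total_weight
print(round(weighted_mean, 2))   # 19.1
```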
Advantages
can be specified using an equation, and therefore can be manipulated algebraically
is the most sufficient of the three estimators
is the most efficient of the three estimators
is unbiased
Disadvantages
is very sensitive to extreme scores (i.e., low resistance)
its value is unlikely to be one of the actual data points
requires an interval scale
2.4.2 Median
The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest
value and picking the middle one. If there is an even number of observations, the median is not unique, so one often
takes the mean of the two middle values.
For an odd number of observations:
Median = the ((n + 1)/2)-th observation.
For an even number of observations:
Median = the average of the (n/2)-th and ((n/2) + 1)-th observations.
Here are the sample test scores you have seen so often:
100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45
The "middle" score of this group could easily be seen as 87. Why? Exactly half of the scores lie above 87 and half lie
below it. Thus, 87 is in the middle of this set of scores. This score is known as the median.
In this example, there are 21 scores. The eleventh score in the ordered set is the median score (87), because ten
scores are on either side of it.
If there were an even number of scores, say 20, the median would fall halfway between the tenth and eleventh
scores in the ordered set. We would find it by adding the two scores (the tenth and eleventh scores) together and
dividing by two.
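The textbook rule can be checked against Python's statistics module on the test scores above:

```python
# Median of the 21 test scores: sort, then take the middle
# ((n + 1)/2-th) observation, and compare with statistics.median.
from statistics import median

scores = [100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
          85, 85, 85, 80, 79, 76, 72, 67, 66, 45]

ordered = sorted(scores)
n = len(ordered)                     # 21, an odd count
by_rule = ordered[(n + 1) // 2 - 1]  # the (n+1)/2-th ordered observation
print(by_rule, median(scores))       # 87 87
```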
Advantages
is unbiased
is unaffected by extreme scores (i.e., high resistance)
doesn't require the use of an interval scale; as long as you can order the scores along some continuum, you can find the median
Disadvantages
cannot be specified using an equation, so it cannot be manipulated algebraically
is the least sufficient of the three estimators
is less efficient than the mean
2.4.3 Mode
The mode is the most frequently occurring value. It is the most common value in a distribution: The mode of 3, 4,
4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median.
With continuous data such as response time measured to many decimals, the frequency of each value is one
since no two scores will be exactly the same. Therefore the mode of continuous data is normally computed from
a grouped frequency distribution. The grouped frequency distribution table shows a grouped frequency
distribution for the target response time data. Since the interval with the highest frequency is 600-700, the mode
is the middle of that interval (650).
Table 2.1: Grouped frequency distribution
Range Frequency
500-600 3
600-700 6
700-800 5
800-900 5
900-1000 0
1000-1100 1
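The modal class of Table 2.1 can be found programmatically; a small sketch using the table's frequencies, with the mode taken as the midpoint of the modal class as in the text:

```python
# Mode of grouped data (Table 2.1): the class with the highest
# frequency, with the mode reported as that class's midpoint.
table = {(500, 600): 3, (600, 700): 6, (700, 800): 5,
         (800, 900): 5, (900, 1000): 0, (1000, 1100): 1}

modal_class = max(table, key=table.get)   # class with highest frequency
mode = sum(modal_class) / 2               # midpoint of that class
print(modal_class, mode)   # (600, 700) 650.0
```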
Advantages
represents a number that actually occurred in the data
represents the largest number of scores, so the probability of getting that score is greater than the probability of getting any other score if an observation is chosen at random
is unaffected by extreme scores (i.e., high resistance)
is unbiased
doesn't require an interval scale
Disadvantages
depends on how we group the data
cannot be specified using an equation, so it cannot be manipulated algebraically
is less sufficient than the mean
is less efficient than the mean
2.5 Measure of Dispersion
Measures of Dispersion provide us with a summary of how much the points in our data set vary, e.g. how spread out
they are or how volatile they are.
In measuring dispersion, it is necessary to know both the amount of variation and the degree of variation. The former is given by absolute measures of dispersion, expressed in the units of the original variable, while the latter is given by relative measures of dispersion.
Absolute measures can be divided into positional measures based on some items of the series, such as (i) the range and (ii) the quartile deviation or semi-interquartile range, and measures based on all items in the series, such as (i) the mean deviation and (ii) the standard deviation. The relative measures in each of the above cases are called the
coefficients of the respective measures. For purposes of comparison between two or more series with varying sizes or numbers of items, varying central values or units of calculation, only relative measures can be used.
The following are the important methods of studying variation:
1. Range
2. Mean deviation
3. Standard deviation and Variance (which is closely related to standard deviation)
4. The Coefficient of Variation
2.5.1 Range
Range is the simplest of the summary measures of variation. It is also the crudest and the most prone to error. It is computed as the difference between the largest value (H) and the smallest value (L) in a data set:
Absolute range: Range = H - L
Relative range: Coefficient of range = (H - L) / (H + L)
For example, for the data set {2, 2, 3, 4, 14}:
Range = 14 - 2 = 12
Coefficient of range = (14 - 2) / (14 + 2) = 12 / 16 = 0.75
Example:
You are given the following data:
3 6 9 11
Compute the sample range
Solution:
H = 11, L = 3
range = H - L = 11 - 3 = 8
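Both worked examples above can be reproduced with a couple of one-line helpers:

```python
# Range and coefficient of range, applied to the two example
# data sets in this section.
def value_range(data):
    return max(data) - min(data)

def coefficient_of_range(data):
    h, l = max(data), min(data)
    return (h - l) / (h + l)

print(value_range([2, 2, 3, 4, 14]))           # 12
print(coefficient_of_range([2, 2, 3, 4, 14]))  # 0.75
print(value_range([3, 6, 9, 11]))              # 8
```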
2.5.2 Mean Deviation
Mean Deviation can be calculated from any value of Central Tendency, viz. Mean, Median, Mode. Accordingly, Mean
Deviation can be of the following types:
Mean Deviation about Mean
Mean Deviation about Median
Mean Deviation about Mode
Mean Deviation about Mean = ( Σ |xi - Mean| ) / n
Properties of Mean Deviation about Mean:
The average absolute deviation from the mean is less than or equal to the standard deviation.
The sum of the signed deviations of any data set from its mean is always zero.
The mean absolute deviation is the average absolute deviation from the mean and is a common measure of forecast error in time series analysis.
For example, for the data set {2, 2, 3, 4, 14}, with Mean = 5:
Mean Deviation about Mean = ( |2 - 5| + |2 - 5| + |3 - 5| + |4 - 5| + |14 - 5| ) / 5 = 18 / 5 = 3.6
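The same calculation as a short function, checked on the data set {2, 2, 3, 4, 14}:

```python
# Mean deviation about the mean: average of the absolute
# deviations of each value from the arithmetic mean.
def mean_abs_deviation(data):
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

print(mean_abs_deviation([2, 2, 3, 4, 14]))   # 3.6
```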
2.5.3 Variance and standard deviation
Variance and standard deviation are the most common of all the measures of variation.
Variance is a measure of statistical dispersion, indicating how the possible values are spread around the mean. Thus, variance indicates the variability of the values: a smaller value implies a smaller variation from the mean.
The positive square root of the variance is called the Standard Deviation.
Let us consider an example with the values 4, 6, 5, 5 (Total = 20, Mean = 20/4 = 5):

Values   Xi - Mean   (Xi - Mean)^2
4        -1          1
6        1           1
5        0           0
5        0           0

Σ(Xi - Mean)^2 = 2

Variance = 2/4 = 1/2 = 0.5
S.D. = √0.5 ≈ 0.707
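The worked example divides the sum of squared deviations by n, i.e. it uses the population form of the variance; a sketch of that calculation, checked against the statistics module:

```python
# Population variance and standard deviation for the values
# 4, 6, 5, 5, dividing by n as in the table above.
from statistics import pvariance, pstdev

def variance(data):
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

data = [4, 6, 5, 5]
var = variance(data)
sd = var ** 0.5
print(var, round(sd, 3))   # 0.5 0.707
```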
2.5.4 The Coefficient of Variation
The coefficient of variation is a measure of variation expressed as a percentage of the sample mean:
CV = (S / Mean) × 100
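A sketch of the CV formula on the same illustrative values 4, 6, 5, 5. Note one assumption: statistics.stdev uses the sample (n - 1) standard deviation, whereas the variance example above divides by n.

```python
# Coefficient of variation: sample standard deviation expressed
# as a percentage of the mean.
from statistics import mean, stdev

def coefficient_of_variation(data):
    return stdev(data) / mean(data) * 100

cv = coefficient_of_variation([4, 6, 5, 5])
print(round(cv, 2))   # 16.33
```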
Chapter-II Data Analysis
End Chapter quizzes: II
Ques 1. Singular form of the data is
a. Datum b. Stratum c. Date d. Data
Ques 2. Graphical presentation of Frequency distribution can be done by
a. Histogram b. Frequency polygons c. Frequency Curve d. All the three
Ques 3. Which one is unaffected by extreme scores
a. Mean b. Median c. Mode d. Range
Ques 4.Which one is not the Measure of Dispersion
a. Range b. Mean deviation c. Histogram d. Standard deviation
Ques 5.Chaya took 7 math tests in one marking period. What is the range of her test scores?
89, 73, 84, 91, 87, 77, 94
a. 25 b. 21 c. 13 d. 15
Ques 6.In a crash test, 11 cars were tested to determine what impact speed was required to obtain minimal bumper damage. Find the mode of the speeds given in miles per hour below.
24, 15, 18, 20, 18, 22, 20, 26, 18, 26, 24
a. 18 b. 20 c. 18.6 d. 15
Ques 7. A survey conducted by an automobile company showed the number of cars per household and the corresponding probabilities. Find the standard deviation.
Number of cars X 1 2 3 4
Probability P(X) 0.32 0.51 0.12 0.05
a. 4.24 b. 0.63 c. 0.79 d. 1.9
Ques 8. The given data shows the number of burgers sold at a bakery in the last 14 weeks. 17, 13, 18, 17, 13, 16, 18, 19, 17, 13, 16, 18, 20, 19 Find the median number of burgers sold.
a. 18.5 b. 17 c. 18 d. 17.5
Ques 9. Histograms can be constructed for
a. Discrete data b. Continuous data c. Both d. none
Ques 10. Which is called the positional average?
a. Mean b. Median c. Mode d. None
Chapter-III
Correlation Analysis
Contents:
3.1 Introduction
3.2 Types of Correlation
3.2.1 Positive and Negative
3.2.2 Simple, partial and multiple
3.2.3 Linear and non-linear
3.3 Degrees of Correlation
3.3.1 Perfect correlation
3.3.2 Limited degrees of correlation
3.3.3 Absence of correlation
3.4 Methods of Determining Correlation
3.4.1 Scatter Plot
3.4.2 Karl Pearson's coefficient of correlation
3.4.3 Spearman's Rank-correlation coefficient
Chapter-III Correlation Analysis
3.1 Introduction
Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For
example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't
perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter
one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of
people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of
the variation in people's weights is related to their heights.
Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect
there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater
understanding of your data.
3.2 Types of Correlation
I. Positive and Negative
II. Simple, partial and multiple
III. Linear and non-linear
3.2.1 Positive and Negative Correlation
Positive Correlation
If the higher scores on X are generally paired with the higher scores on Y, and the lower scores on X are
generally paired with the lower scores on Y, then the direction of the correlation between two variables is
positive.
Negative Correlation
If the higher scores on X are generally paired with the lower scores on Y, and the lower scores on X are
generally paired with the higher scores on Y, then the direction of the correlation between two variables is
negative.
Figure: 3.1 Positive, Negative and No Correlation
3.2.2 Simple, partial and multiple
The distinction between simple, partial and multiple correlation is based upon the number of variables studied.
Simple Correlation
Correlation between only two variables, e.g. correlation between age and height, or correlation between yield of
rice and amount of rainfall in a given area, is called simple correlation.
Multiple Correlation
When the correlation between three or more variables is studied simultaneously, it is called multiple
correlation.
Partial Correlation
Here we recognise more than two variables but consider only two of them to be influencing each other, the
effect of the other influencing variables being kept constant. The correlation between the two variables keeping the
other variables constant is called partial correlation.
1 X1-Yield of rice
2 X2-Amount of Rainfall
3 X3-Amount of fertilizers
4 X4-Type of soil
5 X5-Advanced technologies used.
Correlation analysis of X1, X2, X3, X4 and X5 is an example of Multiple Correlation whereas if we only
study the relation between X1 and X2 keeping other variables constant it would be an example of Partial
Correlation between yield of rice and amount of rainfall.
3.2.3 Linear and non-linear
The nature of the graph gives us an idea of the type of correlation between two variables. If the graph is
a straight line, the correlation is called a "linear correlation"; if the graph is not a straight line, the correlation
is non-linear or curvilinear.
3.3 Degrees of Correlation
3.3.1 Perfect correlation
If two variables change in the same direction and in the same proportion, the correlation between the two is
perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if
the variables change in opposite directions and in the same proportion, the correlation is perfect negative and its
coefficient of correlation is -1. In practice we rarely come across these types of correlation.
3.3.2 Limited degrees of correlation
If two variables are neither perfectly correlated nor completely uncorrelated, we term the
correlation limited correlation. It may be positive, negative or zero, but lies within the limits ±1.
3.3.3 Absence of correlation
If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change
in the other variable, then we can firmly say that there is no correlation between the two
variables. In such a case the coefficient of correlation is 0.
Table: 3.1 Meaning of (r) in the Correlation Coefficient

r value | Relationship between X and Y
r = +1.0 | Perfect positive: as X goes up, Y always also goes up
r = +0.5 | Weak positive: as X goes up, Y tends to usually also go up
r = 0 | No correlation: X and Y are not correlated
r = -0.5 | Weak negative: as X goes up, Y tends to usually go down
r = -1.0 | Perfect negative: as X goes up, Y always goes down
3.4 Methods of Determining Correlation
1 Scatter Plot
2 Karl Pearson's coefficient of correlation
3 Spearman's Rank-correlation coefficient.
3.4.1 Scatter Plot (Scatter diagram or dot diagram)
In this method the values of the two variables are plotted on a graph paper. One variable is taken along the horizontal
(x-axis) and the other along the vertical (y-axis). By plotting the data, we get points (dots) on the graph which are
generally scattered, and hence the name Scatter Plot.
The manner in which these points are scattered, suggest the degree and the direction of correlation. The
degree of correlation is denoted by r and its direction is given by the signs positive and negative.
Figure: 3.2 Positive, Negative and No Correlation
positive correlation negative correlation no correlation
3.4.2 Karl Pearson's coefficient of correlation
It gives a numerical expression for the measure of correlation. It is denoted by r. The value of r gives the
magnitude of correlation and its sign denotes the direction. It is defined as
r = Cov(x, y) / √(Var x · Var y)
Table: 3.2 Correlation coefficient between advertisement expenditure (X) and sales (Y)

X (Rs. lakhs) | Y (Rs. crores) | (X - X̄)² | (Y - Ȳ)² | (X - X̄)(Y - Ȳ)
4 | 16 | 0.1849 | 1.6641 | 0.5547
6 | 29 | 2.4649 | 137.1241 | 18.3847
10 | 43 | 31.0249 | 661.0041 | 143.2047
5 | 20 | 0.3249 | 7.3441 | 1.5447
1 | 3 | 11.7649 | 204.2041 | 49.0147
2 | 4 | 5.9049 | 176.6241 | 32.2947
3 | 6 | 2.0449 | 127.4641 | 16.1447
ΣX = 31 | ΣY = 121 | Σ(X - X̄)² = 53.7143 | Σ(Y - Ȳ)² = 1315.4287 | Σ(X - X̄)(Y - Ȳ) = 261.1429
X̄ = 31/7 = 4.43 and Ȳ = 121/7 = 17.29
Sum of squared deviations in advertisement expenditure = 53.71
Sum of squared deviations in sales = 1315.43
Sum of cross products (SP) = 261.14
Calculation of the Pearson r
r = 261.14 / √((53.71)(1315.43)) = 261.14 / 265.80 = +0.982
Interpretation
The magnitude of the correlation between advertisement expenditure and sales is 0.982 and the direction of the relationship is positive: as advertisement expenditure increases, so do the sales of the commodity.
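The Pearson r for the advertisement/sales data can be verified with a short Python sketch (illustrative only; any small differences from the hand calculation come from rounding the means to two decimals there):

```python
from math import sqrt

# Pearson r for the advertisement-expenditure (X) and sales (Y) data above.
X = [4, 6, 10, 5, 1, 2, 3]
Y = [16, 29, 43, 20, 3, 4, 6]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # sum of cross products
sxx = sum((x - mx) ** 2 for x in X)                    # sum of squared x-deviations
syy = sum((y - my) ** 2 for y in Y)                    # sum of squared y-deviations
r = sxy / sqrt(sxx * syy)
print(round(r, 3))   # ≈ 0.982
```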
3.4.3 Spearman's Rank-correlation coefficient
The most precise way to compare several pairs of data is to use a statistical test - this establishes whether the
correlation is really significant or if it could have been the result of chance alone.
Spearman's Rank correlation coefficient is a technique which can be used to summarise the strength and
direction (negative or positive) of a relationship between two variables.
The result will always be between +1 and -1.
Method - calculating the coefficient
Create a table from your data.
Rank the two data sets. Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2'
to the second biggest value and so on. The smallest value in the column will get the lowest ranking. This
should be done for both sets of measurements.
Tied scores are given the mean (average) rank. For example, the three tied scores of 1 euro in the example
below are ranked fifth in order of price, but occupy three positions (fifth, sixth and seventh) in a ranking
hierarchy of ten. The mean rank in this case is calculated as (5 + 6 + 7) ÷ 3 = 6.
Find the difference in the ranks (d): This is the difference between the ranks of the two values on each row
of the table. The rank of the second value (price) is subtracted from the rank of the first (distance from the
museum).
Square the differences (d²) to remove negative values, and then sum them (Σd²).
Table: 3.3 Spearman's Rank Correlation

Convenience store | Distance from CAM (m) | Rank | Price of 50cl bottle (€) | Rank | Difference between the ranks (d) | d²
1 | 50 | 10 | 1.80 | 2 | 8 | 64
2 | 175 | 9 | 1.20 | 3.5 | 5.5 | 30.25
3 | 270 | 8 | 2.00 | 1 | 7 | 49
4 | 375 | 7 | 1.00 | 6 | 1 | 1
5 | 425 | 6 | 1.00 | 6 | 0 | 0
6 | 580 | 5 | 1.20 | 3.5 | 1.5 | 2.25
7 | 710 | 4 | 0.80 | 9 | -5 | 25
8 | 790 | 3 | 0.60 | 10 | -7 | 49
9 | 890 | 2 | 1.00 | 6 | -4 | 16
10 | 980 | 1 | 0.85 | 8 | -7 | 49
Σd² = 285.5
Calculate the coefficient (R) using the formula below. The answer will always be between +1.0 (a perfect
positive correlation) and -1.0 (a perfect negative correlation).
When written in mathematical notation, the Spearman Rank formula looks like this:
R = 1 - (6 Σd²) / (n³ - n)
Now to put all these values into the formula.
Find Σd² by adding up all the values in the d² column. In our example this is 285.5.
Multiplying this by 6 gives 1713.
Now for the bottom line of the equation. The value n is the number of sites at which you took measurements, which in our example is 10. Substituting into n³ - n we get 1000 - 10 = 990.
We now have the formula R = 1 - (1713/990), which gives a value for R of 1 - 1.73 = -0.73.
What does this R value of -0.73 mean?
The R value of -0.73 suggests a fairly strong negative relationship.
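The whole Spearman procedure, including the tie-averaged ranks, can be sketched in Python (illustrative only; the `average_ranks` helper is a hypothetical name, not from the text):

```python
def average_ranks(values):
    """Rank with '1' for the largest value; tied values share the mean rank."""
    order = sorted(values, reverse=True)
    return [sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
            for x in values]

# Convenience-store data from Table 3.3: distance from CAM (m) and bottle price (euros).
distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]

rd = average_ranks(distance)
rp = average_ranks(price)
d2 = sum((a - b) ** 2 for a, b in zip(rd, rp))   # sum of squared rank differences
n = len(distance)
R = 1 - 6 * d2 / (n ** 3 - n)                    # Spearman formula
print(d2, round(R, 2))   # 285.5 and -0.73
```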
Chapter-III Correlation
End Chapter quizzes: III
Ques.1. If the higher scores on X are paired with the lower scores on Y, then the correlation between the two variables is
a. Positive b. Negative. c. No correlation d. Unknown
Ques.2. The value of r gives the magnitude of correlation and sign denotes its
a. Value b. Direction c. Both d. None
Ques.3. When the correlation between three or more variables is studied simultaneously, it is called
a. Simple Correlation b. Partial Correlation c. multiple Correlation d. All of the above
Ques.4. If the graph between two variables gives a straight line, the correlation is called a
a. linear correlation b. Curvi linear correlation c. Absence of correlation d. Simple correlation
Ques.5. If two variables change in the same direction and in the same proportion, the correlation between the two is
a. Perfect negative b. Perfect positive c. Limited positive d. Limited Negative
Ques.6.The correlation coefficient, r = 0, implies
a. Perfect negative b. Perfect positive c. No correlation d. Limited correlation
Ques.7 Which of the following is a stronger correlation than -.54? a. 0 b. -.45 c. .45 d. -.67
Ques.8 If the correlation between body weight and annual income were high and positive, we could conclude that:
a. High incomes cause people to eat more food. b. Low incomes cause people to eat less food. c. High income people tend to spend a greater proportion of their income on food than low income people, on average. d. High income people tend to be heavier than low income people, on average.
Ques.9 Men tend to marry women who are slightly younger than themselves. Suppose that every man married a woman who was exactly 0.5 of a year younger than himself. Which of the following is CORRECT? a. The correlation is -.5. b. The correlation is .5. c. The correlation is 1. d. The correlation is -1.
Ques.10. A national consumer magazine reported the following correlations. The correlation between car weight and car reliability is -0.30. The correlation between car weight and annual maintenance cost is 0.20.
Which of the following statements are true? I. Heavier cars tend to be less reliable. II. Heavier cars tend to cost more to maintain. III. Car weight is related more strongly to reliability than to maintenance cost.
a. I only b. II only c. III only d. I, II, and III
Chapter-IV Regression Analysis
Contents:
4.1 Introduction
4.2 Regression Equations
4.3 How to Find the Regression Equation
4.4 Properties of the Regression Coefficients
4.5 Difference between Correlation and Regression
Chapter-IV Regression Analysis
4.1 Introduction
Regression analysis is a technique used for the modeling and analysis of numerical data consisting of values of a
dependent variable (response variable) and of one or more independent variables (explanatory variables). The
dependent variable in the regression equation is modeled as a function of the independent variables,
corresponding parameters ("constants"), and an error term. The error term is treated as a random variable. It
represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best
fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have
also been used.
There are two types of variables in Regression Analysis.
1 Dependent variable
2 Independent variable
The dependent variable is also known as the regressed, predicted or explained variable. The independent variable is also
known as the regressor, predictor or explainer.
Simple regression is used to examine the relationship between one dependent and one independent variable. After
performing an analysis, the regression statistics can be used to predict the dependent variable when the independent
variable is known. Regression goes beyond correlation by adding prediction capabilities.
The regression line (known as the least squares line) is a plot of the expected value of the dependent variable for
all values of the independent variable. Technically, it is the line that "minimizes the squared residuals". The
regression line is the one that best fits the data on a scatterplot.
In the regression equation, y is the dependent variable and x is the independent variable. Here are three
equivalent ways to mathematically describe a linear regression model:
1 y = intercept + (slope × x) + error
2 y = constant + (coefficient × x) + error
3 y = a + bx + e
The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. It is expressed in
the units of the Y-axis divided by the units of the X-axis. If the slope is positive, Y increases as X increases. If the
slope is negative, Y decreases as X increases.
Figure: 4.1 Regression line
The Y intercept is the Y value of the line when X equals zero. It defines the elevation of the line.
For two variables X and Y, we will have two regression lines, and they show the mutual relationship between the two
variables. The regression line of Y on X gives the most probable estimate of the values of Y for given values of X,
whereas the regression line of X on Y gives the most probable estimate of the values of X for given values of Y. Only one
regression line is obtained in the case of perfect correlation (r = ±1): both lines of regression coincide and we get only one line.
4.2 Regression Equations
Regression Equations are algebraic expressions of the regression lines.
Regression Equation of Y on X
Y = a + bX
According to the principle of least squares, the normal equations for estimating a and b are
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²
Regression Equation of X on Y
X = a + bY
According to the principle of least squares, the normal equations for estimating a and b are
ΣX = Na + bΣY
ΣXY = aΣY + bΣY²
Regression Equation from Deviations taken from the Arithmetic Means of X and Y
Y - Ȳ = byx (X - X̄)
where byx, the regression coefficient of Y on X, is
byx = Σxy / Σx²   (with x = X - X̄ and y = Y - Ȳ)
4.3 How to Find the Regression Equation
Five randomly selected students took a math aptitude test before they began their statistics course. The Statistics
Department has three questions.
i. What linear regression equation best predicts statistics performance, based on math aptitude scores?
ii. If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
iii. How well does the regression equation fit the data?
In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades.
The last two rows show sums and mean scores that we will use to conduct the regression analysis.
Table: 4.1

Student | xi | yi | (xi - x̄) | (yi - ȳ) | (xi - x̄)² | (yi - ȳ)² | (xi - x̄)(yi - ȳ)
1 | 95 | 85 | 17 | 8 | 289 | 64 | 136
2 | 85 | 95 | 7 | 18 | 49 | 324 | 126
3 | 80 | 70 | 2 | -7 | 4 | 49 | -14
4 | 70 | 65 | -8 | -12 | 64 | 144 | 96
5 | 60 | 70 | -18 | -7 | 324 | 49 | 126
Sum | 390 | 385 | | | 730 | 630 | 470
Mean | 78 | 77 | | | | |

The regression equation is a linear equation of the form
y - ȳ = byx (x - x̄)
where byx, the regression coefficient of y on x, is
byx = Σxy / Σx² = 470 / 730 = 0.643836
Hence
y - 77 = 0.643836 (x - 78)
y = 0.643836x + 26.78082
Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x), perform
the computation, and you have an estimated value (y) for the dependent variable.
In our example, the independent variable is the student's score on the aptitude test, and the dependent variable
is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade would be:
y = 0.643836x + 26.78082 = (0.643836 × 80) + 26.78082 = 51.51 + 26.78 = 78.29
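The least-squares calculation above can be reproduced in Python (an illustrative sketch using the same aptitude/grade data, not part of the original text):

```python
# Least-squares line for the aptitude (x) / statistics-grade (y) data above.
x = [95, 85, 80, 70, 60]
y = [85, 95, 70, 65, 70]
n = len(x)
mx, my = sum(x) / n, sum(y) / n                    # means: 78 and 77
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)                # slope byx = 470 / 730
a = my - b * mx                                    # intercept
predicted = a + b * 80                             # estimate for a score of 80
print(round(b, 6), round(a, 5), round(predicted, 2))
```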
4.4 Properties of the Regression coefficients
1. The correlation coefficient is the geometric mean of the regression coefficients:
r² = byx × bxy
2. If one of the regression coefficients is greater than unity, the other must be less than unity
(since byx × bxy = r² ≤ 1).
3. Both the regression coefficients will have the same sign.
4. The Correlation Coefficient will have the same sign as that of regression coefficients.
5. The arithmetic mean of the regression coefficients is greater than the Correlation Coefficient
4.5 Difference between Correlation and Regression
The difference between regression and correlation needs to be emphasised. Both methods attempt to describe the
association between two (or more) variables, and are often confused by students and professional scientists alike!
1 Correlation makes no a priori assumption as to whether one variable is dependent on the other(s) and is not
concerned with the causal relationship between the variables; instead it gives an estimate of the degree of association
between them. In fact, correlation analysis tests for interdependence of the variables.
2 Regression, by contrast, attempts to describe the dependence of a variable on one (or more) explanatory variables; it implicitly
assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless
of whether the path of effect is direct or indirect.
Chapter-IV Regression Analysis End Chapter quizzes: IV
Ques.1.In Regression Analysis the dependent variable is also known as
a. Regressed variable b. Regressor variable c. Random variable d. All of the above
Ques.2. Simple regression is used to examine the relationship between
a. two dependent variables b. two independent variables c. one dependent and one independent variable d. two dependent and one independent variable
Ques.3. In Regression Analysis, one regression line is obtained in case if
a. r = +1 b. r = -1 c. r = ±1 d. r = 0
Ques.4. byx is the regression coefficient of Y on X. Then
a. byx = Σxy / Σx² b. byx = Σxy / Σy² c. byx = Σy² / Σx² d. byx = Σx² / Σxy
Ques.5. If one of the regression coefficients is greater than unity, the other must be
a. greater than unity b. less than unity c. equals to unity d. Not known
Ques.6. Both the regression coefficients will have
a. same sign b. opposite sign c. Not known d. None
Ques.7 If y is the dependent variable and x is the independent variable. Then the linear regression model will
be
a. x = a +b y + e b. y = b x c. x = b y d. y = a + b x + e
Ques.8. The arithmetic mean of the regression coefficients is ----------- than the correlation coefficient
a. Smaller b. Greater c. Equals to d. None
Ques.9 A regression equation was computed to be Y = 35 + 6X. The value of 35 indicates that:
a. An increase in one unit of X will result in an increase of 35 in Y b. The coefficient of correlation is 35 c. The coefficient of determination is 35 d. The regression line crosses the Y-axis at 35
Ques.10. After performing an analysis, the regression statistics can be used to predict the dependent variable when the ------------ variable is known
a. Independent b. dependent c. correlation coefficient d. All of the above
Chapter-V
Probability & Probability distribution
Contents:
5.1 Introduction
5.1.1 Definition of Probability:
5.1.2. Axioms of Probability
5.1.3. How to Compute Probability:
5.2 Addition Law of Probability
5.3 Multiplication Law of Probability
5.4 Probability Distribution
5.5 Binomial Distribution
5.5.1 Mean of Binomial Distribution
5.6. Poisson Distribution
5.6.1 Mean and variance of Poisson distribution
5.7. Normal Distribution or Normal Curve
5.7.1. Characteristics of Normal Distribution
5.7.2. Empirical Rule
Chapter-V Probability & Probability distribution
5.1 Introduction
Mathematically, the probability that an event will occur is expressed as a number between 0 and 1. Notationally, the
probability of event A is represented by P (A).
If P (A) equals zero, there is no chance that the event A will occur.
If P (A) is close to zero, there is little likelihood that event A will occur.
If P(A) is close to one, there is a strong chance that event A will occur
If P (A) equals one, event A will definitely occur.
The sum of the probabilities of all possible outcomes in a statistical experiment is equal to one. This means, for example, that if an
experiment can have three possible outcomes (A, B, and C), then
P(A) + P(B) + P(C) = 1.
5.1.1 Definition of Probability
Let an event A happen in m ways and fail in n ways, where all the ways are equally likely to occur. Then the
probability of the happening of event A is defined as
P(A) = m / (m + n)
and the probability of its failing as
P(Ā) = n / (m + n)
From the above it may be noted that P(A) = p is such that 0 ≤ p ≤ 1. Ā is called the complementary event, with P(Ā) = q = 1 - p and 0 ≤ q ≤ 1.
Associated with each event A in the sample space S is the probability of A, P(A).
5.1.2. Axioms of Probability
Axioms:
1. P(A) ≥ 0
2. P(S) = 1, where S is the sample space
3. P(A ∪ B) = P(A) + P(B) if A and B are mutually exclusive
e.g., P(ace or king) = P(ace) + P(king) = 1/13 + 1/13 = 2/13.
Theorems about probability can be proved using these axioms, and these theorems can be used in probability calculations:
P(Ā) = 1 - P(A)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)   (for events that are not mutually exclusive)
E.g. P(ace or black) = P(ace) + P(black) - P(ace and black) = 4/52 + 26/52 - 2/52 = 28/52 = 7/13
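The ace-or-black example can be checked exactly with Python's `fractions` module (an illustrative sketch, not part of the original text):

```python
from fractions import Fraction as F

# P(ace or black) from a 52-card deck, using the general addition rule:
# P(A or B) = P(A) + P(B) - P(A and B).
p_ace, p_black, p_black_ace = F(4, 52), F(26, 52), F(2, 52)
p_union = p_ace + p_black - p_black_ace
print(p_union)   # 7/13
```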
Some More Definitions:
Here we define and explain certain terms which are used frequently.
(i) Trial and Event: Let an experiment be repeated under essentially the same conditions, and let it result in any one of several
possible outcomes. Then the experiment is called a trial and the possible outcomes are known as events or cases. In a throw of a
coin the turning up of head or tail is called an event, and the throwing of the coin is called a trial.
(ii) Exhaustive events: The total number of all possible outcomes in any trial is known as the exhaustive events or exhaustive cases.
In a throw of a coin, the possible outcomes are head and tail, i.e., there are two exhaustive cases. In the experiment of rolling a
die, the outcomes 1, 2, 3, 4, 5, 6 (six cases) are exhaustive.
(iii) Favourable events: The events which entail the required happening are said to be favourable events. For example, in a throw
of a die, to have an even number, 2, 4 and 6 are the favourable events.
(iv) Mutually exclusive events: Two events are known as mutually exclusive when the occurrence of one of them excludes the
occurrence of the other; e.g., while tossing a coin, we either get a head or a tail but not both.
(v) Independent events: Two events are independent when the actual happening of one does not influence in any way the
happening of the other. In throwing two coins at a time, the outcome of one is independent of the outcome of the second. But in case a card is
drawn from a pack of well shuffled cards and is not replaced, then the second draw of a card is dependent on the first draw.
The second draw is then a dependent event.
(vi) Equally likely events: Two events are said to be equally likely if one of them cannot be expected in preference to the
other. For example, in a throw of a coin the two cases, head and tail, are equally likely to come up.
(vii) Conditional Probability: The probability of the happening of an event A, given that event B has happened, is called the conditional
probability of the happening of A on the condition that B has already happened. It is usually denoted by P(A/B).
5.1.3. How to Compute Probability (Equally Likely Outcomes)
Sometimes, a statistical experiment can have n possible outcomes, each of which is equally likely. Suppose a subset of r
outcomes is classified as "successful" outcomes.
The probability that the experiment results in a successful outcome (S) is:
P(S) = (Number of successful outcomes) / (Total number of equally likely outcomes) = r / n
Consider the following experiment. An urn has 10 marbles. Two marbles are red, three are green, and five are blue. If an
experimenter randomly selects 1 marble from the urn, what is the probability that it will be green?
In this experiment, there are 10 equally likely outcomes, three of which are green marbles. Therefore, the probability of choosing
a green marble is 3/10 or 0.30.
The probability of an event refers to the likelihood that the event will occur
5.2. Addition Law of Probability
If P1, P2, P3, ..., Pn are the probabilities of n mutually exclusive events E1, E2, E3, ..., En respectively, then the probability p
that one of these events will happen is given by
p = P1 + P2 + P3 + ... + Pn
i.e., p = P(E1 + E2 + E3 + ... + En) = P(E1) + P(E2) + P(E3) + ... + P(En)
5.3 Multiplication Law of Probability
If there are two independent events E1 and E2, the respective probabilities of which are known, then the probability that both will
happen simultaneously is the product of the probability of one and the conditional probability of the other, given that
the first has occurred:
P(AB) = P(A) × P(B)
Note:
(i) If E1 and E2 are independent events, then P(E2/E1) is the same as P(E2), so P(E1E2) = P(E1)·P(E2).
(ii) If P1, P2, P3, ..., Pn are the probabilities of independent events E1, E2, E3, ..., En respectively, then the probability p that all the events happen simultaneously is given by
p = P1·P2·P3 ... Pn
(iii) If P is the probability that an event will happen in one trial, then the probability that it will happen in a succession of r independent trials
is
P·P·P ... P = P^r
(iv) If P1, P2, P3, ..., Pn are the probabilities that certain events E1, E2, E3, ..., En happen, then the probability that they
do not happen at all, i.e., that they all fail, is q1·q2·q3 ... qn = (1 - p1)(1 - p2) ... (1 - pn). Hence the probability that at least one of these events happens is given by 1 - q1·q2·q3 ... qn = 1 - {(1 - p1)(1 - p2) ... (1 - pn)}
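The "at least one" result in (iv) can be sketched in Python; the event probabilities used are hypothetical, chosen only to illustrate the formula:

```python
# Probability that at least one of several independent events occurs:
# 1 minus the probability that all of them fail.
p = [0.5, 0.2, 0.1]            # hypothetical event probabilities p1, p2, p3
prob_none = 1.0
for pi in p:
    prob_none *= (1 - pi)      # q1 * q2 * q3: all events fail
at_least_one = 1 - prob_none   # 1 - (0.5)(0.8)(0.9) = 0.64
print(round(at_least_one, 2))
```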
5.4 Probability Distribution
When a variable X takes the values xi with probabilities pi (i = 1, 2, 3, ..., n), then X is called a random variable or stochastic
variable. The values x1, x2, x3, ..., xn of the random variable X, with their respective probabilities p1, p2, p3, ...,
pn, constitute a probability distribution of the variable X.
Mean or Expected Value and Variance: Let a random variable X assume the values x1, x2, x3, ..., xn with respective
probabilities p1, p2, p3, ..., pn. Then the mean or expected value of X is defined as
E(X) = μ = p1x1 + p2x2 + p3x3 + ... + pnxn = Σpx
The variance of the random variable X is given by
Var(X) = E[(X - μ)²] = Σp(x - μ)²
This can be simplified to a more convenient form:
Var(X) = Σpx² - μ²
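These formulas can be checked against the cars-per-household distribution from the Chapter II quiz (question 7); this is an illustrative sketch, not part of the original text:

```python
from math import sqrt

# Mean and variance of a discrete random variable, using the
# cars-per-household data from the Chapter II quiz (question 7).
x = [1, 2, 3, 4]
p = [0.32, 0.51, 0.12, 0.05]
mean = sum(pi * xi for pi, xi in zip(p, x))                      # E(X) = sum of p*x
variance = sum(pi * xi ** 2 for pi, xi in zip(p, x)) - mean ** 2 # E(X^2) - mean^2
sd = sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```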
5.5 Binomial Distribution
A random variable X which takes the values 0, 1, 2, ..., n is said to follow a Binomial distribution
if its probability function is given by
P(X = r) = P(r) = nCr p^r q^(n-r),  r = 0, 1, 2, ..., n,
where p, q > 0 are such that p + q = 1.
Let the probability of the happening of an event A in one trial be p, and its probability of not
happening be 1 - p = q.
We assume that there are n trials, and that the event A happens r times and does not
happen n - r times.
This may be shown as follows:
A A A ... A  Ā Ā ... Ā      ...(1)
(r times)   (n - r times)
where A indicates the happening of the event and Ā its not happening, with P(A) = p and P(Ā) = q. We see that the sequence (1) has probability
p·p·...·p × q·q·...·q = p^r q^(n-r)
Clearly (1) is merely one order of arranging r A's. Therefore
Probability of (1) = p^r q^(n-r) × (number of different arrangements of r A's and (n - r) Ā's)
The number of different arrangements of r A's and (n - r) Ā's is nCr. Hence
Probability of the happening of the event r times = nCr p^r q^(n-r),  (r = 0, 1, 2, ..., n)
which is the (r + 1)th term in the expansion of (q + p)^n.
If r = 0, the probability of the event happening 0 times = nC0 q^n p^0 = q^n
If r = 1, the probability of the event happening 1 time = nC1 q^(n-1) p
If r = 2, the probability of the event happening 2 times = nC2 q^(n-2) p²
If r = 3, the probability of the event happening 3 times = nC3 q^(n-3) p³, and so on.
These are clearly the successive terms in the expansion of (q + p)^n.
Hence it is called the Binomial distribution.
Condition for the Applicability of Binomial Distribution:
While using the formula of the binomial distribution in solving any problem, the following conditions must be satisfied:
(a) There should be a finite number of trials.
(b) The trials should not depend on each other.
(c) Each trial should have only two possible outcomes, either a success or a failure.
(d) The probability of success or failure should be the same for all the trials.
5.5.1 Mean of Binomial Distribution
If X is a binomial variate with parameters n and p, then
P(X = r) = p(r) = nCr p^r q^(n-r),  r = 0, 1, 2, ..., n.
The mean of the binomial distribution is E(X) = np, and its variance is npq.
Example: The probability that a pen manufactured by a company will be defective is 1/10. If 12 such pens are manufactured, find
the probability that (i) exactly two will be defective, (ii) at least two will be defective, (iii) none will be defective.
Solution: The probability of a defective pen is 1/10 = 0.1.
The probability of a non-defective pen is 1 - 0.1 = 0.9. Here n = 12.
(i) The probability that exactly two will be defective
= 12C2 (0.1)² (0.9)^10 = 0.2301
(ii) The probability that at least two will be defective
= 1 - (probability that either none or one is defective)
= 1 - [12C0 (0.9)^12 + 12C1 (0.1)(0.9)^11] = 0.3410
(iii) The probability that none will be defective
= 12C0 (0.9)^12 = 0.2824
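The three pen probabilities can be recomputed with `math.comb` (an illustrative sketch; the four-decimal values come out to 0.2301, 0.3410 and 0.2824):

```python
from math import comb

# Binomial probabilities for the defective-pen example: p = 0.1, n = 12.
n, p, q = 12, 0.1, 0.9

def binom(r):
    """P(X = r) = nCr * p^r * q^(n-r)."""
    return comb(n, r) * p ** r * q ** (n - r)

exactly_two = binom(2)                       # (i)
at_least_two = 1 - (binom(0) + binom(1))     # (ii)
none_defective = binom(0)                    # (iii)
print(round(exactly_two, 4), round(at_least_two, 4), round(none_defective, 4))
```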
Example: A die is thrown 8 times and it is required to find the probability that a 3 will show (i) exactly 2 times, (ii) at least seven
times, (iii) at least once.
Solution: The probability of throwing a 3 in a single trial = p = 1/6
The probability of not throwing a 3 in a single trial = q = 5/6
(i) P(getting 3 exactly 2 times) = 8C2 q^6 p² = 28 (5/6)^6 (1/6)² ≈ 0.2605
(ii) P(getting 3 at least seven times) = P(getting 3 seven or eight times)
= P(7) + P(8) = 8C7 q^1 p^7 + 8C8 q^0 p^8 = 41/6^8 ≈ 0.0000244
(iii) P(getting 3 at least once)
= P(getting 3 one or two or three or four or five or six or seven or eight times)
= P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8)
= 1 - P(getting 3 zero times) = 1 - 8C0 q^8 p^0 = 1 - (5/6)^8 ≈ 0.7674
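Using exact fractions, the die probabilities can be verified in Python (illustrative only):

```python
from math import comb
from fractions import Fraction as F

# Binomial probabilities for the die example: p = 1/6, q = 5/6, n = 8.
p, q = F(1, 6), F(5, 6)

exactly_two = comb(8, 2) * q ** 6 * p ** 2            # (i)
at_least_seven = comb(8, 7) * q * p ** 7 + p ** 8     # (ii): P(7) + P(8)
at_least_once = 1 - q ** 8                            # (iii): 1 - P(0)
print(float(exactly_two), float(at_least_seven), float(at_least_once))
```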
5.6. Poisson Distribution
The Poisson distribution is generally used when measuring the number of occurrences of something (# of successes) over an
interval or time period.
The assumptions of a Poisson probability distribution are:
1. The probability of the occurrence of an event is constant for all subintervals.
2. There can be no more than one occurrence in each subinterval.
3. Occurrences are independent; that is, the numbers of occurrences in non-overlapping intervals are independent of one another.
The random variable X is said to follow the Poisson probability distribution if it has the probability function:
P (X = x) = (λ^x e^(-λ)) / x!, x = 0, 1, 2, ...
where λ is the mean number of occurrences per interval.
5.6.1 The mean and variance of the Poisson probability distribution are:
μx = E(X) = λ and σx² = E[(X - μx)²] = λ
i.e., for a Poisson distribution the mean and the variance are both equal to λ.
The Poisson probability distribution is an important discrete probability distribution for a number of applications, including:
1. The number of failures in a large computer system during a given day
2. The number of delivery trucks to arrive at a central warehouse in an hour
3. The number of customers to arrive for flights during each 15-minute time interval from 3:00 PM to 6:00 PM on weekdays
4. The number of customers to arrive at a checkout aisle in your local grocery store during a particular time interval
Example: On an average Friday, a waitress gets no tip from 5 customers. Find the probability that she will get no tip from 7
customers this Friday.
The waitress averages 5 customers that leave no tip on Fridays, so λ = 5.
Random variable: the number of customers that leave her no tip this Friday.
We are interested in P (X = 7) = (5^7 e^(-5)) / 7! ≈ 0.1044.
So, the probability that 7 customers will leave no tip this Friday is 0.1044.
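The waitress example can be checked in a few lines (a sketch using only the Python standard library; poisson_pmf is our own helper):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 5  # average number of customers leaving no tip on a Friday
print(round(poisson_pmf(7, lam), 4))  # 0.1044
```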
5.7. Normal Distribution or Normal Curve:
The normal distribution is probably the most important and most widely used continuous distribution. A continuous random variable that follows it is known as a normal random variable, and its probability distribution is called a normal distribution. The following are the characteristics of the normal distribution:
5.7.1. Characteristics of the Normal Distribution:
1. It is bell shaped and is symmetrical about its mean.
2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution.
Thus, the normal distribution is completely described by two parameters: mean and standard deviation.
5. Total area under the curve sums to 1, i.e., the area of the distribution on each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve
A normal distribution in a variate X with mean μ and variance σ² is a statistical distribution with probability density function
f(x) = (1 / (σ √(2π))) e^(-(x - μ)² / (2σ²))
on the domain -∞ < x < ∞.
The standard normal distribution is obtained by taking μ = 0 and σ² = 1 in the general normal distribution. An arbitrary normal distribution can be converted to a standard normal distribution by changing variables to z = (x - μ)/σ, yielding a variate with mean 0 and variance 1.
5.7.2. Empirical Rule
All normal density curves satisfy the following property, which is often referred to as the Empirical Rule.
68% of the observations fall within 1 standard deviation of the mean, that is, between μ - σ and μ + σ.
95% of the observations fall within 2 standard deviations of the mean, that is, between μ - 2σ and μ + 2σ.
99.7% of the observations fall within 3 standard deviations of the mean, that is, between μ - 3σ and μ + 3σ.
Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.
Figure: 5.1 Normal Distribution or Normal Curve
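The three percentages of the Empirical Rule can be verified numerically from the standard normal CDF. A minimal sketch, assuming Python 3.8+ for statistics.NormalDist:

```python
from statistics import NormalDist  # available from Python 3.8

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

for k in (1, 2, 3):
    inside = Z.cdf(k) - Z.cdf(-k)  # P(mu - k*sigma < X < mu + k*sigma)
    print(f"within {k} sd: {inside:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```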
Example
The total weight of 8 people chosen at random follows a normal distribution with a mean of 550kg and a standard deviation of
150kg.
What is the probability that the total weight of 8 people exceeds 600kg?
First sketch a diagram.
Figure: 5.2 Normal area curve
The mean is 550kg and we are interested in the area that is greater than 600kg.
z = (x - xmean) / σ
Here x = 600kg,
xmean, the mean = 550kg
σ, the standard deviation = 150kg
z = (600 - 550) / 150 = 50 / 150 = 0.33
Table: 5.1
Look in the table down the left-hand column for z = 0.3, and across under 0.03. The number in the table is the tail area for z = 0.33, which is 0.3707.
This is the probability that the weight will exceed 600kg.
Our answer is
"The probability that the total weight of 8 people exceeds 600kg is 0.37 correct to 2
figures."
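The same answer can be obtained without a printed table, using the normal CDF directly (a sketch assuming Python 3.8+ for statistics.NormalDist; the variable names are ours):

```python
from statistics import NormalDist  # available from Python 3.8

mean, sd = 550, 150  # total weight of the 8 people, in kg
x = 600

z = (x - mean) / sd                          # 50/150, about 0.33
p_exceeds = 1 - NormalDist(mean, sd).cdf(x)  # upper-tail area beyond 600kg
print(round(z, 2), round(p_exceeds, 4))      # 0.33 0.3694
```

The exact tail area 0.3694 differs slightly from the table value 0.3707 because the table lookup first rounds z = 1/3 to 0.33; both agree to 2 significant figures, 0.37.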
Chapter-V Probability & Probability distribution
End Chapter quizzes : V
Ques.1. A coin is tossed three times. What is the probability that it lands on heads exactly one time?
a. 0.125 b. 0.250 c. 0.333 d. 0.375
Ques.2.P(A U B) is the probability that __________ will occur
a. A b. B c. A and B d. A or B or both
Ques.3. The events in an experiment are _____________ if only one can occur at a time
a. mutually exclusive b. non-mutually exclusive c. mutually inclusive d. independent
Ques.4. A die is rolled; find the probability that an even number is obtained.
a. 1/2 b. 1/3 c. 1/4 d. 1/5
Ques.5. Which of these numbers cannot be a probability?
a. 0.00001 b. 0.5 c. 1.001 d. 0
Ques.6. For the normal distribution, the mean plus and minus 1.96 standard deviations will include what
percent of the observations?
a. 80%
b. 84%
c. 90%
d. 95%
Ques.7. Normal distribution is a
a. Discrete distribution b. Continuous distribution c. Both d. None
Ques.8. The mean of the binomial distribution is given by
a. p b. np c. npq d. n
Ques.9. The probability of happening an event A, such that event B has happened, is called
a. disjoint probability b. independent probability c. conditional probability d. dependent probability
Ques.10. If A and B are mutually exclusive, then P (A U B) =
a. P (A) b. P (A) + P (B) c. P (B) d. P (A) + P (B) - P (A ∩ B)
Chapter-VI Time Series
Contents:
6.1 Introduction
6.1.1 Role of Time Series
6.2 Components of a Time Series
6.2.1 Secular Trend
6.2.2 Seasonal Variation
6.2.3 Cyclical Variation
6.2.4 Irregular Variation
6.3 Measurement of Trends
6.3.1 Freehand Method
6.3.2 The Method of Semi-Averages
6.3.3 The Method of Moving Averages
6.3.4 The Method of Curve Fitting by the Principle of Least Squares
6.4 Mathematical Models
6.4.1 Additive Model
6.4.2 Multiplicative Model
6.4.3 Mixed Models
6.1 Introduction
Since "time is money" in business activities, the dynamic decision technologies presented here have become a necessary tool for a wide range of managerial decisions in which time and money are directly related. In making strategic decisions under uncertainty, we all make forecasts. We may not think that we are forecasting, but our choices will be directed by our anticipation of the results of our actions or inactions.
Indecision and delays are the parents of failure. This chapter is intended to help managers and administrators do a better job of anticipating, and hence a better job of managing uncertainty, by using effective forecasting and other predictive techniques.
A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken
at regular intervals (days, months, years), but the sampling could be irregular.
A time series analysis consists of two steps:
(1) building a model that represents a time series,
(2) using the model to predict (forecast) future values.
A time series can be represented as a curve that evolves over time. Forecasting a time series means extending the historical values into the future, where measurements are not yet available.
There are some subtleties in the definition of a time-series forecast. For example, the historical data might be daily sales, but you may need monthly forecasts; grouping the values according to a certain period (e.g., a month) is called time-series aggregation.
The following are few examples of time series data:
1. Profits earned by a company for each of the past five years.
2. Workers employed by a company for each of the past 15 years.
3. Number of students registered for the MBA programme of an institute for each of the past five years.
4. The weekly wholesale price index for each of the past 30 weeks.
5. Number of fatal road accidents in Delhi for each day for the past two months.
6.1.1. Role of time Series
1. A time series analysis enables one to study movements such as cycles that fluctuate around the trend. Knowledge of the cyclical pattern in certain series of data will be helpful in making generalisations about the concerned business or industry.
2. The analysis of a time series enables us to understand past behaviour or performance. We can know how the data have changed over time and find out the probable reasons responsible for such changes. If the past performance, say of a company, has been poor, it can take corrective measures to arrest the poor performance.
3. A time series analysis helps directly in business planning. A firm can know the long-term trend in the sale of its products. It can find out at what rate sales have been increasing over the years. This may help it in making projections of its sales for the next few years and in planning the procurement of raw material, equipment and manpower accordingly.
4. A time series analysis enables one to make meaningful comparisons between two or more series regarding the rate or type of growth. For example, growth in consumption at the national level can be compared with that in the national income over a specified period. Such comparisons are of considerable importance to business and industry.
5. A time series analysis helps in evaluating current accomplishments. The actual performance can be compared with the expected performance and the causes of variation analysed; e.g., if we know how much is the effect of seasonality on business, we may devise ways and means of ironing out the seasonal influence or decreasing it by producing commodities with complementary seasons.
6.2. Components of a time series
1 Secular Trend - the smooth long term direction of a time series
2 Seasonal Variation - patterns of change in a time series within a year which tend to repeat each year
3 Cyclical Variation - the rise and fall of a time series over periods longer than one year
4 Irregular Variation - classified into:
Episodic - unpredictable but identifiable
Residual - also called chance fluctuation and unidentifiable
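As a preview of the method of moving averages (Section 6.3.3 in the contents), the secular trend of a short quarterly series can be isolated with a centred moving average. A minimal sketch with invented toy data (the series and function name are ours, not from the text):

```python
# Three years of quarterly sales (toy data, invented for illustration).
sales = [4, 6, 9, 5, 5, 7, 10, 6, 6, 8, 11, 7]

def centred_moving_average(values, period=4):
    """Centred moving average for an even period (two-step average)."""
    first = [sum(values[i:i + period]) / period
             for i in range(len(values) - period + 1)]
    # Average adjacent moving averages to re-centre them on actual quarters.
    return [(a + b) / 2 for a, b in zip(first, first[1:])]

trend = centred_moving_average(sales)
print(trend)  # [6.125, 6.375, 6.625, 6.875, 7.125, 7.375, 7.625, 7.875]
```

Averaging over a full year smooths away the seasonal swings, leaving the steadily rising trend; the cyclical and irregular components would appear as what remains after the trend and seasonal effects are removed.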
6.2.1 Secular Trend
With the first type of change, secular trend, the value of the variable tends to increase or decrease over a long period
of time. The steady increase in the cost of living recorded by the Consumer Price Index is an example of secular
trend. From year to year, the cost of living varies a great deal, but if we examine a long-term period, we see that the trend is toward a steady increase. Figure 6.1 shows a secular trend in an increasing but fluctuating time series.
Figure: 6.1 Secular trend
6.2.2 Seasonal variation
The second kind of change in time-series data is seasonal variation. As we might expect from the name, seasonal variation involves patterns of change within a year that tend to be repeated from year to year. For example, a physician can expect a substantial increase in the number of flu cases every winter and of poison ivy cases every summer. Since these are regular patterns, they are useful in forecasting the future. In Figure 6.2, we see a seasonal variation. Notice how it peaks in the fourth quarter of each year.
1 Sales of ice cream will be higher in summer than in winter, and sales of overcoats will be higher in autumn
than in spring.
2 Shops might expect higher sales shortly before Christmas or in their winter and summer sales.
3 Sales might be higher on Friday and Saturday than on Monday.
4 The telephone network may be heavily used at certain times of the day (such as mid-morning and mid-afternoon) and much less used at other times (such as in the middle of the night).
Figure: 6.2 Seasonal variation
[Chart: quarterly sales of Wildcat sailboats (millions of dollars), July 2001 to July 2004, showing a repeating seasonal pattern around a linear trend.]
6.2.3 Cyclical variation
The third type of variation seen in a time series is cyclical fluctuation. The most common example of cyclical fluctuation is the business cycle. Over time, there are years when the business cycle hits a peak above the trend line. At other times, business activity is likely to slump, hitting a low point below the trend line. The time between hitting peaks or falling to low points is at least 1 year, and it can be as many as 15 or 20 years. Figure 6.3 illustrates a typical pattern of cyclical fluctuation above and below a secular trend line. Note that the cyclical movements do not follow any regular pattern but move in a somewhat unpredictable manner.
Figure: 6.3 cyclical variation
[Chart: cyclical activity plotted against time, with labelled phases P1 prosperity, Z1 decline, V1 depression and Z2 improvement, repeating over successive cycles.]
Figure: 6.4 Business Cycle
[Diagram: the four phases of the business cycle: prosperity, decline, depression and improvement.]
Figure: 6.5 Cyclical Components
[Chart: cyclical component Ct, ranging roughly from 0.90 to 1.15, plotted against time from 1997 to 2003.]
These are medium-term changes in results caused by circumstances which repeat in cycles. In business, cyclical variations are commonly associated with economic cycles: successive booms and slumps in the economy. Economic cycles may last a few years. Cyclical variations are longer term than seasonal variations.
6.2.4 Irregular variation
Irregular variation is the fourth type of change in time-series analysis. In many situations, the value of a variable may change in a completely unpredictable, random manner; irregular variations describe such movements. The effects of the Middle East conflict in 1973 and the Iraqi situation in 1990 on gasoline prices in the United States are examples of irregular variation. Figure 6.6 illustrates irregular variation.
Figure: 6.6 Irregular variation