
Synopsis

Quantitative Methods

1. Introducing Statistics: Some Simple Uses and Misuses

Learning Objectives

This module gives an overview of statistics, introducing basic ideas and concepts at a general

level, before dealing with them in greater detail in later modules. The purpose is to provide a

gentle way into the subject for those without a statistical background, in response to the

cynical view that it is not possible for anyone to read a statistical text unless they have read it

before. For those with a statistical background the module will provide a broad framework

for studying the subject.

Sections

1.1 Introduction

1.2 Probability

1.3 Discrete Statistical Distributions

1.4 Continuous Statistical Distributions

1.5 Standard Distributions

1.6 Wrong Use of Statistics

1.7 How to Spot Statistical Errors

Learning Summary

The purpose of this introduction has been twofold. The first aim has been to present some

statistical concepts as a basis for more detailed study of the subject. All the concepts will be

further explored later. The second aim has been to encourage a healthy scepticism and

atmosphere of constructive criticism which are necessary when weighing statistical evidence.

The healthy scepticism can be brought to bear on applications of the concepts introduced

so far as much as elsewhere in statistics. Probability and distributions can both be subject to

misuse.

Logical errors are often made with probability. For example, suppose a questionnaire

about marketing methods is sent to a selection of companies. From the 200 replies, it

emerges that 48 of the respondents are not in the area of marketing. It also emerges that 30

are at junior levels within their companies. What is the probability that any particular

questionnaire was filled in by someone who is either outside marketing or at a junior level? It is tempting to suppose that:

P = 48/200 + 30/200 = 78/200 = 0.39

This is almost certainly wrong because of double counting. Some of the 48 non-marketers are also likely to be at a junior level. If 10 respondents were non-marketers and at a junior level, then:

P = (48 + 30 − 10)/200 = 68/200 = 0.34

Only in the rare case where none of those at a junior level were outside the marketing

area would the first calculation have been correct.
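The correction can be verified with a few lines of Python; the figures are those of the questionnaire example above:

```python
# Checking the inclusion-exclusion argument with the example's figures.
replies, non_marketing, junior, overlap = 200, 48, 30, 10

naive = (non_marketing + junior) / replies                # double-counts the overlap
corrected = (non_marketing + junior - overlap) / replies  # inclusion-exclusion
print(naive, corrected)  # 0.39 versus 0.34
```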

Figure 1 Civil servants’ salaries

Graphical errors can frequently be seen with distributions. Figure 1 shows an observed

distribution relating to the salaries of civil servants in a government department. The figures

give a wrong impression of the spread of salaries because the class intervals are not all equal.

One could be led to suppose that salaries are higher than they are. The lower bands are of

width £8000 (0–8, 8–16, 16–24, in £000s). The higher ones are of much larger size. The distribution should be drawn with all the intervals of equal size, as in Figure 2.
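As a sketch of the redrawing step, the following Python fragment splits the wider bands into equal £8000 intervals, spreading each band's frequency evenly across its slices. The band boundaries and frequencies are invented for illustration, since the actual figures are in the charts:

```python
# Hypothetical salary bands: (lower, upper, frequency), salaries in £000s.
bands = [(0, 8, 40), (8, 16, 130), (16, 24, 90), (24, 48, 30)]

WIDTH = 8  # redraw everything on equal £8000 intervals
for low, high, freq in bands:
    slices = (high - low) // WIDTH
    for i in range(slices):
        lo = low + i * WIDTH
        # Assume the frequency is spread evenly within the original band.
        print(f"£{lo}k-£{lo + WIDTH}k: {freq / slices:.1f}")
```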

Statistical concepts are open to misuse and wrong interpretation just as verbal reports are.

The same vigilance should be exercised in the former as in the latter.

Figure 2 Civil servants’ salaries (amended)

2. Basic Mathematics: School Mathematics Applied to

Management

Learning Objectives

This module describes some basic mathematics and associated notation. Some management

applications are described but the main purpose of the module is to lay the mathematical

foundations for later modules. It will be preferable to encounter the shock of the mathematics at this stage rather than later when it might detract from the management concepts under

consideration. For the mathematically literate the module will serve as a review; for those in

a rush it could be omitted altogether.

Sections

2.1 Introduction

2.2 Graphical Representation

2.3 Manipulation of Equations

2.4 Linear Functions

2.5 Simultaneous Equations

2.6 Exponential Functions

3. Data Communication

Learning Objectives

By the end of the module the reader should know how to improve data presentation. This is

important both in communicating data to others and in analysing data. The emphasis is on

the visual aspects of data presentation. Special reference is made to accounting data and

graphs.

Sections

3.1 Introduction

3.2 Rules for Data Presentation

3.3 The Special Case of Accounting Data

3.4 Communicating Data through Graphs

Learning Summary

The communication of data is an area that has been neglected, presumably because it is

technically simple and there is a tendency in quantitative areas (and perhaps elsewhere) to

believe that only the complex can be useful. Yet in modern organisations there can be few

things more in need of improvement than data communication.

Although the area is technically simple, it does involve immense difficulties. What exactly

is the readership for a set of data? What is the purpose of the data? How can one overcome the common insistence on data specified to a level of accuracy that is neither needed by the decision maker nor merited by the collection methods? How much accounting convention should be retained in communicating financial information to the layman? What should

be done about the aspects of data presentation that are a matter of taste? The guiding

principle among the problems is that the data should be communicated according to the

needs of the receiver rather than the producer. Furthermore, they should be communicated

so that the main features can be seen quickly. The seven rules of data presentation described

in this module seek to accomplish this.

(a) Rule 1: round to two effective digits (see the sketch after this list).

(b) Rule 2: reorder the numbers.

(c) Rule 3: interchange rows and columns.

(d) Rule 4: use summary measures.

(e) Rule 5: minimise use of space and lines.

(f) Rule 6: clarify labelling.

(g) Rule 7: use a verbal summary.
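As a minimal sketch of Rule 1, the following Python helper rounds a figure to two significant ('effective') digits; the function name and the reading of the rule as two significant figures are assumptions for illustration:

```python
import math

def two_effective_digits(x: float) -> float:
    """Round x to two significant figures -- one reading of Rule 1."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, 1 - exponent)              # keep only the first two digits

print(two_effective_digits(2347.6))   # 2300.0
print(two_effective_digits(0.04562))  # 0.046
```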

Producers of data are accustomed to presenting them in their own style. As always there

will be resistance to changing an attitude and presenting data in a different way. The idea of

rounding especially is usually not accepted instantly. Surprisingly, however, while objections

are raised against rounding, graphs tend to be universally acclaimed even when not appropriate. Yet the graphing of data is the grossest form of rounding. There is evidently a need for

clear and consistent thinking in regard to data communication.

This issue has been of increasing importance because of the growth in usage of all types

and sizes of computers and the development of large-scale management information

systems. The benefits of this technological revolution should be enormous but the potential

has yet to be realised. The quantities of data that circulate in many organisations are vast. It

is supposed that the data provide information which in turn leads to better decision making.

Sadly this is frequently not the case. The data circulate, not providing enlightenment, but

causing at best indifference and at worst tidal waves of confusion. Poor data communication

is a prime cause of this. It could be improved. Otherwise, one must question the wisdom of

the large expenditures many organisations make in providing untouched and bewildering

management data. One thing is clear. If information can be assimilated quickly, it will be

used; if not, it will be ignored.

4. Data Analysis

Learning Objectives

By the end of this module the reader should know how to analyse data systematically. The

methodology suggested is simple, relying very much on visual interpretation, but it is suitable

for most data analysis problems in management. It carries implications for the ways

information is produced and used.

Sections

4.1 Introduction

4.2 Management Problems in Data Analysis

4.3 Guidelines for Data Analysis

Learning Summary

Every manager sees the problem of handling numbers differently because each sees it mainly

in the (probably) narrow context of his or her own work.

One manager sees numbers only in the financial area, another sees them only in production

management. The guidelines suggested here are intended to be generally applicable to the

analysis of business data in many different situations and with a range of different requirements. The key points are:

(a) Simple methods are preferable to complex ones.

(b) Visual inspection of well-arranged data can play a role in coming to understand them.

(c) Data analysis is like verbal analysis.

(d) The guidelines merely make explicit what comes naturally when dealing with words.

The need for better skills to turn data into real information in managerial situations is not

new. What has made the need so urgent in recent times is the exceedingly rapid development

of computers and associated management information systems. The ability to provide vast

amounts of data very quickly has grown enormously. It has far outstripped the ability of

management to make use of the data. The result has been that in many organisations

managers have been swamped with so-called information which in fact is no more than mere

numbers. The problem of general data analysis is no longer a small one that can be ignored.

When companies are spending large amounts of money on data provision, the question of

how to turn the data into information and use them in decision making is one that has to be

faced.

The inadequacy of the traditional subject of statistics to help in this area is now being

recognised. New skills and techniques are being developed. For example, Tukey has

developed an ‘alternative statistics’ called exploratory data analysis which is, in his view,

better able to deal with modern statistical problems. Time will reveal its value. Work such as

Tukey’s is an indication of the new circumstances that are unfolding within organisations

with regard to quantitative matters. The help with the problem given in this module is at a

much lower technical level than exploratory data analysis. The guidelines help non-statistical

managers make a start at understanding their own data in recognition of the fact that much

of the data analysis a typical manager has to do does not require a high level of technical

expertise.

5. Summary Measures

Learning Objectives

By the end of the module, the reader should know how large quantities of numbers can be

reduced to a few simple summary measures which are much easier to handle than the

raw data. The most common measures are those of location and scatter. The special case of

summarising time series data with indices is also described.

Sections

5.1 Introduction

5.2 Usefulness of the Measures

5.3 Measures of Location

5.4 Measures of Scatter

5.5 Other Summary Measures

5.6 Dealing with Outliers

5.7 Indices

Learning Summary

In the process of analysing data, at some stage the analyst tries to form a model of the data

as suggested previously. ‘Pattern’ or ‘summary’ are close synonyms for ‘model’. The model

may be simple (all rows are approximately equal) or complex (the data are related via a

multiple regression model). Often specifying the model requires intuition and imagination.

At the very least, summary measures can provide a model based on specifying for the data

set:

(a) the number of readings;

(b) a measure of location;

(c) a measure of scatter;

(d) the shape of the distribution.

In the absence of other inspiration, these four attributes provide a useful model of a set

of numbers. If the data consist of two or more distinct sets (as, for example, a table), then

this basic model can be applied to each. This will give a means of comparison between the

rows or columns of the table or between one time period and another.

The first attribute (number of readings) is easily supplied. Measures of location and scatter have already been discussed. The shape of the distribution can be found by drawing a

histogram and literally describing its shape (as with the symmetrical, U and reverse J

distributions seen earlier). A short verbal statement about the shape is often an important

factor in summarising or forming a model of a set of data.
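A minimal Python sketch of this basic model, using the standard library and invented readings:

```python
import statistics

data = [12, 15, 11, 19, 14, 13, 16, 44, 15, 12]  # hypothetical readings

print(len(data))                 # the number of readings
print(statistics.median(data))   # a measure of location (robust to the outlier 44)
print(statistics.stdev(data))    # a measure of scatter
# The shape would be judged from a histogram, e.g. symmetrical, U or reverse J.
```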

Verbal statements have a more general role in summarising data. They should be short, no

more than one sentence, and only used when they can add to the summary. They are used in

two ways: first, they are used when the quantitative measures are inadequate; second, they

are used to point out important features in the data. For example, a table of a company’s

profits over several years might indicate that profits had doubled. Or a table of the last two

months’ car production figures might have a note stating that 1500 cars were lost because of

a strike.

It is important in using verbal summaries to distinguish between helpful statements

pointing out major features and unhelpful statements dealing with trivial exceptions and

details. A verbal summary should always contribute to the objective of adding to the ease

and speed with which the data can be handled.

6. Sampling Methods

Learning Objectives

By the end of this module the reader should know the main principles underlying sampling

methods. Most managers have to deal with sampling in some way. It may be directly in

commissioning a sampling survey, or it may be indirectly in making use of information based

on sampling. For both purposes it is necessary to know something of the techniques and,

more importantly, the factors critical to their success.

Sections

6.1 Introduction

6.2 Applications of Sampling

6.3 The Ideas behind Sampling

6.4 Random Sampling Methods

6.5 Judgement Sampling

6.6 The Accuracy of Samples

6.7 Typical Difficulties in Sampling

6.8 What Sample Size?

Learning Summary

It is most surprising that information collection should be so often done in apparent

ignorance of the concept of sampling. Needing information about invoices, one large

company investigated every single invoice issued and received over a three-month period, a

monumental task. A simple sampling exercise would have reduced the cost to around one

per cent of the actual cost with little or no loss of accuracy.

Even after it is decided to use sampling there is still, obviously, a need for careful planning. This should include a precise timetable of what and how things are to be done. The

crucial questions are: ‘What are the exact objectives of the study?’ and ‘Can the information

be provided from any other source?’ Without this careful planning it is possible to collect a

sample and then find the required measurements cannot be made. For example, having

obtained a sample of 2000 of the workforce, it may be found that absence records do not

exist, or it may be found that another group in the company carried out a similar survey 18

months before and their information merely needs updating.

The range of uses of sampling is extremely wide. Whenever information has to be collected, sampling can prove valuable. The following list gives a guide to the applications that

are frequently encountered:

(a) market research of consumer attitudes and preferences;

(b) medical investigations;

(c) agriculture (crop studies);

(d) accounting;

(e) quality control (inspection of manufactured output);

(f) information systems.

In all applications, sampling is a trade-off between accuracy and expense. By sampling at

all one is losing accuracy but saving money. The smaller the sample the greater the accuracy

loss but the greater the saving. The trade-off has to be made in consideration of the accuracy

required by the objectives of the study and the budget available. Even when the full

population is investigated, however, the results will not be entirely accurate. The same

problems that occur in sampling – non-response, the sampling frame and bias – will occur

with the full population. The larger the sample size, the closer the accuracy will be to the

maximum accuracy that is obtainable with the full population. It may even be that measurement errors overwhelm sampling errors. For instance, even a slightly ambiguous set of questions in an opinion poll can distort the results far more than any inaccuracy

resulting from taking a sample rather than considering the full population. The concern that

many managers have that sample information must be vastly inferior to population information is ill-founded. A modest sample can provide results which are of only slightly lower

accuracy than those provided by the whole population and at a fraction of the cost.
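The point can be illustrated with a small simulation in Python; the 'population' of invoice values is invented, and the claim is only that a modest random sample estimates its mean closely:

```python
import random
import statistics

random.seed(1)
# An invented population of 100 000 invoice values.
population = [random.gauss(500, 120) for _ in range(100_000)]

sample = random.sample(population, 400)  # a modest simple random sample

print(round(statistics.mean(population), 1))  # the costly full investigation
print(round(statistics.mean(sample), 1))      # close, at a fraction of the cost
```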

7. Distributions

Learning Objectives

By the end of the module the reader should be aware of how and why distributions,

especially standard distributions, can be useful. Having numbers in the form of a distribution

helps both in describing and analysing. Distributions can be formed from collected data, or

they can be derived mathematically from knowledge of the situation in which the data are

generated. The latter are called standard distributions. Two standard distributions, the

binomial and the normal, are the main topics of the module.

Proof of some of the formulae requires a high level of mathematics. Where possible

mathematical derivations are given but since the purpose of this course is to explore the

practical applications of techniques, not their historical and mathematical development, there

are situations where the mathematics is left in its ‘black box’.

Note on Technical Sections: Section 7.3 ‘Probability Concepts’, Section 7.5.3 ‘Deriving

the Binomial Distribution’ and Section 7.6.3 ‘Deriving the Normal Distribution’ are technical

and may be omitted on a first reading.

Sections

7.1 Introduction

7.2 Observed Distributions

7.3 Probability Concepts

7.4 Standard Distributions

7.5 Binomial Distribution

7.6 The Normal Distribution

Learning Summary

The analysis of management problems often involves probabilities. For example, postal

services define their quality of service as the probability that a letter will reach its destination

the next day; electricity utilities set their capacity at a level such that there is no more than

some small probability that it will be exceeded and power cuts necessitated; marketing

managers in contracting companies may try to predict future business by attaching probabilities to new contracts being sought. In such situations and many others, including those

introduced earlier, the analysis is frequently based on the use of observed or standard

distributions.

An observed distribution usually entails the collection of large amounts of data from

which to form histograms and estimate probabilities.

A standard distribution is mathematically derived from a theoretical situation. If an actual

situation matches (to a reasonable approximation) the theoretical then the standard distribution can be used both to describe and analyse the situation. As a result fewer data need be

collected.

This module has been concerned with two standard distributions: the binomial and the

normal. For both, the following have been described:

(a) the situations in which it can be used;

(b) its derivation;

(c) the use of probability tables;

(d) its parameters;

(e) how to decide whether an actual situation matches the theoretical situation on which the

distribution is based.

The mathematics of the distributions have been indicated but not pursued rigorously.

The underlying formulae, particularly the normal probability formula, require a relatively

high level of mathematical and statistical knowledge. Fortunately such detail is not necessary

for the effective use of the distributions because tables are available. Furthermore, the role

of the manager will rarely be that of a practitioner of statistics; rather, he or she will have to

supervise the use of statistical methods in an organisation. It is therefore the central concepts

of the distributions, not the mathematical detail, that are of concern. To look at them more

deeply goes beyond what a manager will find helpful and enters the domain of the statistical

practitioner.
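In practice, software has largely taken over the role of printed tables. The sketch below, which assumes the scipy library is available and uses invented parameter values, shows how binomial and normal probabilities of the kind tabulated in the module can be looked up:

```python
from scipy import stats

# Binomial: P(at most 3 items of one type) in a sample of 20, with p = 0.10.
print(stats.binom.cdf(3, n=20, p=0.10))

# Normal: P(observation below 105) for a quantity with mean 100 and sd 4.
print(stats.norm.cdf(105, loc=100, scale=4))
```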

The distributions that have been the subject of this module are just two of the many that

are available. However, they are two of the most important and useful. The principles behind

the use of any standard distribution are the same, but each is associated with a different

situation. A later module will look at other standard distributions and their applications.

8. Statistical Inference

Learning Objectives

Statistical inference is the set of methods by which data from samples can be turned into

more general information about populations. By the end of the module, the reader should

understand the basic underlying concepts. Statistical inference has two main parts. Estimation is concerned with making predictions and specifying their accuracy; significance testing

is concerned with distinguishing between a result arising by chance and one arising from

other factors. The module describes some of the many different types of significance test. As

in the last module, some of the mathematics will have to be left in a ‘black box’.

Sections

8.1 Introduction

8.2 Applications of Statistical Inference

8.3 Confidence Levels

8.4 Sampling Distribution of the Mean

8.5 Estimation

8.6 Basic Significance Tests

8.7 More Significance Tests

8.8 Reservations about the Use of Significance Tests

Learning Summary

Statistical inference belongs to the realms of traditional statistical theory. Its relevance lies in

its applicability to specialised management tasks, such as quality control and market research.

Most managers would find that it can only occasionally be applied directly to general

management problems. Its major value is that it encompasses ideas and concepts which

enable problems to be viewed in broader and more structured ways.

Two areas have been discussed, estimation and significance testing. New theory – confidence levels, the sampling distribution of the mean, the central limit theorem and the

variance sum theorem – has been introduced.

The conceptual contribution that estimation makes is to concentrate attention on the

range of a business forecast rather than merely the point estimate. To take a previous market

research example, the estimate that 61 per cent of male toiletries are purchased by females

sounds fine. But what is the accuracy of the estimate? The 61 per cent is no more than the

most likely value. By how much could the true value be different from 61 per cent? If it can

be said with near certainty (95 per cent confidence) that the percentage is between 58 per

cent and 64 per cent, then the estimate is a good one on which decisions may be reliably

based. If the range is eight per cent to 88 per cent, then there must be doubts about its

usefulness for decision making. Surprisingly, the confidence limits of business forecasts are

often reported with little emphasis, or not reported at all.
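A minimal sketch of the calculation behind such limits, assuming the 61 per cent came from a simple random sample and taking n = 1000 as an invented sample size:

```python
import math

p_hat, n, z = 0.61, 1000, 1.96  # estimate, assumed sample size, 95% multiplier

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
low, high = p_hat - z * se, p_hat + z * se
print(f"95% confidence interval: {low:.2f} to {high:.2f}")  # roughly 0.58 to 0.64
```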

The second area considered was significance testing. It is concerned with distinguishing

real from apparent differences. The discrepancy between a sample mean and what is thought

to be the mean of the whole population is judged in the context of inherent variation. An

apparent difference is one that is likely to have arisen purely by chance because of the

inherent variation; a real difference is one that is unlikely to have arisen purely by chance and

some other explanation (i.e. that the hypothesis is untrue) is supposed. A significance level

draws a dividing line between the two. The dividing line marks an abrupt border. In practice,

extra care is exercised over samples falling in the grey areas immediately on either side of the

border.

A number of significance tests have been introduced and it can be difficult to know

which one to use. To illustrate the different circumstances in which each is appropriate, a

medical example will be used in which a new treatment for reducing cholesterol levels is

being tried out. Country-wide records are available showing that the existing treatment on

average reduces cholesterol levels by five units.

The three types of test described in the module are:

(a) Single sample. This is the basic significance test described in Section 8.6. Evidence from

one sample is used to test a hypothesis relating to the population from which it has

come. For example, to show that the new cholesterol treatment was more effective than

the existing treatment the hypothesis would be that the new treatment was no more

effective than the old, i.e. it reduced cholesterol levels by five units on average. A representative sample of patients would be given the new treatment and the average reduction

in cholesterol measured. This would be compared with the hypothesised population

figure of five units.

(b) Two independent samples (Section 8.7.1). Two independently drawn samples are

compared, usually with the hypothesis that there is no difference between them. For

example, in trying out the new cholesterol treatment there might be some doubt about

the accuracy of the country-wide data on which the hypothesis was based. One way to

get round the problem would be to use two samples. The first would be a sample of

patients to whom the new treatment had been given and the second a ‘control’ sample of

patients to whom the old treatment was given. As before the hypothesis would be that

the new treatment was no better than the old. The average reduction measured for the

first sample would be compared to that from the second to test whether the evidence

supported this.

(c) Paired samples (Section 8.7.2). Two samples are compared but they are not drawn

independently. Each observation in one sample has a ‘partner’ in the other. For example,

instead of testing the ultimate effect of the new treatment in reducing cholesterol levels,

it might be helpful to know whether it worked quickly, taking effect within, say, three

days. To do this the new treatment would be given to a single sample of patients. Their

cholesterol levels would be measured at the outset and again three days later. There

would then be two samples, the first of cholesterol levels at the outset and the second of

levels three days later. However, each observation in one sample would be paired with

one in the other – paired because the two observations would relate to the same patient.

The hypothesis would be that the treatment had made no difference to cholesterol levels

after three days. As described in Section 8.7.2 the significance test would be carried out

by forming a new sample from the difference in cholesterol levels for each patient and

testing whether the average for the new sample could have come from a population of

mean zero.

If two independent samples had been used, i.e. the two samples contained different

patients (as for the independent samples above), and the cholesterol levels had been

measured for one sample at the outset and for the second three days later, any difference in

cholesterol levels might be accounted for by the characteristics of the patients rather than

the treatment.
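The three tests can be sketched in Python with the scipy library; the cholesterol reductions below are simulated rather than real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
old_mean = 5.0  # country-wide average reduction under the existing treatment

new = rng.normal(6.0, 2.0, 40)       # sample on the new treatment
control = rng.normal(5.0, 2.0, 40)   # independent control sample on the old one
before = rng.normal(20.0, 3.0, 40)   # same patients, levels at the outset
after = before - rng.normal(1.0, 1.0, 40)  # and again three days later

print(stats.ttest_1samp(new, old_mean))  # (a) single sample v. hypothesised mean
print(stats.ttest_ind(new, control))     # (b) two independent samples
print(stats.ttest_rel(before, after))    # (c) paired samples
```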

In deciding how to conduct a significance test there are three other factors to consider.

First, the test can be conducted with probabilities or critical values. This is purely a matter of

preference for the tester – both would produce the same result (see Section 8.6.1). Second,

the test can be one-tailed or two-tailed. This decision is not a matter of preference and it

depends upon the purpose of the test and what outcome is wanted (Section 8.6.2). Third, the

test could use data in the form of proportions. This depends on the nature of the data,

whether proportional or not (Section 8.7.3).

Both estimation and significance testing can improve the way a manager thinks about

particular types of numerical problems. Moreover, they help to show the manager what to

look for in a management report: Does an estimate or forecast also include a measure of

accuracy? In making comparisons, are the differences real or apparent? From the point of

view of day-to-day management, this is where their importance lies.

9. More Distributions

Learning Objectives

By the end of this module the reader should be more aware of the very wide range of

standard distributions that are available as well as their applications in statistical inference.

Two standard distributions, the binomial and normal, and statistical inference were the

subjects of the previous two modules. Those fundamental concepts are amplified and

extended in this module. More standard distributions relating to a variety of theoretical

situations and their use in estimation and significance tests are described.

The module covers some advanced material and may be omitted the first time through

the course.

Sections

9.1 Introduction

9.2 The Poisson Distribution

9.3 Degrees of Freedom

9.4 t-Distribution

9.5 Chi-squared Distribution

9.6 F-Distribution

9.7 Other Distributions

Learning Summary

In respect of their use and the rationale for their application, the standard distributions

introduced in this module (Poisson, t, chi-squared, F, negative binomial and beta-binomial)

are in principle the same as the earlier ones (normal and binomial). Their areas of application

are to problems of inference, specifically estimation and significance testing. The advantages

their use brings are twofold. First, they reduce the need for data collection compared with

the alternative of collecting one-off distributions for each and every problem. Second, each

standard distribution brings with it a body of established knowledge that can widen and

speed the analysis.

The eight standard distributions encountered so far are just a few, but probably the most

important few, of the very many that are available. Each has been developed to cope with a

particular type of situation. Details of each distribution have then been recorded and made

generally available. When a new distribution has been developed and added to the list, it has

usually been because it is applicable to some particular problem which can be generalised. For

instance, W. S. Gosset developed the t-distribution because of its value when applied to a sampling problem in the brewing company for which he worked. Because this problem was a special case of a general type of problem, the t-distribution has gained wide acceptance.

To summarise, when one is faced with a statistical problem involving the need to look at

a distribution, there is often a better alternative than having to collect large amounts of data.

A wide range of standard distributions are available and may be of help. Table 1 summarises

the standard distributions described so far and the theoretical situations from which they

have been derived.

One of the principal uses of standard distributions is in significance testing. Table 2 lists

four types of significance test and shows the standard distribution that is the basis of each.

In addition to their direct application to problems, standard distributions are fundamental to many other parts of formal statistical analysis. In later modules a knowledge of distributions, particularly the normal, will be essential.

Table 1 Summary of standard distributions

Normal: Observations taken (or measurements made) of some quantity which is essentially constant but is subject to many small, additive, independent disturbances.

Binomial: Samples taken from a population in which the elements are of two types. The variable is the number of elements of one of the types in the sample.

Poisson: Samples taken of a continuum (e.g. time, length). The variable is the number of ‘events’ in the sample.

t: Similar to the normal but where the standard deviation is estimated from a sample of size < 30.

Chi-squared: Sample taken from a normal population. The variable is based on the ratio between the sample variance and the population variance.

F: Two samples taken from a normal population. The variable is the ratio between the variances of the two samples.

Negative binomial: Like the Poisson, but with the parameter, λ, itself subject to variation across the population.

Beta-binomial: Like the binomial, but with the parameter, p, subject to variation across the population.

Table 2 Summary of significance tests

Comparing a sample mean with a population mean: Normal (if sample size ≥ 30); t (if sample size < 30).

Comparing one sample mean with another sample mean: Normal (if combined sample ≥ 30); t (if combined sample < 30).

Comparing a sample variance with a population variance: Chi-squared.

Comparing one sample variance with another sample variance: F.
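As with the binomial and normal, tables for these distributions have largely been replaced by software. A brief sketch, assuming the scipy library and using invented parameter values:

```python
from scipy import stats

print(stats.poisson.pmf(2, mu=3.0))      # P(2 events) when they average 3 per interval
print(stats.t.ppf(0.975, df=12))         # t critical value for a small sample
print(stats.chi2.ppf(0.95, df=9))        # chi-squared critical value
print(stats.f.ppf(0.95, dfn=4, dfd=20))  # F critical value for comparing variances
```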

10. Analysis of Variance

Learning Objectives

Up to now the statistical tests have concentrated on the differences between two samples. In

practice a number of different samples are often available and there is a need for a test which

shows whether there are statistically significant differences within a group of samples.

Analysis of variance is such a test. By the end of the module the reader should know how

analysis of variance extends statistical inference from the one- and two-sample tests

described in earlier modules to many-sample tests. He or she should know the difference

between one-way and two-way analyses of variance and the type of problems to which they

are applied, and should also appreciate further extensions of the subject to more complicated

tests. As with all statistical inference the crucial underlying assumptions and points of

practical interest should accompany the knowledge of how and where to apply them.

This module covers advanced material and may be omitted first time through the course.

Sections

10.1 Introduction

10.2 Applications

10.3 One-Way Analysis of Variance

10.4 Two-Way Analysis of Variance

10.5 Extensions of Analysis of Variance

Learning Summary

Analysis of variance is one of the most advanced topics of modern statistics. It is far more

than an extension of two-sample significance tests for it allows significance tests to be

approached in a much more practical way. The additional sophistication allows significance

tests to be used far more realistically in areas such as market research, medicine and

agriculture.

In practical situations there is a close association between analysis of variance and re-

search design. Although multi-factor analysis of variance is theoretically possible, attempts to

carry out such tests can involve large amounts of data and computing power. Moreover,

large and involved pieces of work can be more difficult to comprehend conceptually than

statistically. The results often present enormous problems of interpretation. Consequently,

before one embarks upon lengthy analyses, time must be spent planning the research so that

the eventual statistical testing is as simple as possible. This process is known as experimental design. It offers methods of isolating the main effects as simply as possible. If it is

at all possible, multi-factor analysis of variance should only be undertaken after very careful

planning of the research.
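For concreteness, a one-way analysis of variance can be run in a few lines of Python, assuming the scipy library; the three groups of observations are invented:

```python
from scipy import stats

# Hypothetical results from three samples (e.g. three treatments).
a = [21, 24, 23, 26, 22]
b = [28, 27, 30, 26, 29]
c = [22, 25, 24, 23, 26]

f_stat, p_value = stats.f_oneway(a, b, c)
print(f_stat, p_value)  # a small p-value suggests real differences among the groups
```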

11. Regression and Correlation

Learning Objectives

Regression and correlation are concerned with relationships between variables. By the end of

this module the reader should understand the basic principles of these techniques and where

they are used. He or she should be able to carry out simple analyses using a calculator or a

personal computer. The many pitfalls in practical applications should also be known.

The module deals with simple linear regression and correlation at a non-statistical level.

The aim is to explain conceptually the principles underlying these topics and highlight the

management issues involved in their application. The next module extends the topics and

describes the statistical background.

Sections

11.1 Introduction

11.2 Applications

11.3 Mathematical Preliminaries

11.4 Simple Linear Regression

11.5 Correlation

11.6 Checking the Residuals

11.7 Regression on a Personal Computer (PC)

11.8 Some Reservations about Regression and Correlation

Learning Summary

Regression and correlation are important techniques for predicting and understanding

relationships in data. They have a wide range of applications: economics, sales forecasting,

budgeting, costing, human resource planning, corporate planning etc. The underlying

statistical theory (outlined in the next module) is extensive. Unfortunately the depth of the

subject can in itself lead to errors. Users of regression can allow the statistics to dominate

their thought processes. Many major errors have been made because the wider non-statistical

issues have been neglected. As well as providing company knowledge and broad expertise,

managers have a role to play in drawing attention to these wider issues. They should be the

ones asking the penetrating questions about the way regression and correlation are being

applied. If not the managers, who else will?

Managers can only do this, however, if they have a reasonable grasp of the basic principles (although they should not be expected to become experts nor to be involved in the

technical details). Only when they have taken the trouble to equip themselves in this way will

they be taken seriously when they participate in discussions. Only then will they take

themselves seriously and have sufficient confidence to participate in the discussions.

Regression and correlation have a mixed track record in organisations, varying from high

success to abject failure. A key to success seems to be for managers to become truly

involved. Too often the managers pay lip-service to participation. Their contribution is

potentially very large. To make it count they need to be aware of two things. First, the broad

principles and managerial issues (the topics in this module) are at least as important as the

technical, statistical aspects. Second, knowledge of the statistical principles (the topic for the

next module) is necessary, not in order that they may do the regression analyses themselves,

but as a passport to a legitimate place in discussions.
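To make the basic principles concrete, here is a minimal simple linear regression in Python using the numpy library; the data pairs are invented:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # e.g. advertising spend (invented)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])  # e.g. sales (invented)

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit of y = a + bx
r = np.corrcoef(x, y)[0, 1]              # correlation coefficient
residuals = y - (intercept + slope * x)  # these should be checked for randomness

print(slope, intercept, r)
```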

12. Advanced Regression Analysis

Learning Objectives

Regression and correlation are complicated subjects. The previous module presented the

basic concepts and the managerial issues involved. In this module, the basic concepts are

extended in three directions. First, multiple regression deals with equations involving more than one explanatory variable. Second, non-linear regression allows relationships to be based on

equations that represent curves. Third, the statistical theory underlying regression is

described. This last topic permits rigorous statistical tests to be used in the evaluation of the

results. Finally, to bring together all aspects of regression and correlation, a step-by-step

approach to carrying out a regression analysis is given.

This module contains advanced material and may be omitted first time through the

course.

Sections

12.1 Introduction

12.2 Multiple Regression Analysis

12.3 Non-Linear Regression Analysis

12.4 Statistical Basis of Regression and Correlation

12.5 Regression Analysis Summary

Learning Summary

This module has extended the ideas of simple linear regression by removing the limitations

of ‘simple’ and ‘linear’. First, multiple regression analysis makes the extension beyond simple

regression. It allows changes in one variable (the dependent, or y, variable) to be explained by changes in several other variables (the explanatory, or x, variables). Multiple regression analysis is based on the same

principle, the least-squares criterion, as simple regression. However, the addition of the extra

variables does bring about added complications. Table 3 summarises the similarities and

differences between the two cases as far as their practical application is concerned.

Table 3 Comparing single and multiple regression

Similarities

(a) Substitution of values in regression equation to make predictions.

(b) R² test to measure closeness of fit.

(c) Checking of residuals for randomness.

(d) Use of SE(Pred) to measure accuracy.

Differences

(a) Adjustment of correlation coefficient to allow for degrees of freedom.

(b) t test to determine variables to leave out.

(c) Check for collinearity.

The second extension beyond linear regression is to ‘curved’ relationships between variables. This is done by transforming one or more of the variables so that the equation can be

handled as if it were linear. The range of possible transformations is wide, allowing a variety

of non-linear relationships to be modelled through regression.
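A brief sketch of one such transformation, fitting the curve y = a·e^(bx) by taking logarithms so that ordinary linear regression applies; the data are invented and the numpy library is assumed:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4])  # invented, roughly y = e**x

# log y = log a + b*x, which is linear in x.
b, log_a = np.polyfit(x, np.log(y), 1)
a = np.exp(log_a)
print(a, b)  # the parameters of the original curved relationship
```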

The possibilities of many explanatory variables and many types of equation may seem advantageous, but they bring a danger. This is that more and more regression equations

will be tried until one is found that just happens to fit the set of observations that are

available. Indeed there is a technique, called stepwise regression, which is a process for

regressing all possible combinations of variables and selecting the one that, statistically, is the

best. The risk is that causality will be forgotten. Ideally the role of regression should be to

confirm some prior belief, rather than to find a ‘belief’ from the data. This latter process

can of course be successful but it is likely to lead to many purely associative relationships

between variables. In multiple and non-linear regression analysis it is more important than

ever to ask the question: Is the regression sensible? This question should be asked even

when the statistical checks are satisfactory.

The theoretical background to regression has also been introduced. The whole subject is

large and complex. The surface has been scratched in this module but no more. A further

extension to the topic would have been to look at criteria other than that of least squares.

Even within the least-squares criterion the statistical tests presented are just a few of the

many available. Fortunately, computer packages, so essential to all but the smallest of

problems, can carry out these tests automatically. On the other hand, when a package carries

out many tests, some of which are alternatives, the problem of interpreting the computer’s

output is an important one. A major problem faced by new users of regression analysis is

that, while they may have a good understanding of the topic, their first sight of a computer’s

output causes them to doubt. The answer is not to be put off by the initial shock, but to

persevere and select from the output just those parts that are required. Computer packages

are trying to satisfy a wide range of users at all levels of sophistication. For this reason their

output tends to be confusingly large.

Perhaps the best advice in this statistical minefield is to make the correct balance between

statistical and non-statistical factors. For example, the t test for the inclusion of variables in a multiple regression equation should be taken carefully into account but not to the exclusion of other factors. In the earlier example on predicting sales of children’s clothing, the t value for advertising was only 1.3. Statistically it should be excluded. On the other hand, if it has

been found from other sources (such as market research interviews) that advertising does

have an effect, then the variable should be retained. The poor statistical result may have

arisen because of the limited sample chosen or because of data inaccuracy. The profusion of

complex data produced by regression analyses can promote a spurious sense of accuracy and

a spurious sense of the importance of the statistical aspects. It is not unknown for experts in

regression analysis to make mountains out of statistical molehills.

13. The Context of Forecasting

Learning Objectives

The intention of this module is to provide a background to business forecasting. By the end

the reader should know what it can be applied to and the types of techniques that are used.

Special attention is paid to qualitative techniques at this stage since they are the alternative to

the quantitative techniques which are usually thought to form the nucleus of the subject.

Sections

13.1 Introduction

13.2 A Review of Forecasting Techniques

13.3 Applications

13.4 Qualitative Forecasting Techniques

Learning Summary

The obvious characteristic that distinguishes qualitative from quantitative forecasting is that

the underlying information on which it is based consists of judgements rather than numbers,

but the distinction goes beyond this. Qualitative forecasting is usually concerned with

determining the boundaries within which the long-term future might lie; quantitative

forecasting tends to provide specific point forecasts and ranges for variables in the nearer

future. Qualitative forecasting offers techniques that are very different in type, from the

straightforward, exploratory Delphi method to the normative relevance trees. Also, qualitative forecasting is at an early stage of development and many of its techniques are largely

unproven.

Whatever the styles of qualitative techniques, their aims are the same: to use judgements

systematically in forecasting and planning. In using the techniques it should be borne in

mind that the skills and abilities that provide the judgements are more important than the

techniques. Just as it would be pointless to try a quantitative technique with ‘made-up’

numerical data, so it would be folly to use a qualitative technique in the absence of real

knowledge of the situation in question. The difference is that it is perhaps easier to discern

the lack of accurate data than the lack of genuine expertise.

On the other hand, where real expertise does exist, it would be equal folly not to make

use of it. For long-term forecasting by far the greater proportion of available information

about a situation is probably in the form of judgement rather than numerical data. To use

these judgements without the help of a technique usually results in a plan or forecast biased

by personality, group effects, self-interest etc. Qualitative techniques offer chances to distil

the real information from the surrounding noise and refine it into something useful.

In spite of this enthusiasm there is a warning. In essence most qualitative techniques

come down to asking questions of experts, albeit scientifically. Doubts about the value of

experts are well entrenched in management folklore. But doubts about the questions can be

much more serious, making all else pale into insignificance. Armstrong (1985) quotes the

following extract from a survey of opinion by Hauser (1975).

Question 1: Do you believe in the freedom of speech? 96% answered yes.

Question 2: Do you believe in the freedom of speech to the extent of allowing radicals to hold meetings and express their views to the community? 22% answered yes.

The lesson must be that the sophistication of the techniques will only be worthwhile if

the forecaster gets the basics right first.

14. Time Series Techniques

Learning Objectives

By the end of the module the reader should know where to use time series methods. Time

series data are distinguished by being stationary or non-stationary. In the latter case the series

may contain one or more of a trend, seasonality or a cycle. The module describes at least one

technique to deal with each type of series.

Technical sections: Sections marked with * contain technical material and may be

omitted on a first reading of the module.

Sections

14.1 Introduction

14.2 Where Time Series Methods Are Successful

14.3 Stationary Series

14.4 Series with a Trend

14.5 Series with Trend and Seasonality

14.6 Series with Trend, Seasonality and Cycles

14.7 Review of Time Series Techniques

Learning Summary

In spite of the fact that surveys have demonstrated how effective time series methods can

be, they are often undervalued. The reason is that, since a variable is predicted solely from its

own historical record, the methods have no power to respond to changes in business or

company conditions. They work on the assumption that circumstances will be as in the past.

Nevertheless, their track record is good, especially for short-term forecasting. In addition,

they have one big advantage over other methods. Because they work solely from the historical

record and do not necessarily require any element of judgement or forecasts of other causal

variables, they can operate automatically. For example, a large warehouse, holding thousands of

items of stock, has to predict future demands and stock levels. The large number of items,

which may be of low unit value, means that it is neither practicable nor economic to give each

variable individual attention. Time series methods will provide good short-term forecasts by

computer without needing managerial attention. Of course, initially some research would have

to be carried out, for instance to find the best overall values of smoothing constants. But once

this research was done, the forecasts could be made automatically. All that would be needed

would be the updating of the historical record as new data became available. Especially with a

computerised stock system this should cause little difficulty.
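As an indication of how such an automatic forecast might work, here is a minimal simple exponential smoothing routine in plain Python; the smoothing constant and the demand series are invented:

```python
def smooth_forecast(series, alpha=0.3):
    """One-step-ahead forecast by simple exponential smoothing.

    alpha is the smoothing constant found by the initial research;
    0.3 is only an illustrative value.
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level  # weight newest data most
    return level  # the forecast for the next period

demand = [102, 98, 105, 110, 107, 111, 115]  # hypothetical demands for one item
print(smooth_forecast(demand))
```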

The conclusion is therefore not to underestimate time series methods. They have advantages in cost and, in the short term, in accuracy over other methods.

15. Managing Forecasts

Learning Objectives

The purpose of this module is to describe what managers need to know if they are to use

forecasts in their work. It is stressed that forecasting should be viewed as a system, not a

technique. The system needs to be managed and it is here that the manager’s role is crucial.

The parts of it that fall within a manager’s sphere rather than that of the forecasting expert

are discussed in some detail. Some actual and costly mistakes in business forecasting will

demonstrate the crucial nature of the manager’s role. By the end the readers should know

how they can use the forecasting techniques described in previous modules effectively in

their organisations.

Sections

15.1 Introduction

15.2 The Manager’s Role in Forecasting

15.3 Guidelines for an Organisation’s Forecasting System

15.4 Forecasting Errors

Learning Summary

Managers have a clear role in ‘managing’ forecasts. But increasingly they are also finding a

role as practitioners of forecasting. The advent of personal computers has led to this

development. Management journals have recently been reporting this phenomenon. The low

cost of a fairly powerful personal computer means that it is not a major acquisition; software

and instruction manuals are readily available. With a small investment in time and money,

managers, frustrated by delays and apparent barriers around specialist departments, take the

initiative and are soon generating forecasts themselves. They can use their own data to make

forecasts for their own decisions without having to work through management services or

data processing units.

This development has several benefits. The link between technique and decision is made

more easily; one person has overall understanding and control; time is saved; re-forecasts are

quickly obtained. But of course there are pitfalls. There may be no common database, no

common set of assumptions within an organisation. For instance, an apparent difference

between two capital expenditure proposals may have more to do with data/assumption

differences than with differences between the profitabilities of the projects. Another pitfall is

in the use of statistical techniques which may not be as straightforward as the software

manual suggests. The use of techniques by someone with no knowledge of when they can or

cannot be applied is dangerous. A time series method applied to a random data series is an

example. The computer will always (nearly always) give an answer. Whether it is legitimate to

base a business decision on it is another matter.

However, it is with management aspects of forecasting that this module has primarily

been concerned. It has been suggested that this is an area of expertise too often neglected

and that it should be given more prominence. Statistical theory and techniques are of course

important as well but the disproportionate amounts of time spent studying and discussing

them give a wrong impression of their importance relative to management issues.

In particular, the topics covered as steps 7–9 in the guidelines – the incorporation of

judgements, implementation and monitoring – are given scandalously little attention within

the context of forecasting. This is generally true whether books, courses, research or the

activities of organisations are being referred to. A moment’s thought demonstrates that this

is an error. If a forecasting technique is wrongly applied, good monitoring will permit it to be

adjusted speedily: the situation can be retrieved. If judgements, implementation or monitoring are badly done or ignored, communication between producers and users will probably

disappear and the situation will be virtually impossible to retrieve.

Why should these issues be held in such low regard? Perhaps the answer lies in the widespread attitude which says that a manager needs to be taught statistical methods but that the

handling of judgements, implementation and monitoring are matters of instinct which all

good managers have. They are undoubtedly management skills, but whether they are

instinctive is another matter. Whatever the reason, the effect of this inattention is almost

certainly a stream of failed forecasting systems.

How can the situation be righted? A different attitude on the part of all concerned would

certainly help, but attitudes are notoriously hard to change. A long-term, yet realistic

approach calls for more information. Comparatively little is known about these management

aspects. If published reports and research on the management of forecasting were as

plentiful as they are on technical aspects, a great improvement could be anticipated.

Even so, the best advice of all is probably to avoid forecasting. Sensible people should

only use forecasts, not make them. The general public and the world of management judge

forecasts very harshly. Unless they are exactly right, they are failures. And they are never

exactly right. This rigid and unrealistic test of forecasting is unfortunate. The real test is

whether the forecasting is, on average, better than the alternative which is often a guess,

frequently not even an educated one.

A more positive view is that the present time is a particularly rewarding one to invest in

forecasting. The volatility in data series seen since the mid-1970s puts a premium on good

forecasting. At the same time facilities for making good forecasts are now readily available in

the form of a vast range of techniques and wide choice of relatively cheap microcomputers.

With the latter even sophisticated forecasting methods can be applied to large data sets. It

can all be done on a manager’s desk-top without the need to engage in lengthy discussions

with experts in other departments of the organisation.

Whether the manager is doing the forecasting in isolation or is part of a team, he or she

can make a substantial contribution to forward planning. To do so, a systematic approach to

forecasting through the nine guidelines and an awareness of the hidden traps will serve that

manager well.