
Centre for Marketing

BEHAVIORAL MEASURES OF TELEVISION AUDIENCE APPRECIATION

Peter J. Danaher
Jennifer M. Lawrie

Centre for Marketing Working Paper No. 97-202

October 1997

Peter Danaher is Professor of Marketing at Auckland University and has been a Visiting Professor at London Business School. Jennifer Lawrie is a Lecturer at Auckland University. The authors would like to thank A. C. Nielsen-McNair, New Zealand for providing the data. They also thank Paddy Barwise for his constructive comments.

London Business School, Regent's Park, London NW1 4SA, U.K.
Tel: +44 (0)171 262-5050  Fax: +44 (0)171 724-1145
[email protected]  http://www.lbs.ac.uk

Copyright © London Business School 1997


Behavioral Measures of Television Audience Appreciation

Abstract

The dual demands of efficiency in advertising spend and the provision of television programs that viewers like have resulted in a need for both quantitative and qualitative measures of the viewing audience. Ratings, as a measure of audience size, are well known and frequently used, but measures of audience appreciation are used less often, usually because of their additional cost and complexity. In this paper we explore the possibility of using minute-by-minute people meter data to develop behavioral measures of program appreciation. Specifically, we look at the number of minutes viewed out of the potential number of minutes a program airs. A second measure looks at the proportion of viewers that watch at least 80% of a show.

Clearly, any advertiser, broadcaster or ratings supplier with access to minute-by-minute people meter ratings can easily perform the calculations we propose, so a separate survey of program appreciation is not needed. Furthermore, no additional respondent burden is demanded of people meter panelists, so the measure is completely unobtrusive. Lastly, it does not rely on respondent recall, as most of the current diary-based methods do.

We show that the behavioral measures of program appreciation have similar properties to current measures, being high on average across all shows, higher for more 'demanding' shows, reasonably consistent across episodes of the same show and higher for higher rating shows. In addition, we observe patterns not often detected by current measures, such as a wide range of appreciation scores, differences in appreciation scores across program types, differences across episodes for 'magazine' type shows such as '20/20' and large differences across some demographics such as age.


Introduction

With ever-increasing pressure to contain advertising budgets, advertisers are keen to find ways to improve the efficiency of their media planning (Danaher and Rust 1994; White and Miles 1996). Lloyd and Clancy (1991) focused on selecting television programs that are more likely to involve a viewer. The idea is that if two programs have the same rating among the target group but one program has higher involvement among its viewers, then viewers of this program are more likely to recall the embedded ads and have higher purchase intention for the products advertised. Lloyd and Clancy (1991) called the phenomenon of higher ad effectiveness in more involving shows the "positive effects hypothesis". On the basis of this hypothesis they proposed using an index called 'Cost Per Thousand Involved' (CPMI) in preference to the traditional CPM to reflect the importance of program involvement in media buying.

While the positive effects hypothesis seems plausible, it is controversial, as many other studies have shown the reverse effect to be the case (see Carrie 1997 and Lloyd and Clancy 1991 for thorough reviews). A compromise was offered by Tavassoli et al (1995), who posited an inverted-U shaped relationship between program involvement and ad effectiveness. It is not the purpose of this article to settle the positive and negative effects hypothesis debate. Rather, we take the view that whatever position an advertiser takes, they will need some measure of program involvement/appreciation, so we concentrate on developing such a measure.

A 1993 study by Carat Research UK Ltd. is particularly relevant here because it found that "advertising is 70% more effective when it is seen by people enjoying the programme". This highlights the importance of monitoring program appreciation, which is often measured via program enjoyment/interestingness. We explore the possibility of using minute-by-minute people meter data to develop behavioral measures of program appreciation. Specifically, we look at the number of minutes viewed out of the potential number of minutes a program airs. For example, if a person tunes in for the entire 30 minute duration of a comedy show, then it is likely that they enjoy the comedy more than someone who watches just 15 minutes of the comedy plus 15 minutes of other programming in the same 30 minute period. At the very least, we can say that a person who watches more of a program is more committed to the show. Therefore, we calculate the proportion of program minutes that a person watches and average this over all viewers of the show. We set a threshold: a person must watch at least 20 percent of the available minutes to be included in the average. This omits casual viewers of just a few minutes, who are possibly avoiding ads on other channels. A second measure looks at the proportion of viewers that watch at least 80% of a show. Our behavioral measures of audience appreciation contrast with all previous measures, which use subjective attitudinal measures of program appreciation, being one or more of claimed enjoyment, interest, information content, appeal or personal emotion (Barwise and Ehrenberg 1988; Carrie 1997; Meneer 1987; TAA 1984; Windle and Landy 1996).

Clearly, any advertiser, broadcaster or ratings supplier with access to minute-by-minute people meter ratings (or even two-minute ratings) can easily perform the calculations we propose, so a separate survey of program appreciation is not needed. Such surveys are conducted in Canada and the UK, for instance, at high cost (Barwise and Ehrenberg 1988; Windle and Landy 1996). Furthermore, no additional respondent burden is demanded of people meter panelists, so the measure is completely unobtrusive. Lastly, it does not rely on respondent recall, as most of the current diary-based methods do.


We show that the behavioral measures of program appreciation have similar properties to current measures, being high on average across all shows, higher for more 'demanding' shows, reasonably consistent across episodes of the same show and higher for higher rating shows (Barwise and Ehrenberg 1988; Carrie 1997). In addition, we observe patterns not often detected by current measures, such as a wide range of appreciation scores, differences in appreciation scores across program types, differences across episodes for 'magazine' type shows such as '20/20' and large differences across some demographics such as age.

Previous and Current Measures of Audience Appreciation

Research on audience reactions to television programs dates back to the 1960s. Various private research firms and public broadcasters have, over the years, produced their own appreciation measures in an attempt to supplement audience size measures by measuring viewer attitudes to, preferences for, and involvement levels with programs (Barwise et al 1979; Barwise and Ehrenberg 1987; Meneer 1987; Kent 1994; Windle and Landy 1996).

Diary-Based Methods

In the US, Television Audience Assessment Inc (TAA 1984) was set up specifically to market measures of audience reactions to the television networks and advertisers. A weekly viewing diary measured the appeal of programs (a personal program rating on a 5-point scale) and the impact of programs (the extent to which they "touch one's feelings") (Wulfsberg and Holt 1986). TAA is no longer operating, so we will not discuss it in further detail.


One of the best known and longest-running audience appreciation surveys is conducted in the UK for the Broadcasters' Audience Research Board (BARB). Information is gathered weekly from a nationally representative diary panel of 3,000 respondents. Until 1993 the audience appreciation data were collected by the Independent Broadcasting Authority using the following six point scale for each program watched: extremely interesting and/or enjoyable (I/E), very I/E, fairly I/E, neither one thing nor the other, not very I/E and not at all I/E. These six levels were assigned the numbers 100, 80, 60, 40, 20 and 0, respectively. Since 1993 Research Services Limited have conducted the audience appreciation survey on behalf of BARB. While the survey methodology has continued to be diary-based, the measurement scale has changed to 'marks out of 10' for a program being interesting and/or enjoyable. This score is multiplied by 10 to give comparability with the previous scores (Windle and Landy 1996).

Barwise and Ehrenberg (1988, p. 50) report that the BARB appreciation scores have a narrow range, being between about 60 and 80, with the average being about 70. This narrow range is thought to be a limitation of current audience appreciation scores (Meneer 1987), especially if their purpose is to develop alternative indices to CPM for advertising program selection (Lloyd and Clancy 1991).

The narrow range of BARB's appreciation scores may be due to several factors. Firstly, Barwise and Ehrenberg (1988) state that "viewers mostly say that they quite like what they watch and they watch what they say they quite like". That is, there may be some post-viewing justification. After all, only a fool would admit to not liking a program they had just spent one hour watching. Secondly, most appreciation research employs self-completion diaries. These are widely known to be burdensome on respondents and, unless they are filled in diligently after viewing each program, problems of recall arise and respondents may develop a routinized response to diary completion (Beed 1992).

People Meter-Based Methods

Many countries now operate a people meter panel for television audience measurement. This method is technically more sophisticated than diaries and is better at capturing viewing of small channels and daytime and late night viewing (Beed 1992). Soong (1988) has shown that people meter ratings are statistically more reliable than those derived from diaries. However, people meters demand that panelists log in and out with a remote control when they start and stop watching television, and this is not always done correctly (Danaher and Beed 1993). Furthermore, younger panelists have exhibited 'fatigue' in their button pushing behavior (CONTAM 1988). That is, they record less of their viewing, by pushing their people meter button, as their time on the panel increases.

A common check on panelists' button pushing behavior is via a 'Coincidental Survey' (Danaher and Beed 1993). Here, panelists are called up and asked "when the phone just rang were you watching TV?" A panelist's verbal answer is compared with what the meter says they were doing with respect to television viewing. Danaher and Beed (1993) reported that about 91% of panelists are compliant, in that their claimed viewing matches their actual viewing according to the people meter. Interestingly, of the 9% who are noncompliant, about half say they are watching when they are not logged in, while the other half say they are not watching but are logged in. The net effect on the ratings is that the 'overs and unders' cancel out, and the people meter ratings are very close to ratings that would be obtained from panelists' claimed viewing. Lastly, Danaher (1995) reported that this high compliance rate continues even during advertising breaks.


While people meters are generally used only to measure viewing (yes/no), some meters do have a voting button. This button gives panelists the opportunity to give a score between 1 and 10 on how much they like the show they are watching. Panelists are periodically prompted to enter their score. As far as we know, the only countries where such voting is routinely implemented are Austria and the Netherlands, although the meters in many other countries have voting capability. Probably the biggest reason that regular voting is not asked of panelists is that panel operators think that enough demands are already made on their respondents to log in and out, and this additional demand would only accelerate fatigue and 'drop out' from the panel. Also, it is likely that those who go to the trouble of voting do so because they tend to like a show, and vice versa for those who do not vote. This would inflate the average voting scores, making them unreliable.

Therefore, while most people meter panels could be used to measure audience appreciation directly, to date this feature of the meter has not been used extensively. In the next section we show how people meters can be used to get a measure of audience appreciation without panelists having to vote. All they have to do is log in and out as usual.

Behavioral Measures of Audience Appreciation

While people meters have the capability to record second-by-second ratings, most ratings suppliers distribute minute-by-minute ratings (Danaher 1995). These are ample for our purposes. The concept behind our two behavioral measures of audience appreciation is that 'viewers vote with their feet'. That is, if you do not particularly like a show then in all likelihood you will change channels or switch the set off. With the widespread prevalence of remote controls, changing channels is effortless for most viewers.


A person who watches every possible minute of a program might be called committed to the show. Hence, our first measure, called the Percentage of Minutes Viewed (PMV), is the proportion of minutes viewed out of the total duration of the show. For example, if someone watches 27 out of 30 airtime minutes of a sitcom then their PMV is 27/30 = 90%. The individual-level PMV scores are averaged over all people to obtain the program's final PMV. We examined the distribution of the number of minutes viewed for several programs and found a handful of people who had 'watched' five minutes or less of the show. These people were usually avoiding commercials on other channels and cannot really be classified as viewers of the program under study. For this reason, to be included in the calculation of the average PMV, a person must have watched at least 20% of the potential screening time of the program.

A potential criticism of the PMV is that it measures not just program time but advertising time. A person might consider themselves committed to the program but not the ads. One way to handle this is to strip out the commercials from the total potential minutes of the show. However, with minute-by-minute data this would not be exact, since commercial breaks rarely coincide with the start and stop of clock minutes. A much simpler alternative is to recognise that between 10 and 25 percent of air time for commercial stations is devoted to nonprogram material (Danaher 1995). So if someone is committed to a show but is also trying to avoid ads, they could legitimately miss about 20% of the available duration of the 'program' (show plus commercials) and still be strongly committed to the show. This figure can be adjusted for different countries depending on their level of advertising. Hence, our second measure is the proportion of viewers that watch 80% or more (P80+) of the program under study. Again, there is a minimum threshold for the P80+: someone must watch at least 20% of the show to be classified as a viewer.
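To make the definitions concrete, the two calculations can be sketched in a few lines of code. The sketch below assumes the minute-by-minute records have already been reduced to the number of program minutes each panelist viewed; the function name, data layout and sample figures are our own illustration, not the ratings supplier's format.

```python
# Minimal sketch of PMV and P80+ for one program episode (illustrative only).

def commitment_scores(minutes_viewed, duration,
                      viewer_threshold=0.20, commit_threshold=0.80):
    """Return (PMV, P80+) as percentages for one program episode.

    minutes_viewed : minutes of the program each panelist watched
    duration       : scheduled length of the program, in minutes
    """
    # Only people watching at least 20% of the program count as viewers;
    # this screens out channel-hoppers avoiding ads on other channels.
    viewers = [m for m in minutes_viewed if m / duration >= viewer_threshold]
    if not viewers:
        return None, None
    pmv = 100 * sum(m / duration for m in viewers) / len(viewers)
    p80 = 100 * sum(m / duration >= commit_threshold for m in viewers) / len(viewers)
    return pmv, p80

# Hypothetical 30-minute sitcom: the last two panelists fall below the 20% cut.
minutes = [30, 27, 25, 30, 22, 15, 9, 4, 2]
pmv, p80 = commitment_scores(minutes, duration=30)
print(f"PMV = {pmv:.0f}%, P80+ = {p80:.0f}%")   # PMV = 75%, P80+ = 57%
```

Note that both thresholds appear as explicit parameters, so the fine tuning discussed in the conclusion amounts to a one-argument change.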


The Data

The data used in this study were generously supplied by ACNielsen•McNair New Zealand Ltd. from their people meter panel, comprising about 1050 people from 440 households. The data were individual-level people meter records for 56 television programs. Four consecutive episodes of each program were used. For each episode and each person, the records gave the number of minutes of the designated program viewed, the number of minutes of any television viewed and the duration of the designated program.

Table 1 lists all the shows. Most of them are prime time programs, of length 30 or 60 minutes. They were regular weekly or nightly shows, so that consistency of commitment scores across episodes could be evaluated. One-off programs such as movies, sporting events, and so on were not included, although commitment scores can, of course, be calculated for such programs. The shows represent a range of program types: comedies, dramas, soap operas, daytime talk shows, evening news, game shows, science fiction and 'tabloid' style news. Some 28 of the shows are from the U.S., 14 are New Zealand produced, 11 are from Britain and three were made in Australia.

In addition to viewing records, we had demographic data on each panelist, including age, sex, weight of viewing, occupation, ethnicity, household size and number of TVs in the home.

An important factor to note for this panel is that around 75% of homes have a remote control on their television(s). This makes it very convenient for the vast majority of viewers to switch channels or turn the set off.


Internal Validity

Before reporting the results we discuss the validity of the measures. The first is internal validity, which we check by correlating PMV and P80+ with the number of episodes viewed. That is, if a person is committed to a single episode of a program, we would expect them to watch the program more frequently. Barwise and Ehrenberg (1988, p. 51) observed such a correlation for BARB's audience appreciation measure, where the average liking score was only 40 for those who claim to watch one out of five episodes, but increased to 84 for those claiming to view five out of five episodes.

In our case, we have actual (not claimed) viewing data on four episodes of each program. For each person, we have their average PMV across the four episodes and the number of episodes they saw. Across all 56 programs the average of the correlations between PMV and number of episodes seen is 0.77, with the lowest value being 0.67 and the highest 0.87. For the P80+ measure, the average correlation was 0.83, with the lowest and highest being, respectively, 0.62 and 0.93. This indicates good internal reliability of the behavioral measures.
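As an illustration of the mechanics of this check, the sketch below computes the correlation between each panelist's average PMV and the number of episodes they saw, assuming a simple panelists-by-episodes matrix of minutes viewed; the matrix here is invented, since the real panel records are proprietary.

```python
# Sketch of the internal-validity check for a single program: correlate each
# panelist's average PMV over four episodes with the number of episodes seen.
# The viewing matrix here is invented; the actual panel data are proprietary.
import numpy as np

rng = np.random.default_rng(0)
n_panelists, n_episodes, duration = 200, 4, 30

# Rows = panelists, columns = episodes; entries = minutes of the show viewed.
minutes = rng.integers(0, duration + 1, size=(n_panelists, n_episodes))

viewed = minutes / duration >= 0.20        # the 20% threshold defines a viewer
episodes_seen = viewed.sum(axis=1)
keep = episodes_seen > 0                   # drop panelists who saw no episode

# Average PMV per panelist, taken over the episodes they actually viewed.
pmv = np.where(viewed, minutes / duration, np.nan)[keep]
avg_pmv = np.nanmean(pmv, axis=1)

r = np.corrcoef(avg_pmv, episodes_seen[keep])[0, 1]
print(f"correlation(average PMV, episodes seen) = {r:.2f}")
```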

External Validity

A sterner test of the reliability of the behavioral measures is to evaluate the correlation of PMV and P80+ with some external measure of program involvement. We were able to examine such a correlation by comparing PMV and P80+ scores for our programs with ratings of program enjoyment and frequency of viewing from an independent survey. For confidentiality reasons we cannot disclose full details of the survey, but it was a large scale diary-based audience appreciation survey conducted at about the same time as our people meter data were collected. In this survey, respondents were presented with lists of television programs and asked to rate them on two scales. The first was a four point enjoyment scale, ranging from 'really enjoy' down to 'don't enjoy it at all'. The second scale also had four points, with the most frequent category being 'watch every time' and the least frequent being 'hardly ever watch'.

We obtained 'top box' enjoyment and frequency of watching scores for the 22 programs that were common to our people meter data and the independent survey. By 'top box' we mean the proportion of people that gave a program a 'really enjoy' score for enjoyment or a 'watch it every time' score for frequency of viewing. The correlations between the enjoyment scale and the PMV and P80+ scores are, respectively, 0.51 and 0.50. For the frequency of watching question, the respective correlations are 0.59 and 0.52 for PMV and P80+. All four correlations are significantly different from zero. Given that totally independent people made the evaluations on the commitment and appreciation measures and that the methodologies are very different, these correlations are encouragingly high. This gives us some confidence in the reliability of the behavioral measures of appreciation when put alongside traditional attitudinal measures.

Results

Distribution of the Commitment Measures

Table 1 gives the PMV and P80+ scores (averaged over four episodes) for all the programs. Figures 1a and 1b present histograms of these scores across all 56 programs, while Table 2 gives some descriptive statistics for the two measures. The first thing to notice is that the P80+ has a much greater range (50 percentage points) than the PMV (28 percentage points). However, both commitment measures have a wider range than BARB's audience appreciation scores, whose range is only 20 percentage points (Barwise and Ehrenberg 1988). It is not just BARB appreciation scores that have a narrow range. Very similar ranges have been reported in the US and Germany by Gunter and Wober (1992).

Figures 1a and 1b show that both PMV and P80+ have reasonable symmetry around their means. This is corroborated in Table 2, which shows that the mean and median PMV and P80+ values are close. This symmetry is certainly not a feature of BARB's audience appreciation scores, which have a strong skew to the high end of the scale (Windle and Landy 1996).

As expected, the average PMV (78) is much higher than the average P80+ (60). Barwise and Ehrenberg (1988) reported the average BARB appreciation score to be about 70, while Carrie's (1997) analysis of more recent BARB data gave a similar average of 72. To some extent the high PMV values are due to the 20% viewing threshold, which has the effect of raising the average PMV. We believe the benefits of this outweigh the disadvantages of having a higher average PMV, especially given that the PMV still has a wider range than traditional appreciation scores and retains symmetry.

As might be expected, the PMV and P80+ scores are highly correlated, with a correlation of 0.92 across the 56 programs.

Commitment Scores Across Different Episodes

Given that part of the reason for having program appreciation scores is to get an index of program quality (Carrie 1997), television programmers and producers, as well as advertisers, are keen to know whether viewers' enjoyment of a show varies from one episode to another. Barwise and Ehrenberg (1988) showed that episode to episode variation was small with the BARB appreciation measure, even though the plots and content of the shows vary.


We calculated the average PMV and P80+ values for each of the four episodes of each program and compared these means via one-way ANOVA (the null hypothesis being that all four episode scores are the same). For the PMV measure, over three-quarters (43 out of 56) of these ANOVAs were not significant at the 1% level (we used the 1% significance level since the conventional 5% level would give about three, being 0.05 x 56, significant results even if the null hypothesis were true). For the P80+ measure about the same number, 44 out of 56 programs, showed no significant variation from episode to episode.
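The episode-consistency test is a standard one-way ANOVA; a sketch for a single program is shown below, with invented per-viewer PMV scores standing in for the panel data (scipy.stats.f_oneway implements the test).

```python
# Sketch of the episode-consistency test for one program: one-way ANOVA on
# per-viewer PMV scores (%) across four episodes, using invented data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
# Four episodes, each with its own sample of viewers' PMV scores.
episodes = [np.clip(rng.normal(78, 10, size=120), 20, 100) for _ in range(4)]

f_stat, p_value = f_oneway(*episodes)
# Test at the 1% level: with 56 programs, a 5% level would flag about
# 0.05 x 56 = 3 programs even if episode means never truly differed.
verdict = "varies" if p_value < 0.01 else "consistent"
print(f"F = {f_stat:.2f}, p = {p_value:.3f} -> {verdict} across episodes")
```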

Although it is apparent that program commitment scores exhibit considerable consistency across episodes, a pattern emerges among those shows which do not have consistent commitment scores. About half of these nonconsistent programs are comedies, including 'Roseanne', 'Hangin' with Mr Cooper', 'Grace Under Fire', two home video shows, 'Last of the Summer Wine' and 'Fawlty Towers'. The last two of these comedies were reruns, and it is likely that some viewers, having sampled the first 5 to 10 minutes of the show, realized they had seen the episode before and switched channels. Barwise and Ehrenberg (1988, p. 53) observed lower audience appreciation scores for less demanding shows. Comedies can generally be described as less demanding, so it is not surprising to see that so many of the inconsistent program commitment scores occur for comedies.

A second group of programs with significantly varying commitment scores are what might be termed 'magazine' shows, that is, shows compiled from segments of smaller subshows. A good example is '20/20', a popular documentary program with about three to four stories per show. Another is 'Foreign Correspondent', which is a compilation of five to ten minute mini-documentaries on recent overseas news items. It appears that viewers are somewhat selective about which of the stories they watch and may even 'cherry pick', choosing just the stories most interesting to them. This phenomenon may be exacerbated by program promotions, which often highlight only some, but not all, of the upcoming stories within a magazine type show.

Programs with very consistent commitment scores across episodes include evening news, dramas with continuing story lines, soap operas and game shows.

Given this consistency across episodes, from now on, when we report a commitment score for a program it is calculated over the four episodes, as in Table 1. This gives greater reliability to the scores, since the PMV and P80+ values combined across episodes are based on about four times as many viewer observations as those for just one episode.

Relationship Between Program Commitment and Program Audience

The relationship between audience appreciation and audience size has been of interest to researchers and broadcasters since appreciation monitoring first began. This is because of the need to understand whether the two variables are essentially just the same, or whether the measurement of audience appreciation can add another, more qualitative, dimension to media buying and programming policy over and above audience size considerations.

Using recent BARB data, Windle and Landy (1996) found that there does not appear to be a linear relationship between audience appreciation scores and audience size. Their study revealed a correlation of only 0.13 between audience appreciation and size for one week of prime time shows in the UK. In addition, they found that some high rating programs achieved quite low audience appreciation scores, while the reverse was also true. Another UK researcher, Meneer (1987, p. 241), also claims: "there is (close to) no relationship between appreciation score and audience size". In contrast, Barwise and Ehrenberg (1988, p. 52) and Barwise et al (1979) did find a weak but positive relationship between audience appreciation and size, with high rating programs having greater appreciation. However, they revealed that it is necessary to analyze entertainment and information programs separately. Windle and Landy (1996) did not do this, which may explain their low correlation. Carrie (1997) corroborated the significant relationship between audience appreciation and size when scheduling, channel and program type effects are taken into account, rather than just lumping all programs together.

The observation of higher appreciation for higher rating programs is an example of the 'double jeopardy' effect, whereby a show with a large audience not only has more viewers, but these viewers are more loyal and appreciative, and conversely for low rating shows (Barwise 1986; Barwise and Ehrenberg 1988; Ehrenberg et al 1990).

In our case, we plotted the two audience commitment scores against program rating in Figures 2a and 2b. Both graphs show that as program rating increases, commitment to the show also tends to increase. The correlation between PMV and rating is 0.59 across the 56 programs, while the equivalent correlation for P80+ is 0.60. Both these correlations are statistically significantly different from 0, indicating a significant positive relationship between audience commitment and size. Therefore, a double jeopardy effect is clearly evident, with popular programs not only having more viewers, but more viewers that are committed to the show.

This relationship is not an inherent artifact of the way we measure commitment, since commitment is calculated over viewers only, while ratings are calculated over all people in the people meter panel. For instance, if a panel has 1000 people and 200 of these people each watch three-quarters of a 30 minute show, then the program rating is 200 x (22.5 minutes/30 minutes)/1000 = 15% (people meter ratings are usually calculated as the average number of minutes viewed divided by the duration of the show, rather than, say, the proportion of people watching more than half the show). The PMV here is simply 75% (since all viewers meet the 20% threshold). If a different 30 minute show has 100 viewers, each of whom watches all of the show, then the program rating is 100 x (30 minutes/30 minutes)/1000 = 10%, but the PMV is 100%. Here, the lower rating show has a higher PMV, so it is certainly not automatic that our commitment scores must increase with audience size. This is evidenced in Table 1, for instance, where 'Foreign Correspondent' has an above average rating of 14, but a below average PMV of 75.
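The arithmetic of this point is easy to verify; the sketch below reproduces the two hypothetical shows from the example (panel of 1000 people, 30 minute shows).

```python
# Sketch reproducing the worked example: a program's rating and its PMV
# need not move together. Panel of 1000 people; two hypothetical shows.

def rating_and_pmv(viewer_minutes, duration, panel_size):
    # Rating: average minutes viewed across the whole panel, over duration.
    rating = 100 * sum(viewer_minutes) / duration / panel_size
    # PMV: average proportion of the show watched, over viewers only
    # (everyone here is above the 20% threshold).
    pmv = 100 * sum(m / duration for m in viewer_minutes) / len(viewer_minutes)
    return rating, pmv

show_a = [22.5] * 200   # 200 viewers each watch three-quarters of the show
show_b = [30.0] * 100   # 100 viewers each watch all of it

for name, mins in [("A", show_a), ("B", show_b)]:
    r, p = rating_and_pmv(mins, duration=30, panel_size=1000)
    print(f"show {name}: rating = {r:.0f}%, PMV = {p:.0f}%")
# show A: rating = 15%, PMV = 75%;  show B: rating = 10%, PMV = 100%
```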

As noted earlier, Barwise et al (1979) showed that the relationship between audience appreciation and size is stronger if programs are split into information and entertainment types. We calculated correlations between rating and our commitment scores separately for the information and entertainment programs. For information programs, the PMV and P80+ correlations with program rating are, respectively, 0.82 and 0.87. For entertainment programs, the respective correlations are 0.54 and 0.57. This partially corroborates Barwise et al's (1979) result, since we, at least, observe a much stronger relationship between audience commitment and size when information programs are analyzed separately. However, for entertainment programs the relationship is a little weaker than for all programs, possibly because the category 'entertainment' is too coarse and it might be better to separate this group into dramas, comedies, soaps, etc., as Carrie (1997) observed.

Viewer Commitment by Program Type

Barwise and Ehrenberg (1988) and Gunter and Wober (1992) found some variation in audience appreciation scores by program type. The classification for each of the programs in this study is given in Table 1. In Table 3 we give average PMV and P80+ values for six broad program types. It shows that only news shows and soap operas deviate much from the average, with both program types having somewhat higher than average commitment scores. Game shows are also slightly higher than average (as also observed by Carrie 1997 for appreciation data), but we have only two programs representing this category, one slightly below average ('Give Us a Clue') and the other well above average ('Sale of the Century'), so this program type must be treated with caution. The news category also had only two programs, but both had much higher than average PMV and P80+ values. The same is true of the soap opera category, where all four programs have higher than average commitment scores.

Viewer Commitment by Demographics

Television ratings often exhibit variation across demographic factors, especially age and gender. It would therefore be expected that audience appreciation also varies across demographic groups. However, Carrie (1997) observed only very slight effects for BARB appreciation data, with women having slightly higher average appreciation scores than men (75 versus 71), and heavy television viewers being more appreciative than light viewers. Surprisingly, he found only small differences across age groups, with 16-34 year olds having slightly below average appreciation scores.

Table 4 gives the average PMV and P80+ scores by gender, age and weight of viewing (measured in terms of membership of a television viewing quintile). Women are slightly more committed viewers than men, which parallels Carrie's (1997) finding. Unlike Carrie (1997), we observe somewhat more variation across age groups, with the most committed viewers being people aged 40 to 49 and the least committed those aged 9 to 19. While Table 4 shows that the variation in the 'average program' commitment score across age groups is not large, the values for individual programs can vary substantially. For example, the P80+ scores for 'Home Improvement' for the seven age groups in Table 4 are, respectively, 46, 81, 65, 71, 61, 67 and 59, with the average over all people being 68. Hence, viewers aged 9-19, while being less committed to the 'average show', are very strongly committed to 'Home Improvement'.

Lastly, the relationship between commitment score and amount of television watched is not linear. Table 4 shows the average P80+ score for light viewers (quintile 1) is actually higher than for the next quintile and the same as for medium volume viewers. This may be due to light viewers being more selective in the programs they watch (appointment viewing), so that when they do watch they are more committed. Other than this phenomenon for light viewers, commitment scores increase slightly with increasing weight of viewing, with the heaviest viewers being above average, especially on the P80+ measure.

Viewer Commitment by Program Duration

It might be expected that viewer commitment decreases as program length increases, simply because it is more time-demanding for people to watch a longer show. Also, a longer show will have more competition from other shows over its duration. For example, an hour-long drama may have to compete with two half-hour comedies on another channel. Longer shows are also more susceptible to time-competition from nonviewing activities, such as coffee making during ad breaks (Danaher 1995).

Nearly all of the programs in this study were 30 or 60 minutes in length. The average PMV and P80+ scores for the 30 minute programs are, respectively, 80 and 63, while for 60 minute programs they are 76 and 58, respectively. Superficially, therefore, it might appear that shorter programs do have higher commitment scores. However, aside from the fact that the differences are not statistically significant, the higher commitment scores for shorter programs may simply be due to program type effects. For instance, all the news and soap operas (which are shown to have higher commitment scores in Table 3) are 30 minute programs.

As it happened, the first episode of 'Under Suspicion' was two hours long rather than the usual 60 minutes of the following three episodes. The four PMV values for 'Under Suspicion' are 70, 69, 73 and 75 for episodes one through four, respectively (a one-way ANOVA reported above showed no significant difference in these four values; similarly for the four P80+ values). Hence, the much longer first episode has not significantly lowered the commitment score. Further verification of this should be done by examining more programs. Overall, however, on the basis of the programs in this study there is no significant relationship between program length and commitment.

Summary and Conclusion

We propose two new behavioral measures of program appreciation, based on a viewer's commitment to the show. The first is the proportion of the available minutes that a person views a program, given that they watch at least 20% of the show. The second is the proportion of viewers watching at least 80% of the show, again given that they watch at least 20% of the show.

The proposed behavioral measures of audience appreciation have many methodological and practical advantages. They:

• demand no additional respondent burden and are unobtrusive
• have no additional cost in countries that operate people meter panels
• do not suffer from respondent recall difficulties, as diaries do
• have more variation than traditional appreciation scores (especially the P80+)
• correlate reasonably well with traditional appreciation measures such as program enjoyment and frequency of viewing
• show differences among program types, viewer demographics and between episodes for 'magazine' type shows, in contrast to traditional appreciation measures.

However, some limitations of the proposed behavioral measures are that:

• two or more people meter panelists simultaneously 'logged into' the same program will have the same program commitment score, but they do not necessarily have the same involvement. Some panelists may be watching a program only because others are watching and, in reality, they are indifferent to the program.
• a person may watch only the second half of a show and yet still be involved, since they may have missed the first half because they were on the phone, for example.
• programs with only a small number of viewers (low ratings) will have less reliable commitment scores due to the smaller sample of viewers (this is also a problem for traditional appreciation scores).

Some further research that could be done to help address the first two limitations includes validating the commitment measures against appreciation scores obtained from the voting button on the people meter. This could be done, for instance, in the Netherlands and Austria, where voting is routinely conducted by panelists (no such data are available from the panel used in this study). Secondly, while such a validation exercise is being conducted, an analysis could be done to see whether panelists watched the first or second half of the show, and the voting and commitment scores could then be compared.

For programs with small ratings, single episode commitment scores may be subject to high variation. Given that most shows have little episode by episode variation in commitment, it is advisable to average across several episodes for low rating shows.


While the proposed behavioral measures of program involvement are not perfect, it is apparent that they have a number of advantages over diary-based audience appreciation scores, particularly their simplicity, low cost and unobtrusiveness. It is certainly very little effort for a ratings supplier to produce program commitment scores alongside program ratings, and this can be done at the same time ratings are calculated, meaning that a measure of program involvement is available the morning after a program is broadcast.

We have not made any specific recommendations about which of the two behavioral measures to use. Although the correlation between the PMV and P80+ is high, at 0.92, they are not identical. In particular, the P80+ has much greater variation and is more discerning across program types and demographic groups. This may make it more attractive to advertisers and television programmers, especially if a measure of involvement is used to help with media selection, as proposed by Lloyd and Clancy (1991). Clearly, our definitions of commitment are not unique, and fine tuning the viewer threshold for the PMV or the cutoff for the P80+ is easy to do.

Lastly, this study has raised the question of what it means to "watch" a program. That is, how much of a program do you have to watch to say that you really have seen it? From this research it is apparent that not all viewers of a show watch all of the show. For example, only 48 out of the 116 panelists that watched at least 20% of 'X Files' watched all 60 minutes of the show. Indeed, the average P80+ is only 60%, meaning that, on average, only 60% of viewers watch more than 80% of a show. The image of a viewer being 'glued to the box', watching all of their favorite program, is not true of all viewers. We showed earlier that one common way to calculate the rating for a program is to average the number of minutes that each person views. Other methods use thresholds, such as watching more than five minutes of the show, or more than half. Clearly, these methods will not give the same results, and a large scale comparison of alternative methods is required.
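To make the comparison concrete, the sketch below applies three such definitions to the same hypothetical viewing records (loosely patterned on the 'X Files' figures above, with invented splits for the remaining viewers):

```python
# Sketch comparing three ways to compute a "rating" from the same records:
# hypothetical 60-minute show, panel of 1000, 116 people tuning in at all.
viewer_minutes = [60] * 48 + [50] * 30 + [35] * 20 + [10] * 18
duration, panel_size = 60, 1000

avg_minutes = 100 * sum(viewer_minutes) / duration / panel_size
over_5_min  = 100 * sum(m > 5 for m in viewer_minutes) / panel_size
over_half   = 100 * sum(m > duration / 2 for m in viewer_minutes) / panel_size

print(f"average-minutes rating: {avg_minutes:.1f}%")   # 8.8%
print(f">5 minutes rating:      {over_5_min:.1f}%")    # 11.6%
print(f">half-show rating:      {over_half:.1f}%")     # 9.8%
```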


References

Barwise, T.P. (1986), "Repeat Viewing of Prime Time TV Series," Journal of Advertising Research, 26, 4, 9-14.

Barwise, T.P., Ehrenberg, A.S.C. and Goodhardt, G.J. (1979), "Audience Appreciation and Audience Size," Journal of the Market Research Society, 21, 4, 269-289.

Barwise, T.P. and Ehrenberg, A.S.C. (1987), "The Liking and Viewing of Regular TV Series," Journal of Consumer Research, 14, 1, 63-70.

Barwise, T.P. and Ehrenberg, A.S.C. (1988), Television and its Audience, London: Sage Publications.

Beed, T.W. (1992), "An Overview of the Transition from Diary-Based Television Audience Measurement to People Meters in Australia and New Zealand," Proceedings of the ESOMAR/ARF Worldwide Electronic and Broadcast Audience Research Symposium, Toronto, 139-162.

Carrie, D.G. (1997), "Patterns of Audience Appreciation Ratings for Television Programmes," Unpublished doctoral dissertation, London Business School, University of London.

Clancey, M. (1994), "The Television Audience Examined," Journal of Advertising Research, 34, 4, SS2-SS11.

CONTAM (1988), Committee on Television Audience Measurement.

Danaher, P.J. (1995), "What Happens to Television Ratings During Commercial Breaks?" Journal of Advertising Research, 35, 1, 37-47.

Danaher, P.J. and Beed, T.W. (1993), "A Coincidental Survey of People Meter Panelists: Comparing What People Say with What They Do," Journal of Advertising Research, 33, 1, 86-92.

Danaher, P.J. and Rust, R.T. (1994), "Determining the Optimal Level of Media Spending," Journal of Advertising Research, 34, 1, 28-34.

Ehrenberg, A.S.C., Goodhardt, G.J. and Barwise, T.P. (1990), "Double Jeopardy Revisited," Journal of Marketing, 54, 82-91.

Gunter, B. and Wober, M. (1992), The Reactive Viewer, London: John Libbey & Company.

Kent, R.A. (1994), Measuring Media Audiences, London: Routledge.

Lloyd, D.W. and Clancy, K.J. (1991), "CPMs Versus CPMIs: Implications for Media Planning," Journal of Advertising Research, 31, Aug/Sep, 34-44.

Meneer, P. (1987), "Audience Appreciation - a Different Story from Audience Numbers," Journal of the Market Research Society, 29, 3, 241-264.

Soong, R. (1988), "The Statistical Reliability of People Meter Ratings," Journal of Advertising Research, 28, Feb/Mar, 50-56.

Tavassoli, N.T., Shultz, C.J.I. and Fitzsimons, G.J. (1995), "Programme Involvement: Are Moderate Levels Best for Ad Memory and Attitude Toward the Ad," Journal of Advertising Research, Sept/Oct, 61-72.

Television Audience Assessment (1984), Program Impact and Program Appeal: Qualitative Ratings and Commercial Effectiveness, Boston, MA: T.A.A. Inc.

White, J.B. and Miles, M.P. (1996), "The Financial Implications of Advertising as an Investment," Journal of Advertising Research, 36, 4, 43-52.

Windle, R. and Landy, L. (1996), "Measuring Audience Reaction in the UK," Worldwide Electronic and Broadcast Research Symposium, April 1996.

Wulfsberg, R. and Holt, S. (1986), "Television Ready for a Qualitative Ratings System," Marketing News, 20, 1, 17.


Table 1: Commitment Scores, Program Type and Ratings for Each Program

Program                     Type           PMV*  P80+*  Rating*
20/20                       Information     73    53     9
3 National News             Information     83    63     8
American Gladiators         Entertainment   74    55     5
America's Most Wanted       Entertainment   63    63     5
Assignment                  Information     76    55    13
Auf Wiedersehen Pet         Drama           80    64    12
Between the Lines           Drama           75    57    11
Beyond 2000                 Information     76    60    13
Billy                       Entertainment   84    72    18
Blue Heelers                Drama           86    77    21
Booker                      Drama           75    55     8
Canned Carrot               Entertainment   77    62     5
Chances                     Drama           81    66    10
Counterpoint                Information     69    41     5
Days of Our Lives           Soap            82    66     2
El Cid - The Series         Drama           77    60    12
Family Matters              Entertainment   78    58     5
Fast Forward                Entertainment   64    41     3
Fawlty Towers               Entertainment   71    51    13
FBI - Untold Stories        Entertainment   78    60    10
Fire                        Drama           77    60     9
Foreign Correspondent       Information     75    54    14
Frenzy                      Entertainment   63    38     1
Fry and Laurie              Entertainment   66    39     5
Full House                  Entertainment   81    65    13
Give us a Clue              Game            73    43     4
Grace Under Fire            Entertainment   83    71    11
Hangin' with Mr Cooper      Entertainment   83    70    11
Hard Copy                   Entertainment   72    50     5
Holmes                      Information     87    74    19
Home and Away               Soap            82    66     6
Home Improvement            Entertainment   82    68     9
Last of the Summer Wine     Entertainment   84    72    18
Melody Rules                Entertainment   72    52     6
Murphy Brown                Entertainment   70    42     2
NYPD Blue                   Drama           75    60     4
NZ's Funniest Home Videos   Entertainment   80    66    13
One Network News            Information     89    77    24
Poirot                      Drama           77    60     5
Quantum Leap                Drama           76    55     8
Rescue 911                  Entertainment   81    66    17
Ricki Lake Show             Entertainment   81    62     3
Roseanne                    Entertainment   84    72    10
Sale of the Century         Game            88    78    14
Sally Jessy Raphael         Entertainment   81    64     3
Shortland St                Soap            87    76    18
Success                     Information     85    72    17

*Average over the four episodes


Table 1 (Cont.)Type PMV* P80+* Rating*

The Bill Drama 86 75 18The Duchess of Duke St Drama 89 79 1The Eastenders Soap 81 67 7The Simpsons Entertainment 74 54 6Totally Hidden Video Entertainment 82 65 11Trainer Drama 76 59 8Under Suspicion Drama 72 50 10Wiseguy Drama 62 29 2X Files Drama 77 59 13Average 78 60 10*Average over the four episodes


Table 2: Descriptive Statistics for PMV and P80+

                         PMV     P80+
Mean, %                   78      60
Median, %                 78      61
Range (min-max), %     61-89   29-79
Standard deviation, %      7      11


Table 3: PMV and P80+ for Program Types

Type                       PMV   P80+
Drama                       78    60
Entertainment               77    60
Game Show                   81    61
Information                 77    57
News                        86    70
Soap Opera                  83    69
Average across all types    78    60


Table 4: Commitment Scores Across Demographics

Demographic                PMV   P80+
Gender
  Male                      77    58
  Female                    78    60
Age
  5-8                       75    54
  9-19                      72    51
  20-29                     75    57
  30-39                     76    58
  40-49                     79    62
  50-59                     76    56
  60+                       78    60
TV Quintile
  Light - 1                 76    57
  2                         75    55
  Medium - 3                76    57
  4                         78    58
  Heavy - 5                 80    65
Average over all people     78    60


Figure 1a: Distribution of PMV [histogram: horizontal axis PMV (%), in bands from below 60 up to 90+; vertical axis percentage of programs]

Figure 1b: Distribution of P80+ [histogram: horizontal axis P80+ (%), in bands from below 25 up to 80+; vertical axis percentage of programs]


Figure 2a: Plot of Program PMV Against Rating [scatter: horizontal axis Rating (%), 0-25; vertical axis PMV (%), 0-100]

Figure 2b: Plot of Program P80+ Against Rating [scatter: horizontal axis Rating (%), 0-25; vertical axis P80+ (%), 0-100]