Electronic copy available at: http://ssrn.com/abstract=2498663

Running head: QUESTIONING THE END EFFECT Word count: 11,417

Questioning the End Effect:

Endings Do Not Inherently Have a Disproportionate Impact on Evaluations of Experiences

Stephanie Tully

New York University

Tom Meyvis

New York University

Author Note

Stephanie Tully is a doctoral candidate in marketing at the Stern School of Business, New York

University. Tom Meyvis is Professor of Marketing and Peter Drucker Faculty Fellow at the Stern School

of Business, New York University. Correspondence concerning this article should be addressed to

Stephanie Tully, Stern School of Business, 40 W. 4th Street, Ste 822, New York University, New York,

NY 10012. E-mail: [email protected].


Abstract

The present research re-examines one of the most basic findings regarding the evaluation

of hedonic experiences: the end effect. The end effect suggests that people’s retrospective

evaluations of an experience are disproportionately influenced by the final moments of the

experience. The findings in this paper indicate that endings are not inherently over-weighted in

retrospective evaluations. That is, episodes do not disproportionately affect the evaluation of an

experience simply because they occur at the end. We replicate prior demonstrations of the end

effect, but provide additional evidence implicating other processes as driving factors of those

findings.

Keywords: retrospective evaluations, end effect, experiences


Questioning the End Effect:

Endings Do Not Inherently Have a Disproportionate Impact on Evaluations of Experiences

“Did you like the concert?” “How much did you enjoy that restaurant?” “How painful

was this medical procedure?” To answer common questions such as these, people need to

evaluate the experiences they live through. Since these evaluations in turn influence people’s

willingness to recommend or repeat an experience (e.g., Wirtz et al., 2003), it is essential to

understand how people form such retrospective evaluations of their past experiences. The current

research re-examines one of the most basic findings in this area: the end effect. The end effect

refers to the fact that people’s retrospective evaluations are disproportionately influenced by the

final moments of the experience (e.g., Kahneman et al., 1993; Fredrickson & Kahneman, 1993).

While there are many prior demonstrations of the end effect, previous research has also

documented several notable boundary conditions. In the current work, we do not explore such

boundary conditions, but instead revisit the basic effect, focusing on a simple, continuous

experience to test if the end of an experience does indeed inherently receive disproportionately

more weight. While we acknowledge that endings can have a disproportionate impact on

retrospective evaluations, our findings suggest that this is not due to an inherent over-emphasis

of the final moments of an experience, but rather because of specific additional properties of the

end in certain settings.

Prior research has proposed that, when retrospectively evaluating an experience, people

do not add or integrate their reactions across the experience, but rather recall the most

representative moments of the experience and then evaluate the experience based on these

selected moments (e.g., Ariely & Carmon, 2000; Kahneman, 2000a; Kahneman, 2000b; Varey &

Kahneman, 1992). Furthermore, the most representative moments of an experience tend to consist


of the most extreme moment (the peak) and the final moment (the end). Thus, according to this

evaluation-by-moments principle, the peak and the end of the experience will disproportionately

affect the global evaluation of the experience (Kahneman, 2000a). The over-weighting of the end

of the experience, in particular, has received substantial attention and has led to a variety of

recommendations to restructure experiences to take advantage of this effect, including to

optimize customer experiences (Cusick, 2012; Shaw, Dibeehi, & Walden, 2010), to understand

Americans’ sentiment about the economy (Surowiecki, 2002), and to improve personal

happiness and well-being (Conniff, 2006).
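
To make the contrast between integrating all moments and evaluating by selected moments concrete, the following minimal sketch compares the two summary rules on a hypothetical discomfort profile. The profile values, the function names, and the equal weighting of peak and end are illustrative assumptions, not parameters taken from the cited studies.

```python
from statistics import mean

def integrated_evaluation(profile):
    """Average of all momentary ratings (every moment weighted equally)."""
    return mean(profile)

def peak_end_evaluation(profile):
    """Average of the most extreme moment and the final moment.

    Equal weighting of peak and end is an assumption made for illustration.
    """
    return (max(profile) + profile[-1]) / 2

# Hypothetical moment-by-moment discomfort ratings (0 = none, 10 = extreme).
profile = [2, 4, 8, 6, 3]

print(integrated_evaluation(profile))  # 4.6
print(peak_end_evaluation(profile))    # 5.5
```

Under the evaluation-by-moments rule, only two of the five moments enter the summary, which is why changing the final moment alone can shift the predicted evaluation.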

The end effect has been demonstrated empirically across a variety of domains and using a

variety of procedures (e.g., Kahneman et al., 1993; Fredrickson & Kahneman, 1993; Redelmeier

& Kahneman, 1996; Ariely, 1998). Much of the empirical support for the end effect is based on

the analysis of online (moment-to-moment) ratings of affective experiences. More specifically,

previous research has demonstrated that the online rating of the end of the experience is often a

disproportionately effective predictor of the retrospective evaluation of the entire experience.

This has been shown for a wide range of stimuli including medical procedures (Redelmeier &

Kahneman, 1996), painful pressure from a vise (Ariely, 1998), annoying noises (Ariely &

Zauberman, 2000; Schreiber & Kahneman, 2000), advertisements (Baumgartner, Sujan &

Padgett, 1997), and television shows (Hui, Meyvis, & Assael, 2014).

Additional evidence for the end effect comes from studies documenting the effect of

“adding a better end.” Participants in those studies show an irrational preference for negative

experiences with an additional period of reduced discomfort over the same experience without

the “better” (i.e., less aversive) end. For instance, Schreiber & Kahneman (2000) asked

participants to listen to a series of annoying noises and observed that participants preferred a


longer sound profile with a less intense ending to a shorter sound profile that was identical but

lacked the additional, less aversive ending. As an example, participants preferred a sound profile

that consisted of 8 seconds of noise at 78 decibels followed by 16 seconds of noise at 66 decibels

over a sound profile that consisted only of 8 seconds of noise at 78 decibels. This beneficial

effect of adding a better (less aversive) ending has also been observed with other experiences,

such as submerging one’s hands in ice water (Kahneman et al., 1993), undergoing a colonoscopy

(Redelmeier, Katz, & Kahneman, 2003), and judgments of hypothetical pain profiles (Varey &

Kahneman, 1992).
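
The arithmetic behind the sound-profile example can be sketched as follows. The durations and decibel levels come from the example above; averaging decibel values directly is a simplifying assumption for illustration and is not a psychoacoustic loudness model.

```python
# Segments as (duration in seconds, level in decibels), from the example above.
short_profile = [(8, 78)]
long_profile = [(8, 78), (16, 66)]

def total_duration(segments):
    return sum(duration for duration, _ in segments)

def time_weighted_mean_level(segments):
    """Duration-weighted mean of the decibel values (a simplification)."""
    weighted = sum(duration * level for duration, level in segments)
    return weighted / total_duration(segments)

for name, segments in [("short", short_profile), ("long", long_profile)]:
    final_level = segments[-1][1]
    print(name, total_duration(segments),
          time_weighted_mean_level(segments), final_level)
# short 8 78.0 78
# long 24 70.0 66
```

The longer profile contains every second of the shorter one plus 16 additional seconds of noise, yet it has both a lower final level and a lower time-weighted mean level.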

Finally, the end effect has also received support from studies that systematically

manipulated the order in which different components of the experience were presented, and

generally found that participants reacted most favorably to experiences in which the best part

was positioned at the end. For instance, Ariely and Zauberman (2000) observed that participants

rated an annoying sound profile as more aversive when the most intense sound was positioned at

the end of the profile rather than at the beginning or the middle.

However, in spite of these many demonstrations of the end effect, prior research has also

documented several important boundary conditions. First, the end of an experience does not have

a disproportionate impact when that experience is expected to continue in the future (i.e., when

the end is seen as temporary). For instance, when participants in a social interaction expected to

further interact with the other person in the future, the most recent interaction did not receive

additional weight in the global evaluation of the personal relationship (Fredrickson, 1991).

Similarly, when people evaluated a series of aversive pictures that they had viewed and

anticipated seeing again in the near future, the peak, but not the end, dominated the evaluation of

that experience (Branigan et al., 1997). Second, the presence of the end effect also depends on


the type and structure of the experience. Breaking up simple experiences into segments

attenuates the end effect, whereas complex experiences consisting of qualitatively distinct

components often fail to show any end effect at all. For instance, segmenting aversive sounds

into discrete parts reduced the end effect (Ariely & Zauberman, 2000) and no end effect was

observed in evaluations of activities over the course of a day (Miron-Shatz, 2009), evaluations of

vacations (Kemp, Burt, & Furneaux, 2008), or evaluations of meals (Rode, Rozin, & Durlach,

2007). Finally, the end effect does not appear to be a basic evolutionary trait shared with other

animals as it does not extend to food sequence preferences of rhesus macaque monkeys (Xu,

Knight, & Kralik, 2011).

In sum, prior research includes both ample demonstrations of the end effect and

many studies documenting boundary conditions. What does this imply for the status of the end

effect? One possibility is that the end does inherently have a disproportionate influence, but

specific conditions can activate other processes that interfere with (or compensate for) the effect.

However, another possibility is that the end does not inherently have a disproportionate impact.

In that case, the prior demonstrations of the end effect may be driven by other mechanisms, with

the boundary conditions merely reflecting the absence of those mechanisms. Closer inspection of

the prior demonstrations of the end effect provides some initial support for this second

possibility. First, consider the prior demonstrations that adding a better end to an aversive

experience improves the overall evaluation of that experience. Although adding a better end does

indeed manipulate the end of that experience, it also reduces the average intensity of the

experience. Therefore, the improvement in the overall evaluation of the experience could be

driven by the change in average intensity, rather than the over-weighting of the end. As such,

these findings are more accurately classified as demonstrations of duration neglect, rather than


demonstrations of an end effect. Second, consider the finding that the online (i.e., moment-to-

moment) rating of the final moments of an experience is a particularly good predictor of the

overall evaluation (relative to ratings of other parts of the experience). While this finding is

consistent with the overweighting of the end, it would also occur in the absence of an end effect

if participants’ online ratings incorporate information from past as well as current moments of

the experience. This would result in the final ratings being more informed than the initial ratings,

and therefore correlating more strongly with the overall evaluation. Moreover, providing explicit

online ratings may artificially enhance the salience of the final rating, leading participants to use

it as an anchor in their global evaluation. Finally, the finding that experiences are evaluated more

favorably when the best part is positioned at the end (rather than elsewhere in the experience) is

based on studies that systematically varied the position of the different components of the

experience within-subjects. Asking participants to evaluate multiple experiences that were

identical in all respects except for the order of the components likely encouraged participants to

rely on that order in their evaluations, as it was the only aspect that varied. As such, this

procedure may have led participants to rely on their lay beliefs about how experiences should

be optimally structured (e.g., “This was identical to the previous experience, with the exception that

the best part now came at the end. Ending on a high note is good, so I like that experience

more.”).
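
The argument about online ratings can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: momentary experiences are independent standard-normal draws, each online rating is the running average of everything experienced so far, and the overall evaluation is an unweighted average of all moments plus noise, so the end is not over-weighted by construction. None of these parameters come from the studies discussed here.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n_people=2000, n_moments=6, seed=1):
    rng = random.Random(seed)
    online = [[] for _ in range(n_moments)]  # online[t]: ratings at moment t
    overall = []
    for _ in range(n_people):
        moments = [rng.gauss(0, 1) for _ in range(n_moments)]
        # Overall evaluation: equal weight on every moment, plus noise.
        overall.append(sum(moments) / n_moments + rng.gauss(0, 0.3))
        running = 0.0
        for t, m in enumerate(moments):
            running += m
            # Online rating: running average of all moments so far.
            online[t].append(running / (t + 1))
    return [pearson(online[t], overall) for t in range(n_moments)]

correlations = simulate()
for t, r in enumerate(correlations, start=1):
    print(f"online rating {t}: r = {r:.2f}")
```

In this setup the final online rating correlates most strongly with the overall evaluation even though every moment is weighted equally, simply because later ratings summarize more of the experience.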

It should be noted that, even if endings do not inherently have a disproportionate impact,

this does not preclude them from having, under specific circumstances, a greater impact

than other parts of the experience. This would, for instance, be the case if the structure of the

experience is made salient and consumers rely on their lay beliefs about the desirability of

favorable endings (as may be the case in the within-subject designs mentioned earlier). Similarly,


past research has observed a strong impact of the end of an experience when that end is

particularly meaningful, as is the case with goal-directed experiences (Carmon & Kahneman,

1996), where endings determine whether a goal is met, and television shows (Hui et al., 2014),

where endings serve as meaningful conclusions of a storyline. However, in those cases, the end

does not have a disproportionate impact merely because it is the end, but rather because it has

additional properties that increase its significance relative to the rest of the experience.

Overview of the Current Research

In this paper, we focus on the inherent impact of the end, and test whether merely being

the end of an experience is sufficient for disproportionately affecting overall evaluations. To do

so, we do not examine novel situations, nor the complex experiences that have been previously

identified as boundaries of the effect. Instead, we examine the type of basic experience that has

traditionally been used in studies that have provided support for the effect: listening to short

fragments of simple auditory stimuli. In a first study, we observe that aversive sounds with either

a better beginning or a better ending are not rated differently, even when participants clearly

recall the ending as better or worse than the rest of the experience. The remaining studies

reconcile this lack of a discernable end effect with previous demonstrations of the effect. Studies

2 and 3 demonstrate that changing the ending does affect evaluations when the end changes the

experience’s overall average, but not when the average is unaffected. Next, in studies 4 and 5,

which use a repeated measures design, we observe that, while endings are not over-weighted in

evaluations of the first experience, they are over-weighted in evaluations of a subsequent

experience. That is, moving a distinct part of the experience to the end versus the beginning of an

experience only affects people’s evaluations when they can readily observe this shift (being the

only difference between both experiences). Finally, in a field study (Study 6), we examine the


relationship between the overall evaluation of an experience and ratings of distinct components

of the experience—and fail to observe any increased impact of the final rating.

Study 1: A Better Beginning versus a Better End

In this first study, we re-examine the end effect using a simple stimulus (an aversive

noise), which is unlike the complex stimuli from the boundary condition studies, but similar to

the stimuli used in the classic demonstrations of the effect (e.g., Ariely & Zauberman, 2000;

Schreiber & Kahneman, 2000). Our study did, however, differ from those demonstrations in that

it systematically manipulated the structure of the experience, both between participants and

without changing the average intensity of the experience. To achieve this, we presented

participants with one of two sound profiles, which were the inverse of each other, so that they

were identical in total volume but one sound clip began loudly and ended quietly, whereas the

other began quietly and ended loudly. Since the sound was an aversive noise, this implies that

some participants experienced a better (less aversive) ending, whereas others experienced a

worse ending. If the end of the experience has a greater impact on global evaluations than other

parts of the experience, then the sound clip with the better ending should be rated as less aversive

than the sound clip with the worse ending.

Method

Three hundred and three Mechanical Turk participants completed the study online in

exchange for monetary compensation. Participants were told they would be listening to a few

short irritating sounds. They were asked to listen to the sounds using their headphones, and told

that they would need to identify the sounds they heard later in the experiment (to ensure that

participants indeed listened to the sounds). Participants first listened to a sound clip of a dot

matrix printer and were asked to use this sound to calibrate the volume of their headphones.


Next, all participants listened to a short drill sound, and indicated how annoying the sound was

on a 9-point scale (1 = not annoying at all, 9 = very annoying). This measure was included to be

used as a covariate in the analyses and thus reduce error variance due to differences across

participants in headphone volume or in their general aversion to annoying sounds.

Participants then listened to one of two sound clips, depending on condition. Both clips

consisted of 24 seconds of vacuum cleaner sound. One clip (Better End condition) started at a

high volume which was sustained for 6 seconds, after which it gradually reduced in volume for

the remaining 18 seconds, resulting in a relatively quiet (i.e., less aversive) ending. The other clip

(Worse End condition) was identical, but reversed in time. That is, it started quietly and

increased in volume for the next 18 seconds, ending with 6 seconds of sustained high volume

noise. See Figure 1 for a visual depiction of the sound profiles.
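
The construction of the two clips can be sketched as follows. The segment durations (6 seconds sustained, 18 seconds of gradual change) come from the description above; the volume units, the specific high and low levels, and the linear shape of the ramp are hypothetical choices made for illustration.

```python
import math

HIGH, LOW = 10.0, 1.0  # hypothetical volume units; the study reports no levels

def better_end_profile(high=HIGH, low=LOW, sustain=6, ramp=18):
    """Per-second volume: `sustain` s at `high`, then a linear ramp down to `low`."""
    step = (high - low) / ramp
    return [high] * sustain + [high - step * (i + 1) for i in range(ramp)]

def mean_volume(profile):
    return math.fsum(profile) / len(profile)

better_end = better_end_profile()
worse_end = better_end[::-1]  # the same clip reversed in time

print(len(better_end), len(worse_end))                    # 24 24
print(mean_volume(better_end) == mean_volume(worse_end))  # True
print(better_end[-1], worse_end[-1])                      # 1.0 10.0
```

Because one clip is the time reversal of the other, any summary that weights all moments equally is identical across conditions, whereas the final moment differs maximally.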

Figure 1. Visual depiction of sound profiles used in Studies 1 and 4. The height of the waveform represents the volume of the sound. Time is represented on the horizontal axis in seconds. Panels (waveforms not shown): Better End; Worse End.

After listening to the sound clip, participants rated how annoying, unpleasant, and

irritating it was to listen to the clip (all on 9-point scales anchored by: 1 = not at all, 9 = very).


Further, to ensure that participants in the different conditions indeed noticed the difference in the

volume of the ending, participants were asked to indicate how the end of the sound clip

compared to the rest of the sound clip (9-point scale: -4 = end was much worse, 4 = end was

much better). Next, to verify that participants indeed listened to the sound clip, they were asked

to select the sound they listened to from three options (an ambulance, a car alarm, and a

vacuum). We then asked participants whether they had adjusted the volume of their headphones

at any point while listening to the sound clips. Finally, we collected demographic information.

Results

Six people failed to correctly recognize the sound clip and are thus excluded from the

analysis, leaving a sample of 297 participants (MAge = 32.3, SD = 11.12; 63.3% male).

Manipulation check. To verify that the manipulation of the ending was successful, we

first analyzed participants’ perception of how the end compared to the rest of the clip. As

intended, participants in the Better End condition rated the end of the clip as relatively better (M

= 1.93, SD = 1.64) than did participants in the Worse End condition (M = -2.47, SD = 1.94), F(1,

295) = 273.67, p < .001, ηp² = 0.481.
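
The reported effect size can be recovered from the F statistic and its degrees of freedom using the standard identity for partial eta squared. The function name below is a hypothetical helper, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Manipulation check reported above: F(1, 295) = 273.67
print(round(partial_eta_squared(273.67, 1, 295), 3))  # 0.481
```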

Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were

standardized and combined to form an aversiveness index (α = .94). To test the end effect, we

then analyzed this index—while adjusting for the covariate (i.e., the aversiveness of the drill

sound) to increase the power of this test. If the final moments of an experience indeed have an

inherently disproportionate impact, participants in the Better End condition should rate their

listening experience as less aversive than those in the Worse End condition. However, the two


conditions did not substantially differ in the perceived aversiveness of the experience (M Better End

= 6.69, SD = 1.71; M Worse End = 6.96, SD = 1.81), F < 1, ηp² = .002.¹
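
Combining several rating items into a single index can be sketched as follows. The ratings are hypothetical values for five participants, not data from the study, and the use of the sample standard deviation for standardization is an assumption.

```python
from statistics import mean, stdev, variance

def z_scores(values):
    m, s = mean(values), stdev(values)  # sample SD; an assumption
    return [(v - m) / s for v in values]

def composite_index(items):
    """Average the z-scored items within each participant."""
    z_items = [z_scores(item) for item in items]
    return [mean(person) for person in zip(*z_items)]

def cronbach_alpha(items):
    """Internal consistency of k items measured on the same participants."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

# Hypothetical 9-point ratings from five participants.
annoying = [7, 8, 5, 9, 6]
unpleasant = [6, 8, 5, 9, 7]
irritating = [7, 9, 4, 8, 6]

items = [annoying, unpleasant, irritating]
index = composite_index(items)
alpha = cronbach_alpha(items)
print([round(x, 2) for x in index])
print(round(alpha, 2))
```

Standardizing before averaging puts items on a common scale, which matters when the items use different response formats.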

Discussion

Study 1 tested the end effect using a simple stimulus that was unlike the stimuli in the

boundary condition studies (e.g., complex experiences, experiences that are expected to

continue), but similar to the stimuli used in prior demonstrations of the end effect. Yet, in spite of

this, we did not observe an end effect: placing the better (less aversive) part of the sound at the

beginning versus the end did not substantially affect participants’ evaluation of the experience.

This null effect is quite informative given the large sample size and given that participants

readily recalled the ending as better or worse (consistent with the manipulation). In the following

studies, we will provide additional tests of the end effect, as well as attempt to reconcile previous

demonstrations with the absence of an effect in this study.

Study 2: A Better Average versus a Better End

In the second study, we revisit previous demonstrations that extending an aversive

experience with a less aversive (but still negative) ending tends to improve the overall evaluation

of the experience. Although this finding is consistent with an over-weighting of the end of the

experience, it could also be due to a decrease in the average intensity of the experience. To

distinguish between these two accounts, we exposed participants to one of three sound clips of an

irritating noise: (1) a clip with a softer (and thus better) middle section (Better Middle), (2) a clip

with a softer ending (Better End), or (3) a clip with a softer middle section and an additional

softer ending (Added End). The Better Middle and Better End clips had an identical average

volume and only differed in the timing of the softer section. The Added End clip consisted of the

Better Middle clip with an additional, softer extension of the noise.

¹ The means adjusted for the covariate: M Better End = 6.70, M Worse End = 6.97.


Thus, the Better Middle and Better End clips differed in the aversiveness of the ending,

but not in the average intensity of the experience, whereas the Added End clip differed from both

other clips in the average intensity of the experience. If the end effect holds and endings are

inherently over-weighted, then the Better End and Added End experiences should both be

perceived as less aversive than the Better Middle experience. However, if adding a less aversive

ending improves evaluations because it reduces the average intensity of the experience (and not

because endings are over-weighted), then the Added End experience should be perceived as less

aversive than both the Better Middle and Better End experiences, and there should be no

difference between the perceived aversiveness in the Better Middle and Better End conditions.

Method

Two hundred and sixty undergraduate students participated in the study for either partial

course credit or monetary compensation.

Participants were seated at a desktop computer and asked to wear headphones, the

volume of which was fixed and approximately equal across computers. All participants first

listened to a short drill sound, and rated their irritation with the sound on a 101-point sliding

scale (0 = not at all irritating, 100 = very irritating). As in Study 1, this measure was included to

be used as a covariate in the analyses and thus reduce error variance due to individual differences

in aversion to annoying sounds. Next, participants completed a short filler task before continuing

with the main study.

Participants were then asked to listen to the sound of a vacuum cleaner. They listened to

one of three sound profiles, depending on condition. All three sound profiles consisted of a

vacuum noise that fluctuated in volume between low and moderately high. In the Better Middle

condition, the clip contained a 30-second low-volume segment in the middle of the clip. Both of


the other conditions were based on the Better Middle condition, but in the Better End condition,

the low-volume segment was moved to the end of the clip (instead of the middle), and in the

Added End condition, an additional 30-second low-volume segment was added to the end of the

experience (together with a 5-second transition, resulting in a total clip time of 170 seconds).

Thus, the sound clips in the Better Middle and Better End conditions differed in ending, but not

in average volume, whereas the sound clip in the Added End condition differed in average

volume from the clips in both other conditions. See Figure 2 for a visual depiction of the sound

profiles.
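
The logic of the three clips can be sketched with simplified per-second profiles. The 30-second low-volume segment, the 5-second transition, and the 170-second total come from the description above; the 135-second base length is inferred by subtraction. The volume levels, the exact placement of the middle segment, and the constant (rather than fluctuating) loud sections are hypothetical simplifications.

```python
import math

LOUD, LOW = 8.0, 3.0           # hypothetical per-second volume levels
TRANSITION = (LOUD + LOW) / 2  # hypothetical level for the 5-second transition

def clip(*segments):
    """Build a per-second profile from (seconds, level) pairs."""
    return [level for seconds, level in segments for _ in range(seconds)]

def mean_level(profile):
    return math.fsum(profile) / len(profile)

# A 135-second base clip is inferred from 170 - 30 - 5 seconds.
better_middle = clip((52, LOUD), (30, LOW), (53, LOUD))
better_end = clip((105, LOUD), (30, LOW))
added_end = better_middle + clip((5, TRANSITION), (30, LOW))

for name, profile in [("Better Middle", better_middle),
                      ("Better End", better_end),
                      ("Added End", added_end)]:
    print(name, len(profile), round(mean_level(profile), 2), profile[-1])
# Better Middle 135 6.89 8.0
# Better End 135 6.89 3.0
# Added End 170 6.16 3.0
```

Better Middle and Better End share the same mean level but differ in their final level, whereas Added End has both a lower mean level and a quiet ending, which is what allows the design to separate the two accounts.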

Figure 2. Visual depiction of sound profiles used in Study 2. The height of the waveform represents the volume of the sound. Time is represented on the horizontal axis in seconds. Panels (waveforms not shown): Better End; Better Middle; Added End.

After participants listened to the clip, they rated the extent to which they found the

experience of listening to the sound annoying (9-point scale: 1 = mildly annoying, 7 = extremely

annoying), unpleasant (9-point scale: 1 = mildly unpleasant, 7 = very unpleasant), or irritating


(measured on the same scale as the covariate: a 101-point slider scale anchored by: 0 = mildly

irritating, 100 = extremely irritating).

After the primary dependent measures were collected, participants were asked to again

listen to the drill sound that they listened to at the start of the study, and then indicated whether

this experience was more or less irritating than listening to the vacuum sound (9-point scale, 1 =

much less irritating, 9 = much more irritating). Participants then rated the volume of the vacuum

sound (1 = very quiet, 9 = very loud). Next, participants indicated how much money, out of $10,

they would give back to avoid repeating the experience, and how long (in seconds) they believed

the experience lasted. These four additional measures were included to test whether the end

effect, if it again failed to emerge in scale measures of the subjective experience, might instead

manifest in alternative measures: a relative preference measure (which avoids scaling effects), an

evaluation of the objective experience (volume), valuation, or a downstream effect (on time

perception).

To verify that participants had noted the volume at the end of the clip, they were asked to

indicate how the end of the experience compared to the rest of the experience (by selecting one

of three options: the end was quieter, the end was about the same, the end was louder).

Finally, participants provided demographic information and completed an Instructional

Manipulation Check (Oppenheimer, Meyvis, & Davidenko, 2009), which consisted of a

paragraph of text explaining the importance of reading instructions and asking participants to

choose “none of the above” from a marital status dropdown list.

Results

Thirty-five people failed the Instructional Manipulation Check, leaving a sample of 224

participants (MAge = 20.2, SD = 2.17; 44.2% male).


Manipulation check. Participants were more likely to indicate that the end was quieter

than the rest of the sound clip in the Better End condition (P = 60.1%) than in the Better Middle

condition (P = 31.5%), χ²(1) = 12.82, p < .001, indicating that the manipulation of the ending

was successful. Participants in the Added End condition were also more likely to indicate that the

end was quieter (P = 45.3%) than were participants in the Better Middle condition, but this effect

was only marginally significant, χ²(1) = 2.95, p = .086 (possibly because this clip was longer and

therefore the perception of the end extended beyond the final low-volume segment).


Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were

standardized and combined to form an aversiveness index (α = .93). As in Study 1, we analyzed

this index while controlling for the aversiveness covariate (the rating of the drill sound at the

start of the study) to increase the power of the tests. First, we tested the end effect by comparing

the Better Middle and Better End conditions, which differed in ending, but not in average

intensity. A planned contrast showed that perceived aversiveness did not differ between the

Better Middle condition (M = 0.00, SD = 0.92) and the Better End condition (M = 0.11, SD =

0.96), F < 1, ηp² < 0.001. Thus, as in Study 1, we again did not observe an end effect. Next, we

tested whether adding a better end (rather than moving the better part to the end) changes the

perceived aversiveness of the experience, by comparing the Added End condition to the other

two conditions, both of which had a higher average intensity. A planned contrast confirmed that

the experience was perceived as less aversive in the Added End condition (M = -0.10, SD = 0.94)

than in the other two conditions, F(1, 220) = 4.43, p = .036, ηp² = 0.020.² Thus, while we again

did not replicate the end effect, we did replicate the prior finding that adding a less aversive

ending to a negative experience reduces the overall aversiveness of the experience (in spite of

adding negative utility).

² The means adjusted for the covariate: M Added End = -0.16, M Better End = 0.06, and M Better Middle = 0.10.
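
The two planned comparisons can be expressed as contrasts over the three condition means. The means below are the covariate-adjusted values reported in the footnote; the contrast weights are an assumption (the conventional coding for these comparisons), since the weights actually used are not reported.

```python
# Covariate-adjusted means reported in the footnote for Study 2.
adjusted_means = {"Better Middle": 0.10, "Better End": 0.06, "Added End": -0.16}

# Assumed contrast weights (conventional coding; not reported in the paper).
end_position = {"Better Middle": 1, "Better End": -1, "Added End": 0}
added_vs_rest = {"Better Middle": 0.5, "Better End": 0.5, "Added End": -1}

def contrast_estimate(weights, means):
    """Weighted sum of condition means for one contrast."""
    return sum(weights[c] * means[c] for c in means)

def are_orthogonal(w1, w2):
    """Orthogonality check that assumes equal cell sizes."""
    return sum(w1[c] * w2[c] for c in w1) == 0

print(round(contrast_estimate(end_position, adjusted_means), 2))   # 0.04
print(round(contrast_estimate(added_vs_rest, adjusted_means), 2))  # 0.24
print(are_orthogonal(end_position, added_vs_rest))                 # True
```

The first contrast isolates the position of the quieter segment while holding average intensity constant; the second isolates the effect of the added segment.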

Other measures. Similar to the aversiveness index, the additional measures did not show

any difference between the Better Middle and Better End conditions. The position of the low-

volume segment did not affect the relative preference over the drill sound (F < 1), perceived

volume (F < 1), willingness to pay to avoid the experience (F < 1), or the perceived duration of

the clip (F(2, 220) = 1.63, NS).

The contrast comparing the Added End condition to the other two conditions also did not

show any reliable difference for relative preference over the drill sound (F(1, 220) = 1.18, NS) or

perceived volume (F(1, 220) = 1.88, NS). However, participants in the Added End condition

were willing to pay less to avoid the experience (M = $0.78, SD = 1.92) than were participants in

the Better End condition (M = $1.56, SD = 2.52) or Better Middle condition (M = $1.60, SD =

3.37), F(1, 220) = 5.25, p = .023, ηp² = 0.023, consistent with the earlier finding that adding a

better end reduced the perceived aversiveness of the experience. Finally, participants in the

Added End condition also provided higher estimates of clip duration (M = 155 secs, SD = 90.71)

than those in the Better End condition (M = 116 secs, SD = 76.30) or the Better Middle condition

(M = 101.09, SD = 50.56), F(1, 220) = 18.56, p < .001, ηp² = 0.078, which was consistent with

the actual longer duration of the clip in that condition.

Discussion

Moving the less aversive part of an irritating noise to the end versus the middle did not

affect the perceived aversiveness of the experience, casting further doubt on the existence of an

inherent end effect. However, extending the irritating noise with an additional, less aversive part

did lead participants to perceive the overall experience as less aversive. Thus, Study 2 replicates


prior findings of the beneficial effects of “adding a better end,” but also indicates that this effect

is driven by a lowering of the average rather than a disproportionate impact of the end. In the

next study, we conceptually replicate this finding in the positive domain.

Study 3: Adding a Worse Middle versus a Worse End

In Study 2, we examined the effect of adding a less aversive (i.e., better) segment to an

aversive experience. In this next study, we examine the effect of adding a less enjoyable (i.e.,

worse) segment to an enjoyable experience: listening to pleasant music clips. Furthermore,

unlike in Study 2, we now vary whether the segment is added to the middle of the experience or

to the end of the experience. In other words, we compare the effect of adding a worse middle to

the effect of adding a worse end. If evaluations of the experience are disproportionately based on

the end of the experience, then adding a worse end should have a greater (negative) effect than

adding a worse middle. However, if evaluations are based on the average intensity of the

experience rather than the end of the experience, then adding a worse segment should have a

similar (negative) effect, regardless of whether it is added to the end or to the middle.

Method

Pretest. For the main study, we constructed three different music clips: one music clip

consisting of four enjoyable pieces of instrumental music (for the control condition) and two

music clips consisting of those same four enjoyable pieces of music and one additional, less

enjoyable piece of instrumental music (either inserted in the middle or at the end). To select

these music fragments, we first pretested a wide range of instrumental music fragments using a

sample of 121 participants drawn from the same population as used for the main study

(Mechanical Turk). Each participant listened to a selection of ten 30-second clips of instrumental

music (out of a total set of 30 clips) and rated each clip on a 9-point scale. Based on this pretest,


we selected four clips that were enjoyed by most participants, namely 30-second fragments from

“Herd Reunion” (from the Ice Age: Continental Drift Soundtrack, M = 6.84, SD = 1.91), “Heart

Song” (performed by Gosha Mataradze, M = 6.29, SD = 2.09), Bach’s “Goldberg Variations” (M

= 6.38, SD = 1.55), and Mozart’s “Rondo Alla Turca” (M = 6.05, SD = 2.03). We also selected one

sound clip that was significantly less enjoyable than each of the four other clips: “Reanimator”

(performed by Amon Tobin, M = 4.71, SD = 2.12), all t’s(79) > 2.92, p’s < .002. To further

ensure that this last clip was clearly less enjoyable than the others, we made it more repetitive by

extending it to 45 seconds and also applied a minor pitch shift.

Main study. Nine hundred and twelve Mechanical Turk participants completed the study

online in exchange for monetary compensation.

Analogous to the previous studies, we first obtained a measure of participants’ propensity

to like instrumental music, to be used as a covariate in the analysis (and thus increase the power

of our tests). Specifically, participants first listened to a 10-second instrumental music clip (a

segment from “On the Right Track,” performed by Zhanna Hamilton) and indicated how much

they enjoyed listening to the clip on a 9-point scale (1 = not at all, 9 = very much). Participants

were then told that they would next listen to a music compilation. They were reminded that the

study was on the enjoyment of music and to simply sit back, relax, and listen to the music

compilation which would be the length of a short song. Participants then heard one of three

music clips, depending on condition. In the Control condition, the music clip consisted of the

four enjoyable 30-second fragments identified in the pretest. The fragments were combined into

one 148-second clip by gradually phasing out of one fragment and into the next (thus preserving

the unity of the experience). In the two experimental conditions, the clips consisted of the

Control condition clip with the addition of the less enjoyable fragment identified in the pretest.


This fragment was either inserted in the middle of the clip (Worse Middle condition) or at the

end of the clip (Worse End condition). The order of the four enjoyable fragments was

counterbalanced. Next, participants indicated how much they enjoyed listening to the clip on the

same 9-point scale as used for the covariate measure.

As manipulation checks, participants were asked to indicate how the middle compared to

the rest of the clip (-4 = middle was much worse, 4 = middle was much better) and how the end

compared to the rest of the clip (-4 = end was much worse, 4 = end was much better).

Participants then listened to a 10-second version of the less enjoyable fragment and were asked

to categorize this fragment as either pleasant, neither pleasant nor unpleasant, or unpleasant.

Finally, to verify that participants had indeed listened to the music compilation, we asked them

to listen to three short music fragments and to identify which one of these fragments had been

played as part of the music compilation.

Results

Twenty-eight people failed to recognize the fragment used in the compilation and are

thus excluded from all analyses, leaving 884 participants (MAge = 29.4, SD = 9.64; 65.3% male).

Manipulation checks. The majority of participants rated the less enjoyable fragment as

either unpleasant (35.7%) or neither pleasant nor unpleasant (35.4%), confirming that this

fragment was not particularly enjoyable, as intended. More important, participants in the

experimental conditions reported that the middle section (or the end section, depending on where

this less enjoyable fragment was placed) was indeed relatively less enjoyable, indicating that the

manipulation was successful. Specifically, participants in the Worse End condition rated the end

as worse than the rest of the compilation (M = -1.27, SD = 2.20), compared to participants in the

Worse Middle condition (M = 1.22, SD = 2.06), F(1, 880) = 197.11, p < .001, and those in the


Control condition (M = 0.79, SD = 1.99), F(1, 880) = 125.13, p < .001. In addition, participants

in the Worse Middle condition rated the middle as worse than the rest of the compilation (M =

-0.43, SD = 2.09), compared to participants in the Worse End condition (M = 1.02, SD = 1.96),

F(1, 880) = 79.78, p < .001, and those in the Control condition (M = 0.50, SD = 1.99), F(1, 880)

= 30.54, p < .001.

Enjoyment of the experience. Similar to the previous studies, the analysis of the

enjoyment measure was adjusted for the covariate (i.e., the enjoyment of the clip at the start of

the study) to increase the power of the test. However, once again, we did not obtain any evidence

of an end effect. Participants did not enjoy the clip less when the worse fragment was placed at

the end of the clip (M = 6.50, SD = 1.65) rather than in the middle of the clip (M = 6.58, SD =

1.69), F < 1, ηp2 = 0.001. However, adding the worse fragment (regardless of its position) did

decrease the enjoyment of the clip relative to the Control condition (M = 6.76, SD = 1.56), F(1,

880) = 4.37, p = .037, ηp2 = 0.005.³ This conceptually replicates the results of Study 2 and

suggests that participants’ enjoyment relied on the average of the experience rather than the

ending.

3 The means adjusted for the covariate: M Control = 6.76, M Worse End = 6.51, M Worse Middle = 6.58.

Discussion

The results of Study 3 replicate those of Study 2 in the positive domain, and provide

additional evidence that the effect of adding a less intense ending on the overall evaluation is

driven by changes in average intensity, rather than over-weighting of the end. Adding a less

enjoyable music fragment reduced overall enjoyment of the music compilation, but it did not

matter whether this fragment was inserted at the end or in the middle of the experience. The fact

that the positioning of the less enjoyable fragment did not affect overall evaluations provides

particularly compelling evidence against the existence of a substantial, inherent end effect, given


that (1) the manipulation check showed that participants in the respective conditions could

clearly identify the end (or middle) as less enjoyable than the rest of the experience, (2) adding

the less enjoyable fragment did affect overall evaluations, and (3) the test of the end effect in this

study was particularly powerful given the large sample size and the use of a highly correlated

covariate (r = .43). In fact, the procedure of this study allowed for the detection of a small effect

(Cohen’s f² = 0.01) with a probability of 90.8%.
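
A figure close to the reported power can be reproduced with a back-of-the-envelope calculation. The sketch below is only an approximation and rests on assumptions that are not stated in the text: that the test is a single-degree-of-freedom contrast at a two-sided alpha of .05, that all 884 retained participants contribute to the error term, that the covariate reduces error variance by a factor of 1 − r², and that the F distribution can be replaced by its large-sample normal approximation. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(f2, n, r_covariate, alpha=0.05):
    """Approximate power of a 1-df contrast in an ANCOVA (normal approximation).

    f2          -- Cohen's f-squared before covariate adjustment
    n           -- total sample size
    r_covariate -- correlation between the covariate and the outcome
    """
    std_normal = NormalDist()
    # Removing covariate-explained variance inflates the effective effect size.
    f2_adjusted = f2 / (1 - r_covariate ** 2)
    # Noncentrality parameter of the test statistic.
    noncentrality = f2_adjusted * n
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(noncentrality)
    # Probability that |Z + shift| exceeds the two-sided critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = approx_power(f2=0.01, n=884, r_covariate=0.43)
print(f"Approximate power: {power:.3f}")
```

Under these assumptions the approximation comes out at roughly .91, in line with the 90.8% reported above. An exact calculation based on the noncentral F distribution would differ slightly.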

In other words, Studies 2 and 3 suggest that the often documented “adding a better end”

effect should be interpreted solely as a demonstration of duration neglect, rather than evidence

for the over-weighting of the end. However, the end effect has also received support from studies

using other paradigms. In particular, studies that have systematically manipulated the structure of

experiences have provided more direct evidence of the end effect (e.g., Ariely, 1998; Ariely &

Zauberman, 2000). These studies have commonly found that experiences with a better (or less

aversive) ending are evaluated more favorably (or less negatively) than experiences with a better

beginning or a better middle, even when the average intensity is held constant. However, as

mentioned earlier, these studies tend to employ within-subject designs, which expose each

participant to anywhere between 8 and 64 different experiences. Since these experiences are

identical except for their structure (e.g., a loud noise that ends softly versus the same noise that

starts softly), this structure would be particularly salient for participants, who may infer that they

need to use this structure in their evaluations. As such, end effects observed in these studies may

reflect people’s lay beliefs that it is better to end on a high note (or preferences for improving

sequences, Loewenstein & Prelec, 1993), rather than a spontaneous reaction to an experience that

ends well versus poorly. Even if participants are not relying on lay beliefs, the increased salience

of experience structure in within-subject designs could still be a requirement for end effects to


manifest (suggesting that the end is not inherently over-weighted). The last two studies were

designed to test whether endings are indeed over-weighted in the context of repeated

experiences, but not when experiences are judged in isolation.

Study 4: Single versus Repeated Negative Experiences

To examine whether exposure to repeated experiences (with variations in structure) leads

people to increase their evaluations of experiences that end well, we used a repeated measures

design. Specifically, participants were asked to listen to two aversive sounds that were identical,

but reversed in sequence, such that one sound ended well and one sound ended poorly. The order

of the sounds was manipulated between conditions. Consistent with the lack of an end effect in

the first studies, we expected that the difference in structure would not affect participants’ rating

of the first sound they heard: participants will rate the experience as equally aversive, regardless

of whether they were assigned to a noise that ends well or to a noise that ends poorly. However,

consistent with prior demonstrations of end effects in within-subject designs, we expected that

the difference in structure would affect the rating of the second sound: after listening to a noise

that ends well, participants will rate a noise that ends poorly as more aversive (and vice versa).

Method

Two hundred and four Mechanical Turk participants completed the study online in

exchange for monetary compensation.

The procedure was similar to that of Study 1. All participants first listened to the printer

sound clip (to calibrate the volume) and then rated their irritation with the drill sound clip (to be

used as covariate). Participants then listened to one of two versions of the main stimulus: 24

seconds of vacuum cleaner noise. As in Study 1, the two sound clips were identical but reversed,

so that one clip started with 6 seconds of high volume noise, followed by 18 seconds that


gradually tapered off in volume (Better End), whereas the other clip started quietly and ended at

a high volume (Worse End). See Figure 1 for a visual depiction of the sound profiles.

Immediately after listening to the sound clip, participants rated how annoying, unpleasant, and

irritating it was to listen to the clip (on 9-point scales: 1 = not at all, 9 = very).

Unlike in Study 1, participants next listened to the other clip (i.e., those who listened to

the Better End clip then listened to the Worse End clip and vice versa), and rated that clip as

well. After rating the second sound clip, participants were asked to indicate which of the two

sound clips they would choose if they had to listen to one of the clips again (9-point scale: -4 = I

would definitely choose the first clip, 0 = No preference, 4 = I would definitely choose the

second clip). As a manipulation check, we next asked participants, for each clip, how the end of

the experience compared to the rest of the experience (9-point scale: -4 = End was much worse, 4

= End was much better). To verify that participants had indeed listened to the clips, we then

asked them to select the sound they listened to from three options: an ambulance, a car alarm,

and a vacuum. Finally, we asked participants whether they had adjusted the volume of their

headphones at any point, before collecting demographic information.

Results

Three people failed to recognize the sound clip used in the study and are thus excluded

from the analysis, leaving a sample of 201 participants (MAge = 32.8, SD = 11.12; 63.7% male).

Manipulation checks. The end of the first clip was rated as significantly better by

participants who listened to the Better End clip first (M = 6.73, SD = 1.92) than by participants

who listened to the Worse End clip first (M = 3.15, SD = 2.15), F(1, 199) = 155.06, p < .001.

Similarly, the end of the second clip was rated as significantly better by participants who listened

to the Better End clip second (M = 7.16, SD = 2.02) than by participants who listened to the


Worse End clip second (M = 3.17, SD = 2.06), F(1, 199) = 191.86, p < .001. These results

confirm that the manipulation of the structure of the experience was successful.


Aversiveness. The three ratings of each clip (annoying, unpleasant, and irritating) were

standardized and combined to form an aversiveness index for each sound clip (α clip 1 = .95, α clip 2

= .96). As in the previous studies, the between-subjects analysis of this index was adjusted for

the covariate (the irritation with the drill sound) to increase the power of those tests. Consistent

with prior demonstrations of the end effect, the within-subject analysis showed that participants

perceived their Better End experience as less aversive than their Worse End experience, F(1,

199) = 73.45, p < .001, ηp2 = 0.270. Although this effect is quite sizeable, the between-subjects

analysis (i.e., the comparison of the two order conditions) adds an important nuance to the

interpretation of this effect. Indeed, consistent with our previous studies, the perceived

aversiveness of the first clip did not differ between participants who listened to the Better End

clip first (M = 6.56, SD = 1.90) and those who listened to the Worse End clip first (M = 6.69, SD

= 1.82), F < 1, ηp2 < .001. It was only for the second clip that participants who listened to the

Better End clip reported less aversiveness (M = 5.88, SD = 1.98) than those who listened to the

Worse End clip (M = 7.42, SD = 1.60), F(1, 198) = 53.44, p < .001, ηp2 = .212.⁴ These results are

graphically depicted in Figure 3. Thus, although the within-subject analysis replicated previous

demonstrations of the end effect, the effect once again failed to obtain when participants

evaluated a single experience.

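4 The means adjusted for the covariate, Clip 1: M Worse End = 6.59, M Better End = 6.66, Clip 2: M Worse End = 7.45, M Better End = 5.85.

Figure 3. Perceived Aversiveness of Sound Clips by Condition (Study 4)

[Bar chart: aversiveness (9-point scale) of the first and second sound clip, shown separately for the Worse End and Better End clips.]

Note: Error bars denote standard errors.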

Preference. Whether participants preferred to repeat the first or the second clip depended

on which clip they listened to first, F(198) = 140.26, p < .001. Participants who listened to the

Better End clip followed by the Worse End clip preferred listening to the first clip again (M =

-3.36, SD = 2.59), t(98) = -2.44, p = .016, whereas participants who listened to the clips in the

opposite order preferred listening to the second clip again (M = 3.36, SD = 2.59), t(101) = 14.08,

p < .001. These results indicate that participants consciously preferred a noise that ended well

over an equivalent noise that started well.

Discussion

At first glance, the results of this study provide strong evidence for an end effect,

consistent with prior research. When participants listened to a noise that started loudly but ended

better (Better End) and one that started better but ended loudly (Worse End), they rated the noise

that ended better as less aversive, and strongly preferred that noise to the one with a worse

ending. Yet, the between-subjects analysis reveals that this advantage for the better ending

experience only emerges after participants have been exposed to multiple experiences (that are

identical to each other except for their structure). Indeed, for the first sound clip, the end effect is

as conspicuously absent in this study as it was in the previous studies: participants rated the first

clip as equally aversive, regardless of whether it ended well or poorly. The end effect only

emerged for the second sound clip, which was identical to the first sound clip except for its

structure. Thus, in this study, the end effect only emerged when the repetition of experiences

made the structure of the experience salient, indicating that although people do not

spontaneously over-weight the end of an experience, they may do so when they are encouraged

to base their evaluations on differences in structure. Note that it is not sufficient that participants

note that the end of the experience is clearly better or worse than the rest of the experience (as

our manipulation checks indicate this is generally the case in our studies), but rather that

differences in structure are made salient as a criterion for evaluation.

Study 5: Single versus Repeated Positive Experiences

Study 5 aimed to conceptually replicate Study 4 with positive stimuli. We used pleasant

music compilations that varied in the position of a less enjoyable segment (as in Study 3) and

presented all participants with both versions, in counterbalanced order (as in Study 4). Similar to

the results of Study 4, we expected that the position of the less enjoyable segment would not

affect participants’ rating of the first music compilation, but would affect the rating of the second

compilation: after listening to a music compilation with a mediocre middle (ending), participants

will rate a clip with a mediocre ending (middle) as less (more) enjoyable.

Method

Five hundred and two Mechanical Turk participants completed the study online in

exchange for monetary compensation.


As in Study 3, participants first listened to a 10-second instrumental music clip (“On the

Right Track”) and rated their enjoyment on a 9-point scale (1 = not at all, 9 = very much), to be

used as a covariate in the analysis. Next, participants read that they would listen to two music

compilations. Both music compilations were composed of three of the five fragments used in

Study 3: two of the very enjoyable fragments (“Herd Reunion” and “Heart Song”) as well as the

less enjoyable fragment (“Reanimator”). The fragments lasted thirty seconds each and were

tapered and integrated to create a more continuous experience, resulting in a music compilation

of 80 seconds. The two compilations differed only in the position of the less enjoyable fragment:

it was either positioned in the middle (Worse Middle) or at the end (Worse End). The order in

which participants heard each compilation was counterbalanced: half of participants heard the

Worse Middle clip first, while the other half heard the Worse End clip first.

After each compilation, participants were asked to indicate how enjoyable and pleasant it

was to listen to the music, both on 9-point scales (1 = not at all, 9 = very much). Similar to Study

2, we also added a relative preference measure after the primary measures: participants were

asked to indicate how much they enjoyed listening to the experience relative to listening to music

on the radio (9-point scale: -4 = much less than listening to the radio, 4 = much more than

listening to the radio). After participants completed these measures for the second music

compilation, they were asked to indicate which of the two music experiences they enjoyed more

(9-point scale: -4 = definitely the first experience, 4 = definitely the second experience).

As a manipulation check, we next asked participants to indicate, for each music

compilation, how the middle compared to the rest of the compilation (9-point scale: -4 = middle

was much worse, 4 = middle was much better) and how the end compared to the rest of the

compilation (9-point scale: -4 = end was much worse, 4 = end was much better). Participants


then listened to a 10-second version of the less enjoyable fragment and were asked to categorize

this fragment as either pleasant, neither pleasant nor unpleasant, or unpleasant. Finally, to verify

that participants had indeed listened to the music compilation, we asked them to listen to three

short music fragments and to identify the fragment that was part of the music compilation.

Results

Twelve people failed to recognize the song used in the compilation and are thus excluded

from all analyses, leaving 490 participants (MAge = 21.8, SD = 10.3; 57.8% male).

Manipulation checks. The majority of participants rated the less enjoyable fragment as

either unpleasant (66.3%) or neither pleasant nor unpleasant (22.2%), indicating that it was not

particularly enjoyable. More important, the manipulation of the placement of this fragment

within the music clip had the intended effect on participants’ perceptions. This was true for

ratings of the first music clip: participants who listened to the Worse Middle clip first rated the

middle of the clip as worse and the end of the clip as better (MMiddle = -1.37, SD = 2.36; MEnd =

1.39, SD = 2.09) than did participants who listened to the Worse End clip first (MMiddle = 0.81,

SD = 2.32; MEnd = -1.00, SD = 2.53), FMiddle(1, 488) = 106.89, p < .001; FEnd(1, 488) = 129.57, p

< .001. This was also true for ratings of the second music clip: participants who listened to the

Worse Middle clip second rated the middle of the clip as worse and the end of the clip as better

(MMiddle = -1.20, SD = 2.44; MEnd = 1.87, SD = 1.93) than did participants who listened to the

Worse End clip second (MMiddle = 1.41, SD = 1.97; MEnd = -1.41, SD = 2.49), FMiddle(1, 488) =

170.02, p < .001; FEnd(1, 488) = 265.86, p < .001. In short, the manipulation was successful:

participants perceived the middle of the Worse Middle clip and the end of the Worse End clip as

relatively less enjoyable.


Enjoyment. The measures of enjoyment and pleasantness were averaged to form an

enjoyment index (α clip 1 = .95, α clip 2 = .94). The between-subjects analysis of this index was

again adjusted for the covariate (the enjoyment of the clip at the start of the study) to increase the

power of those tests. As in Study 4, the within-subject analysis of this index is consistent with

prior demonstrations of the end effect: participants rated their Worse End experience as less

enjoyable than their Worse Middle experience, F(1, 488) = 15.52, p < .001, ηp2 = 0.031.

However, the between-subjects analysis again adds an important nuance to the interpretation of

this result. Consistent with the absence of an end effect in our prior studies, the enjoyment of the

first music clip did not differ between participants who listened to the Worse End clip (M = 6.28,

SD = 1.61) and those who listened to the Worse Middle clip (M = 6.20, SD = 1.64), F < 1, ηp2 <

0.001. Mirroring the results of Study 4, it was only for the second music clip that participants

who listened to the Worse End clip rated their experience as less enjoyable (M = 5.97, SD =

1.63) than participants who listened to the Worse Middle clip (M = 6.27, SD = 1.61), F(1, 487) =

5.10, p = .024, ηp2 = 0.010.⁵ These results are graphically depicted in Figure 4.

5 The means adjusted for the covariate, Clip 1: M Worse End = 6.21, M Worse Middle = 6.27, Clip 2: M Worse End = 5.96, M Worse Middle = 6.27.


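Figure 4. Enjoyment of Sound Clips by Condition (Study 5)

[Bar chart: enjoyment (9-point scale) of the first and second clip, shown separately for the Worse End and Worse Middle clips.]

Note: Error bars denote standard errors.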

Other measures. We next analyzed participants’ relative preference between listening to

the clip and listening to a song on the radio. Consistent with the enjoyment index (and with prior

demonstrations of the end effect), a within-subjects analysis of this measure indicated that

participants showed a greater preference for listening to the music clip (rather than the radio)

when rating the Worse Middle clip (M = 5.48, SD = 2.17) rather than the Worse End clip (M =

5.35, SD = 2.16), F(1, 488) = 8.169, p = .004, ηp2 = 0.016. However, the between-subjects

analysis of these relative preference ratings did not show any reliable difference between people

who listened to the Worse Middle clip and those who listened to the Worse End clip, neither for

the first clip, F(1, 486) = 2.18, NS, nor for the second clip, F < 1. Thus, similar to the analysis of

the enjoyment index, we did not observe an end effect for the first clip, but unlike for the

enjoyment index, we also did not observe an end effect for the second clip, suggesting that this

particular measure may not be sufficiently sensitive to provide a strong test of the end effect.

Finally, participants’ stated preference between sound clips 1 and 2 showed that they

were more likely to prefer the second clip over the first one when that second clip was the Worse

Middle clip (M = 1.14, SD = 2.69) rather than the Worse End clip (M = 0.11, SD = 2.70), F(1,

488) = 17.64, p < .001. Thus, participants showed a conscious preference for music with a poor

middle over music that ends poorly.
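
The partial eta squared values reported alongside the F tests can be recovered from each F statistic and its degrees of freedom using the standard identity ηp² = (F × df1) / (F × df1 + df2). The short sketch below applies that identity to three of the tests reported for this study. The function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported in the Study 5 results
reported_tests = [
    ("Enjoyment, within-subject", 15.52, 1, 488),
    ("Enjoyment, second clip (between-subjects)", 5.10, 1, 487),
    ("Relative preference, within-subject", 8.169, 1, 488),
]

for label, f_value, df_effect, df_error in reported_tests:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: {eta:.3f}")
```

The three values round to .031, .010, and .016, matching the effect sizes reported above.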

Discussion

Study 5 conceptually replicated the effect of Study 4 with positive experiences.

Consistent with prior research, participants reported enjoying the same music compilation less

when the less enjoyable segment appeared at the end, rather than in the middle. However, this

finding only held when participants were asked to directly compare the two arrangements, either

implicitly (when they were asked to evaluate the second clip after evaluating a clip that was

identical except for the position of the less enjoyable segment), or explicitly (when asked which

of the two clips they preferred). When participants simply listened to and rated the first music

compilation, their enjoyment was completely unaffected by the position of the less enjoyable

segment—even though participants could clearly tell that the middle (or end) of the clip was

worse than the rest of the clip, as revealed by the manipulation check measures for the first clip.

This is consistent with the absence of an end effect observed in the previous four studies, and

suggests that people do not spontaneously overweight the end of an experience. Instead, the

structure of the experience has to be made salient as a possibly relevant evaluation criterion for

average constant).

Study 6: Relating Overall Evaluations to Ratings of the End of the Experience

So far, we have addressed two sources of support for the end effect in prior research:

demonstrations of the positive effect of “adding a better end” and the more favorable evaluations


of experiences that end well in within-subject designs. However, as we mentioned in the

introduction, there is a third type of prior support for the end effect. Specifically, several studies

have demonstrated that, when overall evaluations are regressed on moment-to-moment ratings of

the experience, the rating of the final moments of the experience is a particularly effective

predictor of the overall evaluation. Yet, as discussed earlier, this does not necessarily imply that

the final moments are being over-weighted. If moment-to-moment ratings do not only reflect

people’s isolated reaction to the current moment, but are also influenced by past moments, then

final ratings would be more effective predictors because they incorporate more information.
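
This argument can be illustrated with a small simulation. The sketch below is hypothetical throughout: it assumes that each momentary rating is the running average of all moments experienced so far, and that the overall evaluation is the plain average of the moments plus some judgment noise, so that no moment is over-weighted by construction. The sample size, number of moments, noise level, and names are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


random.seed(1)
N_PEOPLE, N_MOMENTS = 2000, 10

first_ratings, final_ratings, overall = [], [], []
for _ in range(N_PEOPLE):
    # True momentary intensities, independent across moments.
    moments = [random.gauss(0, 1) for _ in range(N_MOMENTS)]
    # Assumed rating process: each online rating is the running average so far.
    ratings = [mean(moments[: t + 1]) for t in range(N_MOMENTS)]
    first_ratings.append(ratings[0])
    final_ratings.append(ratings[-1])
    # Overall evaluation: unweighted average of all moments plus judgment noise.
    overall.append(mean(moments) + random.gauss(0, 0.2))

r_first = pearson(first_ratings, overall)
r_final = pearson(final_ratings, overall)
print(f"r(first rating, overall) = {r_first:.2f}")
print(f"r(final rating, overall) = {r_final:.2f}")
```

In this setup the final rating predicts the overall evaluation far better than the first rating does, even though the overall evaluation weights every moment equally.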

In Study 6, we aimed to examine this issue by focusing on an experience with distinct

components that can be evaluated separately, thus reducing any possible confusion or

contamination by prior impressions (as may be the case with a continuous noise). Specifically,

we used field study data from participants in an obstacle course fun run. After the run,

participants were asked to rate their satisfaction with the race in addition to rating each

individual obstacle, as well as providing an overall rating of the obstacles. If participants’

impression of the experience was disproportionately affected by the end, then one would expect

that the rating of the final obstacle would be a better predictor of participants’ satisfaction than

the ratings of the other obstacles. Further, one would expect that, when controlling for the overall

rating of the obstacles, the rating of the final obstacle would improve the prediction of

participants’ satisfaction with the race.

Method

Seven hundred and fifty participants in an obstacle course fun run completed the study

online in the days following the completion of the race.


Participants completed a fun run consisting of 12 large obstacles. The night following the

race, participants received an email from the race company which included a link to a race

evaluation survey. Participants first indicated their satisfaction with the race on a 10-point scale

(1 = not satisfied, 10 = very satisfied). Later in the survey, participants provided their overall

evaluation of all the obstacles in the run on a 10-point scale (1 = lame, 10 = awesome). Next,

they rated each individual obstacle they completed on a five-point scale (1 = lame, 5 =

awesome). Other items measured in this survey, not relevant to the current research, are available

upon request.

Results and Discussion

We first regressed participants’ satisfaction with the race on the ratings of each of the

twelve obstacles. Although the final obstacle was a significant predictor (β = 0.24, t(737) = 6.61,

p < .001), out of the eleven other obstacles, nine were better predictors of participants’

satisfaction than the rating of the final obstacle (see Table 1).

Table 1. Results of Separate Regressions of Satisfaction with the Race on the Rating of Each Obstacle (in Chronological Order).

Obstacle Order    β        Obstacle Order    β        Obstacle Order    β
1                 0.294    5                 0.278    9                 0.257
2                 0.292    6                 0.181    10                0.241
3                 0.293    7                 0.289    11                0.252
4                 0.209    8                 0.238    12                0.235

Note: All betas are reliably different from 0 (all t’s(737) > 5.03, p’s < .001).
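
The claim that most obstacles predicted satisfaction better than the final one can be checked directly against the coefficients in Table 1. The snippet below simply ranks the twelve reported betas. Variable names are illustrative.

```python
# Betas from Table 1, in chronological obstacle order (1-12).
betas = [0.294, 0.292, 0.293, 0.209, 0.278, 0.181,
         0.289, 0.238, 0.257, 0.241, 0.252, 0.235]

final_beta = betas[-1]
# Obstacles (excluding the final one) whose rating has a larger beta.
stronger = [i + 1 for i, b in enumerate(betas[:-1]) if b > final_beta]
# Rank of the final obstacle among all twelve (1 = largest beta).
rank_of_final = sorted(betas, reverse=True).index(final_beta) + 1

print(f"Obstacles with a larger beta than the final one: {stronger}")
print(f"Rank of the final obstacle: {rank_of_final} of {len(betas)}")
```

Nine of the eleven earlier obstacles have a larger coefficient than the final obstacle, which ranks tenth of twelve.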

As an alternative test of the special status of the final event, we also regressed

participants’ satisfaction with the race on both the overall rating of the obstacles and the


individual rating of the final obstacle. The overall rating of the obstacles significantly predicted

satisfaction with the race, β = 0.55, t(747) = 17.01, p < .001. However, once the overall rating of

the obstacles was taken into account, the rating of the final obstacle did not contribute

significantly to the prediction of overall satisfaction with the race, β = 0.05, t(747) = 1.54, NS.

These results suggest that, when people can cleanly separate the different components of

an experience, then the rating of the final component is not a privileged determinant of the

overall evaluation. Of course, since this was a field study, it suffered from several

limitations, the most important of which is that the order of the obstacles was not

counterbalanced. We can therefore not rule out that an idiosyncratic property of the final obstacle

may have reduced its relationship with satisfaction and thus counteracted the end effect.

General Discussion

Prior research has argued that evaluations of experiences are disproportionately

influenced by the final moments of the experience, since endings have a privileged status as a

prototypical moment of the experience. Although past research has documented several notable

boundary conditions in which this effect does not obtain, there exists an impressive body of

evidence supporting the existence of an end effect for simple, continuous experiences. In this

paper, we did not set out to identify additional boundary conditions, but rather re-examined the

basic end effect, starting with the type of experience that was used in the initial demonstrations

of the effect: a simple, short, meaningless, continuous, aversive sensation (listening to an

irritating noise). Yet, in spite of meeting those conditions, this experience did not produce an end

effect in our studies: the noise was not rated as more aversive when the loudest part was placed

at the end rather than at the beginning (Studies 1 and 4) and was not rated as less aversive when

a softer section was placed at the end rather than in the middle (Study 2). Other studies with a


positive experience also failed to document an end effect: listening to a short music compilation

was not rated as less enjoyable when a weaker music segment was placed at the end of the

compilation rather than in the middle (Studies 3 and 5). These null effects obtained even though

participants could readily recall whether the end or middle of the experience was particularly

good or bad, and even though the studies were properly powered to detect a small effect with a

reasonable probability. Finally, results from a correlational study further corroborate the pattern

observed in the experimental studies. In a large-scale field study with obstacle race participants,

we failed to observe a privileged relationship between the rating of the last obstacle of the race

and the overall satisfaction with the race (Study 6). As such, these results question the

assumption that the final moments of an experience have an inherent, substantial advantage in

determining the overall evaluation of the experience.

Although each of our studies documented a failure to obtain the end effect, our results are

not inconsistent with past demonstrations. Specifically, consistent with prior research, we found

that extending an experience with a less intense ending results in less extreme global evaluations

of that experience. However, adding the less intense segment in the middle rather than at the end

produced the same results. Thus, our findings indicate that the effect of adding a less intense

ending is driven by a reduction in the average intensity of the experience rather than a

disproportionate impact of the ending. Similarly, consistent with prior research, we observed that

when each participant evaluated multiple experiences that differed only in the ordering of their

components, participants did prefer experiences that ended well over experiences that ended

poorly. Yet, the structure of the experience did not affect the evaluation of the first experience

participants encountered. Therefore, our results suggest that people do not spontaneously assign


greater weight to the ending, but instead rely on the structure of the experience when it is a

salient basis for evaluation because it is the only aspect that differs between experiences.
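The averaging account described above can be made concrete with a small worked example. The Python sketch below uses hypothetical intensity values on an arbitrary scale and compares two simple models: an unweighted average, and an average in which the final segment receives extra weight (the weight of 3 is an arbitrary illustrative choice, not an estimate from any study). Neither model is claimed to be the exact process underlying participants' evaluations.

```python
def mean_intensity(segments):
    """Unweighted average: every segment counts equally."""
    return sum(segments) / len(segments)


def end_weighted_intensity(segments, end_weight):
    """Weighted average in which the final segment counts end_weight times."""
    weights = [1] * (len(segments) - 1) + [end_weight]
    total = sum(w * s for w, s in zip(weights, segments))
    return total / sum(weights)


# Hypothetical intensities (higher = more aversive).
base = [8, 8, 8]
softer = 4

softer_at_end = base + [softer]                    # [8, 8, 8, 4]
softer_in_middle = base[:1] + [softer] + base[1:]  # [8, 4, 8, 8]

average_account = {
    "base": mean_intensity(base),
    "softer_at_end": mean_intensity(softer_at_end),
    "softer_in_middle": mean_intensity(softer_in_middle),
}
end_effect_account = {
    "softer_at_end": end_weighted_intensity(softer_at_end, end_weight=3),
    "softer_in_middle": end_weighted_intensity(softer_in_middle, end_weight=3),
}
print(average_account)
print(end_effect_account)
```

Under the unweighted average, adding the softer segment lowers the evaluation by the same amount wherever it is placed. Only the end-weighted model predicts that the softer segment helps more at the end than in the middle, which is the difference our studies did not observe.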

Our experimental studies thus empirically addressed two types of prior support for the

end effect: the fact that extending an experience with a less intense ending weakens the overall

evaluation, and the fact that people prefer experiences that end well over equivalent experiences

that end poorly. Additionally, in Study 6, we addressed a third type of empirical support for the

end effect: the fact that the moment-to-moment rating of the end of an experience can be a

particularly good predictor of the global evaluation of that experience. We have proposed that

this privileged relationship may be explained by mechanisms other than the over-weighting of

the end. For instance, if moment-to-moment ratings are also influenced by past moments in the

experience, then final ratings would be more effective predictors because they incorporate more

information. Alternatively, explicit final moment-to-moment ratings may simply serve as salient

anchors for the immediately subsequent overall evaluation of the experience. In Study 6, we

examined an experience that consisted of easily identifiable parts (thus reducing confusion and

contamination of the ratings) and we asked participants to rate those parts after the overall

evaluation (thus avoiding anchoring of the overall evaluation on the final rating). Under these

circumstances, we did not observe any privileged relationship between the rating of the final part

of the experience and participants’ overall satisfaction with the experience—which is consistent

with our alternative interpretations of those prior findings.
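The first of these alternative mechanisms can be illustrated with a simulation. The Python sketch below rests on assumptions that are ours for illustration only: each moment-to-moment rating is a blend of the current moment and the previous rating (a hypothetical carryover of 0.5), and the global evaluation is an unweighted average of the true moments plus noise, so that no end effect is built in. It is not a model fitted to any data.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_participants=2000, n_moments=6, carryover=0.5, seed=1):
    """Simulate online ratings that are contaminated by earlier moments."""
    rng = random.Random(seed)
    first_online, last_online, last_pure, global_eval = [], [], [], []
    for _ in range(n_participants):
        moments = [rng.gauss(0, 1) for _ in range(n_moments)]
        online = []
        for t, x in enumerate(moments):
            if t == 0:
                online.append(x)
            else:
                # Assumed carryover: each rating blends in the previous rating.
                online.append((1 - carryover) * x + carryover * online[-1])
        first_online.append(online[0])
        last_online.append(online[-1])
        last_pure.append(moments[-1])  # final moment without contamination
        # Global evaluation weights every moment equally (no end effect).
        global_eval.append(sum(moments) / n_moments + rng.gauss(0, 0.3))
    return first_online, last_online, last_pure, global_eval


first_online, last_online, last_pure, global_eval = simulate()
r_first = pearson(first_online, global_eval)
r_last_contaminated = pearson(last_online, global_eval)
r_last_pure = pearson(last_pure, global_eval)
print(f"first: {r_first:.2f}, last (contaminated): {r_last_contaminated:.2f}, "
      f"last (pure): {r_last_pure:.2f}")
```

In this sketch, the contaminated final rating should predict the global evaluation better than the first rating, even though the final moment itself receives no extra weight. The uncontaminated final moment should predict about as well as the first.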

Although we propose, based on our results, that endings are not inherently over-weighted

in retrospective evaluations of experiences, this certainly does not imply that endings cannot

have a disproportionate impact when additional conditions are fulfilled. As Studies 4 and 5

already indicate, when differences in structure are highly salient, people may rely on their lay


beliefs about the desirability of good endings and prefer experiences with better endings over

other, equivalent experiences. Moreover, we can identify at least two other circumstances under

which the final moments of experiences are likely over-weighted.

First, when the last part of an experience is particularly meaningful, and colors the

perception of everything that preceded it, we would naturally expect it to disproportionately

impact the overall evaluation. For instance, evaluations of goal-directed experiences may be

particularly affected by the end of the experience (Carmon & Kahneman, 1996) since the end

often determines whether the goal has been met (and thus whether the preceding effort was in

vain or not). Similar to goal-directed experiences, the end may also be particularly meaningful

(and influential) for narrative experiences, such as watching television shows (Hui et al., 2014),

since the end of an episode often provides some type of resolution. The evaluation of a murder

mystery strongly depends on how the mystery is resolved, just as the evaluation of a

romantic comedy depends on whether the couple ends up together, and the evaluation of a

baseball game depends on which team wins.

Aside from being particularly meaningful, endings can also have a disproportionate

impact through a second mechanism: a recency effect. Specifically, for experiences that are long

and varied (e.g., a year-long trip around the world), people may simply be unable to remember

many parts of the experience due to memory constraints. In that case, the overall evaluation may

be disproportionately influenced by the beginning and end of the experience since research on

list memorization finds that these components are recalled more easily than items in the middle

(Ebbinghaus, 1913). For instance, the observation of an end effect for hypothetical experiences

presented in list format has been attributed to such recency effects due to memory constraints

(Montgomery & Unnava, 2009).


It should be noted that, for many experiences, the end does not benefit from either

recency effects or being particularly meaningful—in which case we would not expect the end to

have a disproportionate impact on evaluations. Even TV shows or narratives do not always offer

meaningful endings that provide a resolution. For instance, in contrast to a romantic comedy, the

ending of a nature documentary may not be more meaningful than what preceded it. Similarly,

for the experiences studied in this paper, neither the final seconds of the noise nor the final

fragment of the music compilation was more meaningful than the rest of the experience. The

same holds for the situations identified as boundary conditions of the effect: the last part of a

meal (Rode, Rozin, & Durlach, 2007), the final activity over the course of a day (Miron-Shatz,

2009), and the last moments of a vacation (Kemp, Burt, & Furneaux, 2008) do not commonly

convey any special meaning.

In sum, the current research cautions against the common recommendation to restructure

experiences to end on a high note. Although this improved ending may disproportionately impact

evaluations in specific cases, our studies suggest that this would not occur merely because it is

the ending. That is, rather than positing the existence of an inherent end effect that is disabled in

specific circumstances (i.e., those identified by the boundary condition studies), we propose that

it is more accurate to state that there is no inherent end effect, but that endings can have a

disproportionate impact on evaluations through other processes under specific circumstances.


References

Ariely, D. (1998). Combining experiences over time: The effects of duration, intensity changes

and on-line measurements on retrospective pain evaluations. Journal of Behavioral

Decision Making, 11, 19-45.

Ariely, D., & Zauberman, G. (2000). On the Making of an Experience: The Effects of Breaking

and Combining Experiences on their Overall Evaluation. Journal of Behavioral Decision

Making, 13(2), 219-232.

Ariely, D., & Carmon, Z. (2000). Gestalt Characteristics of Experiences: The Defining Features

of Summarized Events. Journal of Behavioral Decision Making, 13, 191-201.

Baumgartner, H., Sujan, M., & Padgett, D. (1997). Patterns of Affective Reactions to

Advertisements: The Integration of Moment-to-Moment Responses. Journal of Marketing

Research, 34(2), 219-232.

Branigan, C., Moise, J., Fredrickson, B., & Kahneman, D. (1997). Peak (but not end) ANS

reactivity to aversive episodes predicts bracing for anticipated re-experience. Poster

presented at Society for Psychophysiological Research, Cape Cod, MA. Abstract

retrieved from https://www.sprweb.org/meeting/past_mtng/1997/97posters1.html.

Carmon, Z., & Kahneman, D. (1996). The Experienced Utility of Queuing: Experience Profiles

and Retrospective Evaluations of Simulated Queues. Retrieved from

http://faculty.insead.edu/carmon/pdffiles/The%20Experienced%20Utility%20of%20Queuing.pdf.


Conniff, R. (2006). What Modern Science Can Teach You About Turning That Frown Upside

Down. Men's Health. January, 118-123.

Cusick, B. (2012). The Peak-End Rule: A way to improve every customer experience. Retail

Customer Experience. Newsletter, Networld Media Group, September 19, 2012.

Ebbinghaus, H. (1913). On memory: A contribution to experimental psychology. New York:

Teachers College.

Fredrickson, B. L. (1991). Anticipating endings: An explanation for selective social interaction

(Doctoral dissertation, Stanford University, 1990). Dissertation Abstracts.

Fredrickson, B. L., & Kahneman, D. (1993). Duration neglect in retrospective evaluations of

affective episodes. Journal of Personality and Social Psychology, 65, 45-55.

Hui, S. K., Meyvis, T., & Assael, H. (2014). Analyzing Moment-to-Moment Data Using a

Bayesian Functional Linear Model: Application to TV Show Pilot Testing. Marketing

Science, 33(2), 222-240.

Kahneman, D. (2000a). Evaluation by moments: Past and future. In D. Kahneman & A. Tversky

(Eds.), Choices, values, and frames (pp. 693-708). Cambridge: Cambridge University

Press.

Kahneman, D. (2000b). Experienced utility and objective happiness: A moment-based

approach. In D. Kahneman & A. Tversky (Eds.), Choices, values, and frames (pp. 673-692).

Cambridge: Cambridge University Press.

Kahneman, D., Fredrickson, B. L., Schreiber, C. A., & Redelmeier, D. A. (1993). When more

pain is preferred to less: Adding a better end. Psychological Science, 4, 401-405.

Kemp, S., Burt, C. D. B., & Furneaux, L. (2008). A Test of the Peak-End Rule with Extended

Autobiographical Events. Memory & Cognition, 36, 132-138.


Loewenstein, G. F., & Prelec, D. (1993). Preferences for sequences of outcomes. Psychological

Review, 100(1), 91-108.

Miron-Shatz, T. (2009). Evaluating multi-episode events: boundary conditions for the peak-end

rule. Emotion, 9(2), 206-213.

Montgomery, N. V., & Unnava, H. R. (2009). Temporal sequence effects: A memory

framework. Journal of Consumer Research, 36(1), 83-92.

Oppenheimer, D. M., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks:

Detecting satisficing to increase statistical power. Journal of Experimental Social

Psychology, 45, 867–872.

Pine, B. J., & Gilmore, J. H. (1998). Welcome to the experience economy. Harvard Business

Review, 76, 97-105.

Redelmeier, D. A., & Kahneman, D. (1996). Patients' memories of painful medical treatments:

Real-time and retrospective evaluations of two minimally invasive procedures. Pain, 66, 3-8.

Redelmeier, D. A., Katz, J., & Kahneman, D. (2003). Memories of colonoscopy: a randomized

trial. Pain, 104(1), 187-194.

Rode, E., Rozin, P., & Durlach, P. (2007). Experienced and remembered pleasure for meals:

Duration neglect but minimal peak, end (recency) or primacy effects. Appetite, 49(1), 18–29.

Schreiber, C. A., & Kahneman, D. (2000). Determinants of the remembered utility of aversive

sounds. Journal of Experimental Psychology: General, 129(1), 27-42.


Shaw, C., Dibeehi, Q., & Walden, S. (2010). Customer Experience: Future Trends and Insights.

Great Britain: Palgrave Macmillan. Google books. Web. 29 August 2014.

http://books.google.com.

Surowiecki, J. (2002, November 11). Boom and Gloom. The New Yorker. Retrieved August 29, 2014,

from www.newyorker.com.

Varey, C. A., & Kahneman, D. (1992). Experiences extended across time: Evaluation of moments

and episodes. Journal of Behavioral Decision Making, 5, 169-186.

Wirtz, D., Kruger, J., Scollon, C. N., & Diener, E. (2003). What to do on spring break? The role

of predicted, on-line and remembered experience in future choice. Psychological Science,

14, 520-524.

Xu, E. R., Knight, E. J., & Kralik, J. D. (2011). Rhesus monkeys lack a consistent peak-end

effect. The Quarterly Journal of Experimental Psychology, 64(12), 2301-2315.
