Detecting Earnings Management Using Discontinuity … · Detecting Earnings Management Using...

50
Detecting Earnings Management Using Discontinuity Evidence* David Burgstahler Julius A. Roller Professor of Accounting University of Washington/Seattle Elizabeth Chuk Assistant Professor University of Southern California January 20, 2014 Abstract Evidence of earnings management based on discontinuities in earnings distributions at prominent benchmarks is pervasive, yet there is little evidence of significant abnormal accruals in settings where there is clear evidence of earnings management to meet benchmarks. Together with previous results on the characteristics of accruals-based tests, results in this paper on the statistical characteristics of discontinuity tests reconcile this apparent inconsistency. The distribution of the standardized difference statistic is derived, relaxing the Burgstahler and Dichev (1997) assumption that the two components in the numerator of the standardized difference are independent. Simulation results confirm the derivation and also suggest that the independence assumption is not likely to induce important errors in typical applications. The power of the standardized difference test is evaluated for alternative assumptions about the pattern of earnings management to meet benchmarks. In contrast to results from previous research that suggest abnormal accruals tests have reasonable power only in samples where there are high rates and large amounts of accruals management, discontinuity tests have the power to detect far smaller amounts and much lower rates of total earnings management to meet a benchmark. Preliminary and incomplete draft prepared for discussion at UTS. Comments invited but please do not quote, cite, or distribute without permission JEL classification: G14, M40 Key Words: Earnings management, discontinuities * We appreciate helpful comments and suggestions from Russell Lundholm, Norman Strong, and seminar participants at the University of British Columbia and the University of Illinois.

Transcript of Detecting Earnings Management Using Discontinuity … · Detecting Earnings Management Using...

Detecting Earnings Management Using Discontinuity Evidence*

David Burgstahler Julius A. Roller Professor of Accounting

University of Washington/Seattle

Elizabeth Chuk Assistant Professor

University of Southern California

January 20, 2014

Abstract

Evidence of earnings management based on discontinuities in earnings distributions at prominent benchmarks is pervasive, yet there is little evidence of significant abnormal accruals in settings where there is clear evidence of earnings management to meet benchmarks. Together with previous results on the characteristics of accruals-based tests, results in this paper on the statistical characteristics of discontinuity tests reconcile this apparent inconsistency. The distribution of the standardized difference statistic is derived, relaxing the Burgstahler and Dichev (1997) assumption that the two components in the numerator of the standardized difference are independent. Simulation results confirm the derivation and also suggest that the independence assumption is not likely to induce important errors in typical applications. The power of the standardized difference test is evaluated for alternative assumptions about the pattern of earnings management to meet benchmarks. In contrast to results from previous research that suggest abnormal accruals tests have reasonable power only in samples where there are high rates and large amounts of accruals management, discontinuity tests have the power to detect far smaller amounts and much lower rates of total earnings management to meet a benchmark.

Preliminary and incomplete draft prepared for discussion at UTS. Comments invited but please do not quote, cite, or distribute without permission

JEL classification: G14, M40 Key Words: Earnings management, discontinuities

* We appreciate helpful comments and suggestions from Russell Lundholm, Norman Strong, and seminar participants at the University of British Columbia and the University of Illinois.

Page 1

1. Introduction

There is pervasive evidence of discontinuities in distributions of reported earnings at

prominent earnings benchmarks, where distributions comprise fewer observations immediately

below the benchmark and more observations immediately above the benchmark than would be

expected if the distribution were smooth.1 The body of evidence is consistent with the theory

that managers take actions to ensure that earnings meet benchmarks, e.g., earnings are managed

to avoid small losses, small earnings decreases, and small negative earnings surprises.2 This

interpretation is further supported by survey evidence in Graham, Harvey, and Rajgopal (2005)

indicating that managers are willing to incur real costs in order to meet benchmarks. However,

there has been no formal evaluation of the statistical properties of discontinuity tests to detect

earnings management.3

Inferences from empirical evidence depend on an understanding of the determinants of

the size and power of the statistical tests employed. If a test is sensitive to violations of

assumptions, a significant test statistic might be attributable to violations of assumptions rather

than to a false null hypothesis. On the other hand, if a test has low power, insignificant results

might be attributable to low power rather than to a true null hypothesis. Further, significant

results from a test with low power lead to little or no revision in beliefs.4

1 The smoothness assumption is discussed in more detail in Section 2. 2 For example, Burgstahler and Dichev (1997, hereafter BD) show that distributions of earnings levels and distributions of earnings changes exhibit discontinuities at zero and there is widespread evidence of discontinuities in distributions of earnings surprises, e.g., Degeorge, Patel, and Zeckhauser (1999), Brown (2001), Matsumoto (2002), Brown and Caylor (2005), Burgstahler and Eames (2006). There is also evidence of discontinuities at prominent non-zero earnings benchmarks, e.g., Carslaw (1988), Thomas (1989), Das and Zhang (2003), Grundfest and Malenko (2009), and Burgstahler and Chuk (2013). 3 In contrast, the power of methods of detecting earnings management based on measures of abnormal accruals is evaluated in Dechow, Sloan, and Sweeney (1995), Dechow et al (2010), and Ecker et al (2011). 4 See Burgstahler (1987) for further discussion of the roles of size and power in inference from empirical results.

Page 2

The purpose of this paper is to (i) refine the derivation of the distribution of the

Burgstahler and Dichev (1997, hereafter BD) standardized difference statistic under the null and

alternative hypotheses, relaxing the BD simplifying assumption that the two numerator

components of the test statistic are independent, and (ii) analyze the determinants of the power of

standardized difference tests under various alternative hypotheses, i.e., various assumptions

about how earnings are managed to meet benchmarks.

The power analysis shows that the standardized difference test has power to detect

management of relatively small amounts of earnings by even a small proportion (such as .25–

.50% of MVE for .1-.2%) of sample firms.5 In contrast, previous research suggests that accruals-

based tests have reasonable power only for much larger amounts and much higher rates of

earnings management. Thus, discontinuity tests have far greater power to detect much less

pervasive, and much smaller amounts of, earnings management than do tests based on abnormal

accruals. These results reconcile what has sometimes been interpreted as an inconsistency in the

literature, namely the absence of significant abnormal accruals among firms with earnings that

fall just above significant benchmarks (Dechow, Richardson, and Tuna 2003, Ayers et al 2006).

The paper is organized as follows. Section 2 provides background. Section 3 reviews

and refines the theoretical derivation of the distribution of the standardized difference statistic.

Section 4 provides simulation evidence to confirm the derivation. Section 5 analyzes the

determinants of power of the standardized difference test and discusses implications for research

design. Section 6 concludes.

5 As explained further below, .1%-.2% is the proportion of firms managing earnings when, for example, 1% of the population has pre-managed earnings a small amount below a benchmark and 10%-20% of firms with earnings a small amount below a benchmark manage earnings upwards to meet the benchmark.

Page 3

2. Background

Earnings management occurs when the perceived benefits of managing earnings

(including both benefits to the firm and potentially divergent benefits to the manager) exceed the

costs. Healy and Wahlen (1999) note: "Despite the popular wisdom that earnings management

exists, it has been remarkably difficult for researchers to convincingly document it." There is

pervasive evidence of discontinuities in earnings distributions at prominent benchmarks,

consistent with earnings management.6 In contrast, there is less evidence of significant abnormal

accruals in situations where earnings management behavior is hypothesized. Further, Ball

(2013) outlines a number of reasons to be skeptical about significant abnormal accruals results

that imply that management of “enormous amounts” of accruals is “rife.”

2.1 Evidence of discontinuities

Earnings management to meet a benchmark transforms some pre-managed earnings

observations below the benchmark into reported (i.e., post-managed) observations above the

benchmark.7 The result is a decrease in the frequency of earnings observations below the

benchmark and an increase in the frequency above the benchmark, creating a discontinuity at the

benchmark in the distribution of reported (post-managed) earnings.8

The set of alternative methods of earnings management is broad. To meet a benchmark,

managers might enter into real operating, investing, or financing transactions that generate

6 Two recent papers by Durtschi and Easton (2005 and 2009) purport to show that the discontinuities in earnings distributions are explained by “deflation, sample selection, and a difference between the characteristics of profit and loss observations.” Burgstahler and Chuk (2013) show that the Durtschi and Easton "explanations" for discontinuities are erroneous. 7 Researchers have examined incentives to manage earnings upward to create the appearance of better current performance, or downward to create the appearance of worse current performance or to transfer earnings to future periods from the current period. This paper focuses on upward management to increase current performance, the phenomenon that creates discontinuities at current earnings benchmarks. 8 Kerstein and Rai (2007) provide direct evidence that earnings management converts observations below the benchmark into observations above the benchmark.

Page 4

additional current profits, where these transactions might be purely incremental with no effects

on future profitability or they might have adverse effects on future profitability.9 In addition,

managers might make accounting choices to increase current reported profits. For example,

managers might accelerate recognition of revenues or defer recognition of expenses as permitted

within GAAP. Managers might record transactions in a way that violates GAAP or record

fictitious or fraudulent transactions. Any combination of these actions can be used to manage

earnings to meet the benchmark, and managers are expected to choose the lowest-cost

combination. Because reported earnings reflect all of these actions, discontinuities provide

evidence that some combination of these actions has been taken to meet the benchmark, but do

not reveal which actions were taken.10

Multiple studies document discontinuities around various earnings benchmarks, including

the profit/loss benchmark (Hayn, 1995; BD, 1997; Degeorge et al., 1999); prior-year earnings

(BD, 1997; Degeorge et al., 1999; Beatty et al., 2002; Donelson et al., 2013); and analyst

forecasts (Degeorge et al., 1999; Burgstahler and Eames, 2006; Donelson et al., 2013). Leuz,

Nanda, and Wysocki (2003) report evidence of loss avoidance for an international sample of

firms from 31 countries. Suda and Shuto (2005) provide strong evidence that Japanese firms

engage in earnings management to avoid earnings decreases and losses. Haw et al. (2005)

document discontinuities for the return on equity (ROE), where there is an unusually high

proportion of firms reporting ROEs slightly greater than the 10% benchmark. Daske, Gebhardt,

9 See Jensen (2001, 2003) for multiple examples of transactions and decisions that increase current earnings but decrease expected future earnings and firm value. 10 Schipper (1989) defines "disclosure management" as "purposeful intervention in the external financial reporting process, with the intent of obtaining some private gain…"(p. 92) but also notes that there are additional forms of "real" earnings management. i.e. real operating, investing, or financing decisions that alter reported earnings. Healy and Whalen (1999) use a broad definition that encompasses a broad range of actions: "Earnings management occurs when managers use judgment in financial reporting and in structuring transactions to alter financial reports to either mislead some stakeholders about the underlying economic performance of the company or to influence contractual outcomes that depend on reported accounting numbers." (p. 368)

Page 5

and McLeay (2006) report even more pronounced discontinuities in the European Union than in

the US for the profit/loss, prior year earnings, and analyst forecast benchmarks.

2.2 Evidence of abnormal accruals

Accruals tests identify as abnormal the effects of any transactions or events where

accruals deviate significantly from a model of "normal" accruals. Thus, abnormal accrual

models are not designed to detect earnings management via methods other than accruals.

Further, abnormal accrual models will not detect earnings management structured to fit the

model of normal accruals, such as fraudulent transactions structured so that the accruals fit

within the "normal" range. Also, as discussed further below, accrual models may incorrectly

identify other unusual circumstances, such as unusual growth or other forms of extreme financial

performance, as "abnormal accruals."

Previous research suggests that in order for accruals-based tests to have reasonable

power, earnings management amounts must be large and earnings management must be

pervasive among sample firms. Dechow, Sloan, and Sweeney (1995) show that the power of

accrual-based models for detecting earnings management is low even for relatively large

amounts of earnings management (1 to 5 percent of total assets) by 100% of sample firms. More

recently, Dechow et al (2010, p. 24-25) develop a test with greater, but still relatively low,

power. Similarly, Ecker et al (2011) examine the effect of peer firm selection on the power of

discretionary accruals models to detect earnings management and show that peer selection based

on lagged assets provides higher power than selection based on industry. However, the power to

detect moderate to large amounts of earnings management remains relatively low, even for high

rates and relatively large amounts of earnings management. For example, the Ecker et al (2011)

Figure 1 summary shows that for accruals management equal to 4% (10%) of total assets for

Page 6

100% of sample firms, the power of tests conducted at the .05 level of significance is generally

on the order of .10 (.25).11

While the magnitude of accruals management required to achieve even modest power is

clearly large relative to total assets, the required magnitude is even more striking relative to

expected earnings. For example, assume long-run expected earnings is on average on the order

of 10% of book value of equity. For a firm with no debt in the capital structure where equity is

equal to total assets, the Ecker et al estimates suggest the power of tests based on discretionary

accruals is only .10 (.25) when abnormal accruals average 40% (100%) of long-run expected

earnings. For a firm with 50% debt in the capital structure where equity is equal to one-half of

total assets, the power is only .10 (.25) when abnormal accruals average 80% (200%) of expected

earnings.12

In addition to the limitations of low power and narrow focus on one specific method of

earnings management, accruals-based approaches are subject to substantial risk of spurious

significance due to factors other than earnings management. Accruals tests compare estimates of

pre-managed accruals with reported accruals for each individual observation. The estimates of

pre-managed earnings are prone to both estimation error and bias, so abnormal accruals are

subject to estimation error and bias.13 For example, Dechow, Sloan, and Sweeney (1995) show

that accruals-based models “reject the null hypothesis of no earnings management at rates

exceeding the specified test-levels when applied to samples of firms with extreme financial

performance.” Collins and Hribar (2002) show that the error in the balance-sheet approach to

estimating accruals is correlated with economic characteristics of the firm, which can lead to a

11 Stubben (2010) develops a test for abnormal revenue that appears to have somewhat greater power than accruals-based tests, though the test by design detects only revenue-based earnings management. 12 Consistent with Ball (2013), this numerical example illustrates why evidence of significant abnormal accruals implies earnings management of “enormous amounts.” 13 See McNichols (2000).

Page 7

spurious relation between estimates of accruals management and the correlated economic

characteristics.14 Owens, Wu, and Zimmerman (2013) show that widespread business model

shocks result in unrealistically large “abnormal” accruals, and can also introduce biases into both

unsigned and signed discretionary accruals.

2.3 Reconciling discontinuity and abnormal accruals evidence

In several settings where discontinuity tests have identified significant discontinuities,

abnormal accruals tests have not identified significant abnormal accruals. For example, Dechow,

Richardson, and Tuna (2003) are unable to find any evidence that boosting of discretionary

accruals is the key driver of the discontinuity at zero earnings. Similarly, Ayers et al (2006) find

little or no evidence of abnormal accruals among firms with earnings that fall just above the

benchmark. One possible explanation for these seemingly inconsistent results is the broader

scope of discontinuity tests (which reflect all methods of earnings management) versus accruals

tests (which reflect only earnings management via accruals manipulation). A second possible

explanation is the low power of accruals-based tests shown in previous research coupled with

higher power of discontinuity tests. However, to date there has been no formal evaluation of the

power of discontinuity tests. Section 5 below provides this evaluation, showing that

discontinuity tests have the ability to detect much lower rates and much smaller amounts of

earnings management than do abnormal accruals tests.

3. Distribution of the standardized difference test statistic

The standardized difference test developed in BD is designed to detect effects of earnings

management to meet a benchmark.15 The standardized difference test relies on the assumption

14 See also Kothari, Leone, and Wasley (2005) and Dechow et al (2010). 15 The standardized difference statistic has also been used to test for effects of upward management of variables other than earnings (see, for example, Dichev and Skinner 2002 or Dyreng, Mayew, and Schipper 2012) and to test

Page 8

that in the absence of earnings management, the pre-managed earnings distribution is

approximately smooth in the vicinity of the benchmark. A distribution is smooth at interval i

when the probability that an observation falls in interval i is equal to the average of the

probabilities for intervals i–1 and i+1. When there is no management in the vicinity of interval i,

the expected difference between the number of observations in interval i and the average number

in the two adjacent intervals is approximately zero, and the expectation of the standardized

difference is approximately zero.16

On the other hand, earnings management to meet a benchmark will create a non-zero

expectation for the interval immediately below the benchmark, referred to here as interval –1 and

for the interval immediately above the benchmark, referred to as interval +1. Earnings

management that transforms observations from interval –1 into intervals above the benchmark

creates a dearth of observations in interval –1 relative to the average in the adjacent intervals,

leading to a negative expectation for the standardized difference for interval –1. Earnings

management that transforms observations into interval +1 from intervals below the benchmark

creates an excess of observations in interval i relative to the average in the adjacent intervals,

leading to a positive expectation for the standardized difference for interval +1.

Two factors have an important effect on the extent to which smoothness of the

distribution of pre-managed earnings can be expected to hold. The first factor, interval width, is

a research design choice that is directly controllable by the researcher. For narrow intervals,

smoothness is likely to hold approximately for most intervals in a typical distribution of

earnings. For example, the simulation results below show that for interval widths equal to .04

for downward management of size variables (see, Bernard, Burgstahler, and Kaya 2013). However, to simplify the exposition, the examples and discussion in this paper focus specifically on upward management of earnings. 16 The name of the standardized difference test is derived from the fact that the test statistic for interval i is the difference between the number of observations in interval i and the average of the numbers in the two adjacent intervals standardized by the approximate standard deviation of the numerator difference.

Page 9

standard deviations, the approximation to smoothness is sufficiently close that the effective

rejection rate for a test designed to have a rejection rate of 5% is never larger than 5.3% for any

of the intervals between ±2 standard deviations. On the other hand, much wider intervals can

result in substantial departures from smoothness. For example, interval widths of .5 standard

deviations for the same normal distribution will lead to rejection rates that far exceed the

designed level.

The second factor, location of the distribution relative to the benchmark is observable,

though not controllable, by the researcher. There are two cases where location of the distribution

has important effects on the test. First, when the location of the distribution is such that the

benchmark is further in the tail of the distribution, there are fewer pre-managed earnings

observations immediately below the benchmark to be managed, reducing the effective sample

size and reducing the power of the test. Second, for the case where the peak of the distribution

of pre-managed earnings falls in interval +1 (the interval immediately above the benchmark), the

expected number of observations in interval +1 is by definition greater than the average of the

numbers in intervals –1 and +2, so the expectation of the standardized difference for interval +1

is positive. In these cases, the location of the distribution relative to the benchmark potentially

leads to rejection rates that exceed the nominal rate.17

Denote the number of observations in the entire distribution by N, the number of

observations in interval i by ni and the number of observations in the intervals immediately

above and below interval i by ni-1 and ni+1. Similarly, denote the probability that an observation

17 However, for some distributions and interval widths, the practical effect on rejection rates may be small. For example, in the simulation reported below, the positive expectation for the standardized difference for the interval at the peak of the distribution results in an effective rejection rate of just 5.3% for a planned rejection rate of 5%. This issue is discussed in more detail with the simulation results.

Page 10

falls in interval i by pi and the probabilities that an observation falls in either the interval

immediately below or the interval immediately above interval i by pi-1 and pi+1, respectively.

The numerator of the standardized difference is a function of two multinomial random

variables, the observed frequency in interval i, ni, and the sum of the frequencies in the intervals

immediately below and above interval i, ni-1 + ni+1. Specifically, the numerator can be interpreted

as the sum of ni and the quantity (ni-1 + ni+1) multiplied by –1/2 (to subtract the average of ni-1 +

ni+1). The marginal distributions for the two terms of the numerator sum are binomial, with

respective expectations Npi and –1/2N(pi–1+pi+1), variances Npi(1–pi) and 1/4N(pi–1+pi+1)(1–pi-1–

pi+1) and covariance (–1/2Npi (pi-1+pi+1)).18

Define δ as the difference between the probability for interval i and the average of the

probabilities for intervals i–1 and i+1,

δ ≡ pi – ½ (pi-1+pi+1). (1)

Thus, δ is positive when interval i is a local peak (i.e., when the probability in interval i is greater

than the average of the probabilities in the two adjacent intervals) and δ is negative when interval

i is a local trough.

Using the definition of δ, the expectation of the numerator of the standardized difference

is

E[ni – ½ (ni-1 + ni+1)] = N[pi – ½ (pi-1+pi+1)] (2)

= N δ.

18 See, for example, Johnson and Kotz (1969, p. 281 and 284).

Page 11

The variance of the numerator is the sum of the variances of the numerator components

plus twice the covariance:19

V(numerator) = Npi(1–pi) + 1/4N(pi-1+pi+1)(1–pi-1–pi+1) + 2 (–1/2){– Npi (pi-1+pi+1)} (3)

= Npi(1–pi) + 1/4N(pi-1+pi+1)(1–pi-1–pi+1) + Npi (pi-1+pi+1).

Because (pi–1+pi+1) ≡ 2 (pi – δ), the variance in (3) can also be written as

V(numerator) = Npi(1–pi) + 1/2N(pi–δ)(1–2pi+2δ) + 2Npi (pi–δ) (4)

= Npi(1–pi) + 1/2N[pi –2pi2

– δ+4piδ–2δ2] + 2Npi2 –2Npiδ

= Npi–Npi2 + 1/2Npi –Npi

2 –1/2Nδ+2Npiδ–Nδ2 + 2Npi

2 –2Npiδ

= 3/2 N pi – Nδ(1/2+δ) .

For the case of a smooth earnings distribution where δ = 0, the variance expression in (4)

simplifies to

V(numerator) = 3/2 N pi . (5)

The standardized difference statistic is defined as the numerator, [ni – ½ (ni-1 + ni+1)],

standardized by the square root of the variance in equation (3) or (4), where each unobservable pj

is replace by the empirical estimate nj/N. The two multinomial random variables in the

numerator difference are distributed approximately normal for large N and the difference

between two normal variates is also normal. Thus, the standardized difference is distributed

approximately normal (0,1) and the significance of the standardized difference statistic can be

evaluated by reference to a standard normal distribution.20 Commonly-used rules of thumb from

19 Burgstahler and Dichev (1997, footnote 6) assume the two numerator components are approximately independent so that the covariance of the numerator components is approximately zero. The effect of this simplifying assumption is examined in Section 3.1.1 below. 20 Note that the normality of the standardized difference statistic follows from the normal approximation to the binomial rather than on an assumption about the form of the pre-managed earnings distribution. In contrast, some alternative significance tests for discontinuities rely on more stringent assumptions about the form of the distribution of earnings. For example, the test in Hayn (1995) requires the additional assumption that the distribution of pre-managed earnings is normal. Similarly, Chen et al (2010) assume the pre-managed distribution has a specific

Page 12

the statistics literature suggest that the normal approximation to the binomial is reasonably

accurate for Np(1-p) ≥ 25.21 In typical applications where p is small so that Np(1-p) ≅ Np, the

statistical rule of thumb is approximately equivalent to the condition that the expected interval

sample size, Npi , is 25 or larger.22 For sufficiently large expected values of Npi=E[ni] and N(pi–

1+pi+1)=E[ni–1+ni+1], the standardized difference statistic is distributed approximately normal. For

smaller interval sample sizes, the continuous normal approximation to the discrete multinomial

distribution may be more problematic, and we present simulation evidence on the behavior of the

standardized difference statistic for interval sample sizes much smaller than 25.

3.1 Relation to previous derivations in the literature

3.1.1 BD assumption of independence of the numerator components

BD invoke a simplifying assumption that the number of observations in interval i and the

sum of the numbers of observations in intervals i–1 and i+1 are independent. Therefore, the BD

variance expression omits the covariance term in (4), 2Npi (pi–δ). Omission of the covariance

term results in a misstatement relative to the correct variance in (4) of:

Relative Misstatement = –[2Npi (pi–δ)] / [3/2Npi –Nδ(1/2+δ)] .

distributional form. In both cases, a significant test statistic could be due to a discontinuity but could instead be due to departures from the assumed pre-managed earnings distribution. Other statistical tests assume the pre-managed distribution is known or can be estimated without error. For example, Bollen and Pool (2009) assume that the pre-managed distribution of hedge fund returns is perfectly described by a distribution fitted to the histogram of reported hedge fund returns, as they assume there is no variance in their test statistic attributable to the use of the fitted distribution as an estimate of the unmanaged hedge fund return distribution. Still other alternative tests rely on the assumption that variability of the test statistic for the test interval can be estimated based on variability of a similar statistic in non-test intervals. For example, Degeorge, Patel, and Zeckhauser (1999) test whether the increment in observations at the earnings benchmark is significant relative to the variance of increments in a symmetric set of 10 intervals surrounding the benchmark. 21 Although they are not considered here, corrections for small sample sizes, including a simple continuity correction (see Johnson and Kotz equations 33 and 36, p. 62-64) are available and could be used to improve evaluations of significance when Npi < 25. 22 When the interval sample size is 25 or larger, the error induced by using estimated probabilities in calculating the variance is likely to be small. However, when the standardized difference test is applied to much smaller interval sample sizes, the use of estimated probabilities can lead to more serious errors. For example, in the extreme case where a very low interval sample size gives rise to an empirical frequency of zero, the estimated variance based on the zero frequency is zero.

Page 13

Because the expression –[2Npi (pi–δ)] is virtually always negative, the BD assumption generally

results in an understatement of the variance.23 Further, for the specific case of a smooth

distribution where δ=0, the understatement simplifies to

= –[2Npi2 ] / [3/2Npi ]

= –(4/3) pi

The understatement of the variance results in a corresponding (though slightly smaller)

percentage overstatement of the test statistics.24

3.1.2 BMN "correction"

BMN footnote 12 claims that the variance of the standardized difference test statistic

derived in BD and widely used in the literature is incorrect: "The correct variance, however, is

Npi(1-pi) + 1/4N(pi-1+pi+1)(2–pi-1–pi+1). Because of the difference in the first term in the last

parentheses, the estimated standard deviation used in BD and related papers is understated,

resulting in an overstatement of the standardized difference test statistic." However, BMN

provide neither a derivation for their "correction", nor an explanation of the purported error in

the BD derivation.

The BMN expression for variance is substantially larger than the variance in equation (4).

To see this, the BMN expression can be rewritten as the variance in equation (4) less the

covariance term (the term that is omitted in the BD simplification), 2Npi (pi–δ), plus a final

unexplained term, 1/4N(pi-1+pi+1):

VBMN = [Npi(1-pi) + 1/4N(pi-1+pi+1)(2-pi-1–pi+1) ] (6)

23 This expression is never positive but it can be equal to zero if pi=0 or if both pi-1 and pi+1 are equal to zero. 24 Since the standardized difference is standardized by the square root of the variance, a (4/3)pi relative understatement of the variance translates into understatement of the standard deviation by the square root of [1 - (4/3)pi] and a corresponding overstatement of the standardized difference statistic by the reciprocal of [1 - (4/3)pi]1/2. For example, when pi = .01, .03, or .05 and the distribution is smooth, the BD approximation overstates the test statistic by about .7%, 2.1%, or 3.5%, respectively.

Page 14

= [Npi(1-pi) + 1/4N(pi-1+pi+1)(1-pi-1–pi+1) ] + [1/4N(pi-1+pi+1)]

= [Npi(1-pi)+1/4N(pi-1+pi+1)(1-pi-1–pi+1)] + [2Npi(pi–δ)] – [2Npi(pi–δ)] + [1/4N(pi-1+pi+1)]

= 3/2Npi – Nδ(1/2+δ) – [2Npi(pi–δ)] + [1/4N(pi-1+pi+1)]

The resulting misstatement relative to the correct variance in (4) is:

Relative Misstatement = {– [2Npi(pi–δ)] + [1/4N2(pi–δ)] } / [3/2Npi – Nδ(1/2+δ)] .

The BMN expression results in an overstatement of the variance by about 1/3. For example, for

the specific case of a smooth distribution where δ=0, the relative misstatement reduces to

Relative Misstatement = {– [2Npi2] + [1/4N2pi] } / [3/2Npi ]

= –(4/3)pi + [1/3 ] .

Because the variance is overstated by almost 1/3, standardized differences constructed using the

overstated variance have a standard deviation slightly less than the reciprocal of the square root

of 4/3 which is slightly greater than .866.

4. Simulation Evidence on the Distribution of the Standardized Difference Statistic

Table 1 reports the results of a simulation analysis that serves four purposes. First, in the

simulation the probabilities for each interval are known, and these known probabilities can be

used to compute the variance in equation (4) or (5). The means and variances of standardized

differences where the denominator variance is calculated using the known interval probabilities

are compared with the derived means and variances to verify the derivation. Second, the

simulation illustrates the behavior of the test statistic under conditions where the probabilities

that determine the denominator variance are estimated rather than known. Results are provided

for both large sample sizes, where the effect of using estimated probabilities should be minor,

and for smaller sample sizes where the effect may be more substantial. Third, the simulation

provides examples of the effect of 1) the use of estimated probabilities, and 2) departures from

Page 15

the null hypothesis assumption of smoothness on tests of significance. While these examples are

based on the simulated normal earnings distribution, the examples illustrate how the effect of

departures from smoothness can be explored for any earnings distribution. Finally, the results

provide evidence on the behavior of the test statistic using the simplified variance approximation

from BD or using the incorrect expression suggested in BMN.

Each panel of Table 1 presents the results of 1,000,000 simulation trials with

observations generated from a normal distribution.25 The distribution of generated observations

is segmented into 100 intervals between –2 and +2 standard deviations, where each interval has a

width of .04 standard deviations. On each of the 1,000,000 simulation trials, standardized

differences are evaluated for each of the 100 intervals. As illustrated in Figure 1, the 100

intervals are numbered relative to the mean of the simulated normal distribution, with interval

numbers ranging from -50, …, -1, +1, …, +50. Column 1 of Table 1 contains the interval

numbers and the expected interval sample size, Npi, is shown in column 2.

The normal distribution used to generate the observations is not smooth. As illustrated in

Figure 1, for intervals more than 1 standard deviation below or above the mean (intervals –50 to

–26 and +26 to +50), the normal distribution is convex, and in this range the expectations of the

standardized differences are slightly negative. For intervals less than 1 standard deviation from

the mean (intervals –25 to +25), the distribution is concave, the expectations are slightly positive.

Column 3 shows the expected value of the standardized difference based on the known interval

probabilities in the simulation.

25 The normal distribution is used to generate the simulation observations because the overall shape of the normal is broadly consistent with the overall shape of typical earnings distributions. However, the approximate normality of the standardized difference statistic does not depend on normality of the distribution generating the earnings observations but rather on the approximate normality of multinomial random variables for large interval sizes.

Page 16

Table 1 Panel A shows results for samples corresponding to 64,000 earnings

observations, corresponding to typical sample sizes encountered in earnings management

contexts.26 Panel B shows results for samples of 4,000 observations, illustrating the effects of a

far smaller sample size, where there will be more substantial divergence between the continuous

normal distribution that is used to approximate the discrete multinomial distribution.

4.1 Verification of distribution

In both Panels A and B, Columns 4 and 5 show the means and standard deviations when

the numerator of the generated standardized difference statistics is divided by the square root of

the variance computed using the known interval probabilities. These standardized difference test

statistics should have a mean equal to the theoretical expectation shown in column 3, and a

standard deviation equal to one. In both panels, the difference between the average empirical

mean in column 4 and the average theoretical mean in column 3 is less than .00001 and the

standard deviation of the differences between the empirical mean in column 4 and the theoretical

mean in column 3 is less than .001, the expected standard deviation for estimates of a random

variable with unit variance when the estimates are based on 1,000,000 simulation trials. Also,

the average of the standard deviations of the empirical standardized differences in column 5 is

1.00003 in Panel A and 1.0001 in Panel B, where the minimum of all the estimated standard

deviations is .99244 and the maximum is 1.00825. Thus, the results in columns 4 and 5 are

consistent with the distribution of the standardized difference derived in Section 3.

26 For example, BD examined distributions of earnings levels with a sample size of about 75,000 and distributions of earnings changes with a sample size of approximately 65,000.

Page 17

4.2 Distribution using estimated variances

In any application of the standardized difference statistic, the computed variance of the

numerator must rely on estimated, rather than known, interval probabilities. Variances using

estimated interval probabilities are subject to greater estimation error for smaller expected

interval sample sizes (small N pi), so there can be more substantial differences between the

estimated and true variances.

Column 6 shows the average standardized differences when the variance is based on

estimated probabilities. The average difference between the standardized differences using

known versus estimated probabilities is less than .00001 and the standard deviation of the

differences using known versus estimated probabilities is less than .001 in both Panels A and B.

Column 7 shows the empirical standard deviation of the standardized difference statistics based

on estimated interval probabilities. In Panel A, where interval sample sizes are 16 times larger,

the mean standard deviation in Column 7 is 1.0036 and the range is .9947 to 1.0125. In Panel B,

the mean standard deviation in Column 7 is 1.0092 and the range is 1.0055 to 1.0182. Thus, the

use of estimated probabilities in applications appears to result in only a very small increase in the

standard deviation of the test statistic, with standard deviations that average about 1.004 for the

large interval sample sizes in Panel A and about 1.009 for the smaller interval sizes in Panel B.

The next section shows that these small increases in standard deviation lead to only small effects

on effective levels of significance.

4.3 Effects of departures from smoothness on significance tests

The null hypothesis of smoothness does not hold precisely for most plausible

distributions of pre-managed earnings. For example, the normal distribution used to generate

observations in the simulation is not smooth, but rather is concave within 1 standard deviation of

Page 18

the mean and convex beyond 1 standard deviation. Thus, the results of the simulation illustrate

the effect of departures from smoothness for one example, the normal distribution. Perhaps more

importantly, the analysis illustrates how to assess the effect of departures from smoothness for

any other specified distribution.

The expectations shown in column 3 reflect the effect of departure from smoothness on

the standardized difference statistic. As the expectations become more positive, the proportion

of significant test statistics should begin to exceed 5%. Conversely, as expectations become

more negative, the proportion should be less than 5%.27 The proportions of significant test

statistic in column 10 are highly consistent with proportions predicted based on the expectation

for each interval.28 For the normal distribution and interval definitions in the simulation,

departures from the nominal 5% level are minimal – the maximum predicted proportion

significant is 5.22% (for simulation intervals –1 and +1) and the maximum proportion significant

realized in the simulation is 5.30% (for interval +2).

The width of the interval and the location of the distribution relative to the benchmark are

important determinants of the effect of departures from smoothness. The simulation results

suggest that the departures from smoothness for a normal generating distribution have little

practical effect on significance tests for interval widths of about .04. However, the effect can be

much larger for much wider intervals. For example, with the same normal generating

distribution, interval widths of .5 standard deviation, and the distribution located such that the

27 For example, when the expected standardized difference is .021, the expected rejection rate for a one-tailed test is .0522, the probability of observing a value greater than 1.645 for a normal variate with mean .021 and standard deviation of 1. Thus, the expected rejection rate is about 5.2% in intervals -3 to +3 in Panel A. Similarly, when the expected standardized difference is –.022, the expected rejection rate is .0478, the probability of a value greater than 1.645 for a normal variate with mean –.022 and standard deviation of 1. Thus, the expected rejection rate is about 4.8% in intervals -49 to –47 or +47 to +49 in Panel A. 28 The average difference (not reported in the table) between the realized proportion significant and the predicted rate is –.0014 in Panel A and –.0007 in Panel B and the standard deviation of the difference between the realized proportion significant and the predicted rate is .0011 in Panel A and .0025 in Panel B.

Page 19

benchmark falls at the mean of the distribution, the expected standardized differences for

intervals –1 and +1 are each positive with identical expectations of 10.00 (2.50) for N = 64,000

(4,000). Thus, using intervals equal to .5 standard deviations (far wider than is typically used in

tests for earnings management) would lead to a rejection rate of essentially 100% for N=64,000

when there is no earnings management. As another example, with interval widths of .5 standard

deviation and the distribution located such that the benchmark falls .25 standard deviations

below the mean of the distribution so that interval +1 is centered at the peak of the distribution,

the expectation is 10.79 (2.70) for N = 64,000 (4,000).

4.4 Effects of BD and BMN variances

Column 8 shows the empirical standard deviation of the standardized difference statistics

generated for each interval when the denominator variance is calculated using the BD

independence assumption. Consistent with the analysis in section 3, the BD independence

assumption yields a test statistic with a standard deviation roughly 1% greater than the

theoretical standard deviation of 1. The average standard deviation in column 8 is 1.007 in Panel

A and 1.012 in Panel B. This small inflation of the standard deviation of the standardized

difference statistic leads to a small corresponding inflation of the proportion of significant test

statistics. In column 11, the additional proportion significant using the BD independence

assumption ranges from .01% to almost .1%, with an average of .055%. Thus, under the

simulation conditions, the effect of using the BD independence assumption in tests of the

significance of the standardized differences is very small.

Column 9 shows the empirical standard deviation of the standardized difference statistics

when the standard deviation of the statistic is calculated using the BMN expression for variance.

Consistent with the analysis in section 3, the average standard deviations in column 9 are slightly

Page 20

greater than .866, specifically .8702 in Panel A and .8709 in Panel B. Column 11 shows that

using the BMN expression yields an effective level of about 3% for a test with a nominal level of

5%. The proportion of significant test statistics ranges from 1.11% to 2.15% below the 5% level,

with an average of 1.98% in Panel A and 1.60% in Panel B. This lower effective level will result

in substantially lower power of tests using the BMN expression.

4.5 Summary

In summary, the simulation results in Table 1 are consistent with the analysis in Section

3. When the variance of the numerator of the standardized difference statistic is computed using

the interval probabilities that are known in the simulation, the empirical distribution of the

standardized difference statistic is consistent with the theoretical distribution derived in Section

3. When the variance is computed using estimated, rather than known, interval probabilities the

distribution continues to be consistent with the standard normal distribution for intervals with an

expected sample size greater than 25. Even for the interval sample sizes in Panel B less than 25

where the normal approximation is more problematic, departures from the assumed standard

normal remain relatively minor. Finally, standardized differences relying on the simplifying BD

independence assumption to compute the denominator variance have standard deviations about

1% larger than standardized differences using the correct variance expression, resulting in

rejection rates no more than a fraction of one percent higher than rates using the correct variance

expression.

5. Power of the Standardized Difference Test for Discontinuities

Earnings management to meet a benchmark transforms pre-managed observations in

intervals below the benchmark into reported (post-managed) earnings above the benchmark.

When the unobservable distribution of pre-managed observations is smooth, earnings

Page 21

management creates a trough below the benchmark and a peak above the benchmark in the post-

managed distribution. Standardized difference statistics are designed to be sensitive to

departures from smoothness due to earnings management.

The power of standardized difference tests to detect earnings management is determined

by 1) the pre-managed earnings distribution in the vicinity of the benchmark, and 2) the specific

way that earnings are managed.

5.1 Pre-managed earnings distribution

The power of tests for a discontinuity at the benchmark depends on the probabilities that

a pre-managed earnings observation will fall in the two intervals immediately below the

benchmark, referred to here as intervals –2 and –1, and in the two intervals immediately above

the benchmark, referred to here as intervals +1 and +2.29 These probabilities depend on where

the benchmark falls relative to the distribution, i.e., how far the benchmark is in the tail of the

distribution, and which side of the distribution the benchmark is on.

Tests for management with respect to benchmarks further in the tail will have lower

power because there are fewer pre-managed observations below the benchmark to be managed.

Power is determined by a combination of the total number of observations in the earnings

distribution, N, and the probability of pre-managed observations in the interval(s) below the

benchmark that could be managed to meet the benchmark. In the following analysis,, we

consider the specific case where only observations in the interval immediately below the

benchmark are managed to meet the benchmark, so power is determined by N and the probability

of pre-managed earnings observations in the interval immediately below the benchmark, referred

29 Note that in section 4 where we were considering the null hypothesis, intervals were numbered relative to the mean of the distribution, where –1 and +1 were the intervals immediately below and above the mean. In this section, intervals are numbered relative to the benchmark, where –1 and +1 are the intervals immediately below and above the benchmark.

Page 22

to as the interval sample size for interval –1. Denoting the cumulative probability that a pre-

managed earnings observation falls in interval –1 as p-1, the interval sample size for interval –1 is

n–1 = Np–1.30 In the example power calculations below, p–1 is set at .01 and power is evaluated

for N ranging from 1,000 to 128,000, so the interval sample sizes range from 10 to 1,280.

Results in the N=1,000 and N=2,000 columns where interval sample sizes are 10 and 20,

respectively, should be interpreted cautiously in light of evidence from the statistics literature

that the normal provides a reasonable approximation to the multinomial for expected sample

sizes of 25 or greater.

Power also depends on which side of the distribution the benchmark falls. For example,

when the benchmark falls below the mode of a typical symmetric earnings distribution, p-2<p-

1<p+1<p+2. On the other hand, when the benchmark falls above the mode, the inequalities are all

reversed. Power will also depend on the relative values for probabilities in adjacent intervals

(e.g., how much larger is p+2 than p+1, or how much larger is p+1 than p–1) which in turn depends

on the specific earnings distribution and on the location of the benchmark relative to the

distribution. Because the inequalities relating the probabilities in the four intervals could go

either direction and because there is no general relationship among the relative values of the

probabilities, we adopt the simplifying assumption that the pre-managed probabilities are all

equal to the probability for interval –1, so that p-2 = p-1 = p+1 = p+2 = .01.

30 The location of the benchmark relative to the peak of the pre-managed earnings distribution can also affect the interpretation of the standardized difference test. This issue is relatively unimportant for well-behaved theoretical earnings distributions such as the normal distribution used in the simulation. (See the results in Table 1 for the intervals immediately adjacent to 0.) However, because empirical distributions often are far more leptokurtic than the normal distribution, sometimes modeled as mixtures of normal distributions or as symmetric stable distributions, empirical evidence of a significant standardized difference at the peak of the distribution should generally not be interpreted as evidence of earnings management.

Page 23

5.2 Assumptions about how earnings are managed

The power of the standardized difference test also depends on how earnings are managed.

In this section, we illustrate power calculations for one assumption. To the extent this

assumption is a reasonable representation of how earnings are managed, these calculations are

interesting in their own right. However, the calculations also serve to illustrate how power can

be calculated for any assumption about how earnings are managed. We consider the specific

assumption that earnings management effects are concentrated entirely in the interval

immediately below and the interval immediately above the benchmark, i.e., earnings in interval

–1 are managed upward so that reported (post-managed) earnings is in interval +1.

5.2.1 Rate of earnings management (r)

We assume the probability of earnings management is r, representing the rate of earnings

management.31 Because earnings are managed from interval – to interval +1, the expected

number of reported earnings observations in interval –1 is less than the average of the numbers

of observations in the adjacent intervals, i.e., intervals –2 and +1. Thus, the left standardized

difference, the standardized difference for the first interval left of the benchmark has a negative

expectation. Similarly, the expected number of observations in interval +1 is greater than the

average of the adjacent intervals, i.e., intervals –1 and +2 and therefore the expectation of the

right standardized difference, the standardized difference for the first interval above the

benchmark, is positive.32 Other patterns of earnings management can be described in terms of

more complex concentration effects, considered next.

31 For example, BD estimate that the rate of earnings management among firms with small pre-managed losses is in the range of 30-44% and the rate among firms with small pre-managed earnings decreases is 8-12%. 32 For this simple form of earnings management, the left and right standardized difference are alternative, but highly-correlated, tests for the effect of earnings management. For more complex forms of earnings management, either the left or right standardized differences may provide a more appropriate test, as discussed in more detail later.

Page 24

5.2.2 Concentration

Concentration is determined by both how earnings are managed and by the choice of

interval width in the research design. The results below analyze the research design choice of

interval width, holding constant the assumption about how earnings are managemed.

Specifically, we consider the effect on power 1) when interval width is cut in half so that

earnings are being managed from two intervals below the benchmark and managed to two

intervals above the benchmark, and 2) when interval width is doubled so that the first interval

below the benchmark includes both the original set of observations where earnings are being

managed and a second set of observations where earnings are not being managed.

More complex models might describe which pre-managed earnings observations are

subject to earnings management, and how the post-managed earnings observations are

distributed. For example, observations might be managed from more than one interval below the

benchmark, rather than only the single interval immediately below the benchmark, where the rate

of management likely decline for intervals further below the benchmark. Further, earnings might

be managed to more than one interval above the benchmark. Evaluation of these possibilities is

left for future research.

5.3 Analysis of the Power of the standardized difference test

The mean and variance of the standardized difference statistic given in equations (2) and

(4), respectively, can be used to compute power under any specified alternative hypothesis.

Power, denoted by 1–β, is a function of the level of the test, denoted by α.33 An observed

33 The logic for rejection of the null when a significant statistic is observed is that a significant statistic is more likely to occur under the alternative (with probability 1-β) than under the null (with probability α). Thus, the strength of the evidence that a significant result provides in favor of the alternative is described by the ratio (1–β)/α. As the ratio approaches one (i.e., as power falls close to the level of the test), the effect of a significant test statistic on belief in null versus the alternative disappears. For further discussion, see, for example, Burgstahler (1987).

Page 25

standardized difference statistic is significant at level α when the probability under the null

hypothesis of a statistic as extreme as the observed value is less than or equal to the size of the

test, α. The power of the standardized difference test (the probability of rejecting a false null

hypothesis) can be assessed for various combinations of the interval sample size (Np–1), the rate

of earning management (r), and concentration. To explore the effect of concentration, we begin

with the simple case where earnings are managed from only the interval immediately left of the

benchmark and earnings are managed to only the interval immediately right of the benchmark,

i.e., where earnings management is concentrated entirely in the two intervals adjacent to the

benchmark (Section 5.4.1). Then we explore the effects of smaller (Section 5.4.2) or larger

(Section 5.4.3) interval widths.

5.3.1 Concentration in the two intervals immediately adjacent to benchmark

Table 2 evaluates the power of the standardized difference test statistic for various rates

of earnings management, shown in the rows of the table and for overall sample sizes ranging

from N=1,000, a sample size much lower than is typical in applications, to N=128,000, which is

50% to 100% greater than is typical in applications. In combination with the assumption that

=.01, these sample sizes translate into interval sample sizes ranging from Np–1=10 up to Np–

1=1,280. Panel A shows results for left standardized differences and Panel B for right

standardized differences.

To illustrate how the values in Table 2 are calculated, consider the example of N=64,000

and r= .10. The assumptions above imply that the probabilities of post-managed observations in

intervals –2 and +2 are the same as the pre-managed probabilities, p–2 = p+2 = .01 and the effect

of an earnings management rate of 10% is to reduce p-1 to .009, and increase p+1 to .011. Table 2

shows that for an α=.05 test, the power of the left standardized difference under these

Page 26

assumptions is .938 and the power of the right standardized difference is .916.34 The

assumptions in this example correspond to an expectation of 640 pre-managed observations in

the interval left of the benchmark, with just 640 x .1 = 64 observations expected to be managed

to exceed the benchmark. Thus, the results show that standardized difference tests have power in

excess of 90% to detect management for just 64 observations in a distribution of 64,000, or just

.1% of the total number of observations.

As another example, for N=16,000, an earnings management rate of just .20 yields power

in excess of 90%. Thus, .20 x 160 = 32 observations out of 16,000 observations managed to

exceed the benchmark or just .2% are sufficient for the standardized difference tests to have

power greater than 90%. As a final example, Panel A shows that the left standardized difference

is expected to be significant more than 90% of the time when just 8,000 x .01 x .300 = 24

observations are managed to meet the benchmark in a sample of 8,000 observations.

The shading in Panels A and B demarcates different strata of power. Cells where power

is 1.000 are shaded dark gray,35 cells with power greater than .900 are medium gray, and cells

with power greater than .500 are light gray. Overall, the results show that when the effects of

earnings management are concentrated entirely in the interval below and above the benchmark

and when the intervals in the vicinity of the benchmark contain about 1% of the distribution, the

standardized difference tests have power greater than .900 to detect small rates and small

numbers of observations managed to meet the benchmark.

34 These values are computed by using equations (2) and (4) to compute the mean and standard deviation, respectively, of the left-standardized difference given the values p-2 , p-1 , and p+1 and of the right-standardized difference given the values p-1 , p+1 , and p+2. Then, the mean and standard deviation of each standardized difference distribution are used to find the probability of a left-standardized difference less than the critical value of –1.64485 and the probability of a right-standardized difference greater than 1.64485. 35 More precisely, the power in these cells is greater than .9995, so the values round to 1.000 with three decimal places of accuracy.

Page 27

Of course, in applications it is unlikely that interval widths can be identified such that the

effects of earnings management are concentrated entirely in the interval below and above the

benchmark. Instead, it is likely that the rate of earnings management declines for observations

further below the benchmark and that probability that an observation will be managed to exceed

the benchmark by a given amount will decline with the magnitude of the amount. Thus, interval

width is a difficult and important choice in designing tests of earnings management. While it is

not possible to delineate and evaluate all the possible effects of interval width choice for all

possible types of earnings management behavior, we provide results below showing the effect of

choosing an interval width that is too narrow or too wide relative to the case illustrated in Table

2, where earnings management behavior is such that all effects are concentrated in the interval

immediately below and above the benchmark. That is, we maintain the assumptions that

earnings are managed from an interval below the benchmark with pre-managed probability =.01

at some rate r that applies uniformly to the entire interval, and that the managed observations are

distributed uniformly in an interval above the benchmark with pre-managed probability =.01. In

Section 5.4.2, we evaluate the effect on power of using a research design with an interval width

that is only half as wide as the intervals in which earnings are concentrated (so that the

implemented intervals widths have pre-managed probabilities = .005). In Section 5.4.3, we

evaluate the effect on power of using a research design with an interval width that is twice as

wide as the intervals in which earnings are concentrated (so that the implemented intervals

widths have pre-managed probabilities = .02).

5.3.2 Intervals half as wide as the .01 interval in which earnings management is concentrated

Table 3 shows the power of the standardized difference test statistic when the pre-

managed interval width is .005 and the effects of earnings management are concentrated in two

Page 28

intervals below the benchmark and in two intervals above the benchmark. The rates in the rows

of the table represent the rate of earnings management among pre-managed earnings

observations in the first two intervals below the benchmark, interval –2 and –1, where the

cumulative probability of pre-managed observations in these two interval p–2 + p–1, is assumed to

be .01. Managed observations are assumed to move to the first two intervals above the

benchmark, intervals +1 and +2. Panel A shows results for left standardized differences and

Panel B for right standardized differences.

Because the effects of earnings management are now spread across multiple intervals, the

prominence of the trough and peak relative to the surrounding intervals are reduced, and the

power of the standardized difference tests is also reduced. Focusing on the same example we

considered in Section 5.4.1 where N=64,000 and r= .10, the assumptions imply that the

probabilities of post-managed observations in the four intervals are p-2 = p-1 = .0045, and p+1 = p+2

= .0055. In words, earnings management is expected to transform 10% of the pre-managed

observations in intervals –2 and –1 into post-managed observations in intervals +1 and +2, so

that the probabilities of a post-managed observation in intervals –2 and –1 are .0045 (so the total

probability is .009, the same as in interval –1 in Section 5.4.1) and the probabilities of a post-

managed observation in interval +1 or +2 are .0055 (so the total probability in the two intervals

is .011, the same as in interval +1 in Section 5.4.1). Table 3 shows that for an α=.05 test, the

power of the left standardized difference under these assumptions is .447 and the power of the

right standardized difference is .409, computed by using equations (2) and (4) to compute the

mean and standard deviation and finding the probability of a left-standardized difference less

than –1.64485 and the probability of a right-standardized difference greater than 1.64485. Thus,

using interval width that is much narrower than the interval width in which earnings management

is concentrated results in a substantial loss of power for this example.

Page 29

Panels C and D facilitate the comparison of the power results in Table 3 with the

corresponding amounts in Table 2, and shows that the use of a too narrow interval width can

result in a substantial loss of power. Panels C and D show gray shading for the cells where there

cannot be much loss of power when the interval width is too narrow, either because the power is

very high even with the narrow interval width in Table 3 (the dark gray shading corresponds to

cells where the power in Panels A and B exceeds .900) or because the power is very low even

with the optimal interval width in Table 2 (the light gray shading corresponds to cells where the

power in Table 2 is less than .100). The remaining, unshaded cells in Table 3 Panels C and D

show that the power using the too narrow interval width assumed in Table 3 is often on the order

of 40%-60% of the power using the optimal interval width in Table 2. Thus, too narrow

intervals can substantially reduce the power of standardized difference tests.

5.3.3 Intervals twice as wide as the .01 interval in which earnings management is concentrated

Table 4 shows the power of the standardized difference test statistic when the pre-

managed interval width is .02 so that observations affected by earnings management are

combined with observations not affected by earnings management in the interval below the

benchmark and the interval above the benchmark. The rates in the rows of the table represent

twice the rate of earnings management among pre-managed earnings observations in the interval

below the benchmark where the cumulative probability of pre-managed observations in the

interval, p–1, is now assumed to be .02. Managed observations are assumed to move to the first

interval above the benchmark, intervals +1. Panel A shows results for left standardized

differences and Panel B for right standardized differences.

Because intervals affected by earnings management are now combined with intervals not

affected by earnings management, the power of the standardized difference tests is again

Page 30

reduced. Focusing on the same example we have used before where N=64,000 and r= .10, the

assumptions imply that the probabilities of post-managed observations in the four intervals are p-

2 = .02, p-1 = .019, p+1 = .021, and p+2 = .02. In words, earnings management is expected to

transform 10% of half of the pre-managed observations in interval –1 into post-managed

observations in interval +1, so that the probability of a post-managed observation in interval –1

is .019 (i.e., the total probability is .01 + .009, where .009 is the same as in interval –1 in Section

5.4.1) and the probability of a post-managed observation in interval +1 is .021 (i.e., the total

probability is .011 + .01, where .011 is the same as in interval +1 in Section 5.4.1). Table 4

shows that for an α=.05 test, the power of the left standardized difference under these

assumptions is .717 and the power of the right standardized difference is .698. Thus, using

interval width that is much wider than the interval width in which earnings management is

concentrated results in a substantial loss of power.

Panels C and D facilitate the comparison of the power results in Table 4 with the

corresponding amounts in Table 2, and shows that the use of an interval width that is too wide

can result in a substantial loss of power. As in Table 3, Panels C and D show gray shading for

the cells where there cannot be much loss of power either because the power is very high even

with the wide interval width in Table 4 or because the power is very low even with the optimal

interval width in Table 2. The remaining, unshaded cells in Table 4 Panels C and D show that

the power using the too narrow interval width assumed in Table 3 is often on the order of 50%-

70% of the power using the optimal interval width in Table 2. Thus, too wide intervals can

substantially reduce the power of standardized difference tests.

Page 31

6. Conclusion

A large body of literature documents discontinuities in earnings distributions at

prominent earnings benchmarks. These discontinuities have been widely interpreted as

consistent with the theory that managers take actions to ensure that earnings meet benchmarks.

This interpretation is supported by survey evidence in Graham, Harvey, and Rajgopal (2005)

indicating that managers are willing to incur real costs in order to meet such benchmarks.

However, in situations where both discontinuity evidence and accruals-based models have been

applied, abnormal accruals models have failed to find evidence that earnings management using

accruals explain discontinuities.

This paper reviews and refines the derivation of the distribution of the Burgstahler and

Dichev (1997) standardized difference statistic and corrects an error introduced into the literature

in Beaver, McNichols, and Nelson (2007). The paper also provides a formal evaluation of the

statistical properties of discontinuity tests to detect earnings management. The analysis shows

the importance of defining interval widths that result in concentration of the effects of earnings

management in the interval immediately below and the interval immediately above the

benchmark – much narrower or much wider interval widths can substantially reduce the power of

standardized difference tests.

The results show that standardized difference tests have the power to detect management

of relatively small amounts of earnings (say .25%–.50% of MVE) by a small proportion of firms

(say .1%–.2%). In contrast, previous research suggests that accruals-based tests have reasonable

power only for far larger amounts of earnings management by a much larger proportion of

sample firms (say 5% or more of total assets for all sample firms). Thus, together with evidence

from previous papers, the evidence suggests that discontinuity tests have far greater power to

Page 32

detect smaller amounts and lesser rates of earnings management than do tests based on abnormal

accruals.

Page 33

References

Ayers, B., J. Jiang, and P.E. Yeung. "Discretionary accruals and earnings management: An analysis of pseudo earnings targets." The Accounting Review 81 (October 2006): 617-652.

Ball, R. “Accounting Informats Investors and Earnings Management is Rife: Two Questionable Beliefts.” Unpublished Working Paper, University of Chcago, SSRN abstract=2211288 (2013).

Beaver, W., M. McNichols, and K. Nelson. “An Alternative Interpretation of the Discontinuity in Earnings Distributions,” Review of Accounting Studies (2007), Vol. 12, No. 4, 525-556.

Bollen, N., and V. Pool. Do hedge fund managers misreport returns? Evidence from the pooled distribution. The Journal of Finance 64:5 (2009), 2257-2288.

Brown, L. "A temporal analysis of earnings surprises: Profits versus losses," Journal of Accounting Research 39 (2001): 221–42.

Brown, L.D., and M.L. Caylor. "A Temporal Analysis of Quarterly Earnings Thresholds: Propensities and Valuation Consequences. The Accounting Review 80 (April 2005): 423-440.

Burgstahler, D. "Inference from Empirical Research." The Accounting Review, Vol. 62, No. 1 (January 1987): 203-214.

Burgstahler, D., and E. Chuk. “What Have We Learned About Earnings Management? Correcting Disinformation About Discontinuities.” Unpublished Working Paper, University of Washington (2013).

Burgstahler, D., and I. Dichev. “Earnings Management to Avoid Earnings Decreases and Losses.” Journal of Accounting & Economics 24 (1997): 99–126.

Burgstahler, D., and M. Eames. "Management of Earnings and Analysts’ Forecasts to Achieve Zero and Small Positive Earnings Surprise." Journal of Business Finance and Accounting. Vol. 33, No.5-6, (June/July 2006): 633-652.

Carslaw, C. “Anomalies in Income Numbers: Evidence of Goal Oriented Behavior.” The Ac- counting Review 63 (1988): 321–7.

Collins, D., and P. Hribar. , 2002. Errors in estimating accruals: implications for empirical research. Journal of Accounting Research 40, 105–135.

Das, S., and H. Zhang. "Rounding-up in reported EPS, behavioral thresholds, and earnings management." Journal of Accounting and Economics 35 (2003), 31-50.

Daske, H., G. Gebhardt, and S. McLeay. "The distribution of earnings relative to targets in the European Union." Accounting and Business Research 36:3 (2006), 137-167.

Page 34

Dechow, P., A. Hutton, J. Kim, and R. Sloan. “Detecting Earnings Management: A New Approach.” Unpublished working paper, University of California, Berkeley, October 11, 2010.

Dechow, P., S. Richardson, and I. Tuna. “Why Are Earnings Kinky? An Examination of the Earnings Management Explanation,” Review of Accounting Studies (2003), Vol. 8, 355-384.

Dechow, P., R. Sloan and A. Sweeney, 1995, Detecting earnings management, The Accounting Review 70; 2: 193-225.

Degeorge, F., Patel, J., Zeckhauser, R. "Earnings manipulations to exceed thresholds." Journal of Business 72 (1999): 1–33.

Dichev, I., Skinner, D., 2002. Large sample evidence on the debt covenant hypothesis. Journal of Accounting Research 40, 1091–1123.

Donelson, D., McInnis, J., & Mergenthaler, R. "Discontinuities and earnings management: Evidence from restatements related to securities litigation." Contemporary Accounting Research 30:1 (2013): 242-268.

Durtschi, C., and P. Easton. “Earnings Management? The Shapes of the Frequency Distributions of Earnings Metrics Are Not Evidence ipso facto.” Journal of Accounting Research 43 (2005): 557–92.

Durtschi, C., and P. Easton. “Earnings Management? Erroneous Inferences Based on Earnings Frequency Distributions.” Journal of Accounting Research 47 (2009): 1249–1281.

Dyreng, S., Mayew, W., Schipper, K., 2012. Evidence that managers intervene in financial reporting to avoid working capital deficits. Working paper.

Ecker, F., J. Francis, P. Olsson, and K. Schipper. “Peer Firm Selection for Discretionary Accruals Models.” Unpublished working paper, Duke University, March 2011.

Graham, J., C. Harvey and S. Rajgopal. "The Economic Implications of Corporate Financial Reporting", Journal of Accounting and Economics 40 (2005): 3-73.

Grundfest, J. and N. Malenko. "Quadrophobia: Strategic Rounding of EPS Data." Stanford University Working paper, October 14, 2009.

Hayn, C., "The information content of losses." Journal of Accounting and Economics 20 [2] (1995): 125-153.

Healy, P., and J. Whalen. “A Review of the Earnings Management Literature and Its Implications for Standard Setting.” Accounting Horizons 13 (1999): 365–383.

Jensen, M. "Corporate Budgeting is Broken--Let’s Fix It." Harvard Business Review (November 2001).

Page 35

Jensen, M. "Paying People to Lie: the Truth about the Budgeting Process." European Financial Management, Vol. 9, No. 3, 2003.

Johnson and Kotz (1969) Discrete Distributions, John Wiley & Sons : New York.

Kerstein, J., and A. Rai. "Intra-year shifts in the earnings distribution and their implications for earnings management"Journal of Accounting & Economics, Vol. 44 (2007): 399–419.

Kothari, S.P., A. Leone, and C. Wasley, 2005, Performance matched discretionary accruals measures, Journal of Accounting and Economics 39: 163-197.

Matsumoto, D. "Management's Incentives to Avoid Negative Earnings Surprises." The Accounting Review, Vol. 77, No. 3 (July 2002), pp. 483-514.

McNichols, M. "Research design issues in earnings management studies," Journal of Accounting and Public Policy 19 (2000), pp. 313–45.

Owens, E., J.S.Wu, and J. Zimmerman. “Business Model Shocks and Abnormal Accrual Models,” University of Rochester working paper, December 9, 2013.

Schipper, K. “Commentary on Earnings Management.” Accounting Horizons 3 (1989): 91–102.

Stubben, S. "Discretionary Revenues as a Measure of Earnings Management." The Accounting Review, Vol. 85, No. 2 (March 2010), pp. 695-717.

Thomas, J. "Unusual patterns in reported earnings." Accounting Review Vol. 64 No. 4 (1989): 773-787.

Page 36

-50 -26 -25 -1 +1 +25 +26 +50

Figure 1Intervals in Simulation

Table  1  Distribution  of  Standardized  Difference  Statistics  

1,000,000  Simulation  Trials  Observations  Generated  From  Standard  Normal  Distribution  

For  100  intervals  of  width  .04  between  -­‐2  and  +2  standard  deviations  of  mean    Panel  A:  Sample  Size  generated  for  each  simulation  trial  =  64,000  

Std. Diff. with Variance Based on

Known Probabilities

Std. Diff. with Variance Based on

Estimated Probabilities

Std. Diff. Using BD

Std. Diff. Using BMN

Variance Effective Level of 5%

Significance Test

Interval

Interval Expected

n

Theoretical Expectation

of Std. Diff. Mean

Standard Dev. Mean

Standard Dev.

Standard Dev.

Standard Dev.

Based on Estimated

Probs. Using BD

Using BMN

Variance (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)

-50 143.9 -0.023 -0.021 0.9980 -0.038 1.000 1.001 0.865 4.4% 4.4% 2.6% -49 155.6 -0.022 -0.023 0.9998 -0.040 1.002 1.003 0.867 4.4% 4.4% 2.6% -48 168.0 -0.022 -0.021 0.9981 -0.036 1.001 1.001 0.866 4.4% 4.4% 2.6% -47 181.1 -0.022 -0.023 0.9986 -0.038 1.001 1.002 0.866 4.5% 4.5% 2.7% -46 195.0 -0.021 -0.020 0.9924 -0.035 0.995 0.996 0.861 4.3% 4.3% 2.6% -45 209.5 -0.020 -0.021 1.0016 -0.035 1.004 1.005 0.869 4.5% 4.5% 2.7% -44 224.8 -0.020 -0.019 0.9998 -0.033 1.002 1.003 0.867 4.5% 4.5% 2.7% -43 240.8 -0.019 -0.020 0.9993 -0.033 1.001 1.003 0.867 4.6% 4.6% 2.7% -42 257.5 -0.018 -0.019 0.9993 -0.032 1.001 1.003 0.867 4.5% 4.6% 2.7% -41 275.0 -0.018 -0.016 0.9992 -0.029 1.001 1.003 0.867 4.6% 4.6% 2.7% -40 293.2 -0.017 -0.018 0.9979 -0.029 1.000 1.001 0.866 4.6% 4.6% 2.7% -39 312.0 -0.016 -0.015 0.9993 -0.027 1.001 1.003 0.867 4.7% 4.7% 2.8% -38 331.6 -0.015 -0.015 0.9994 -0.026 1.002 1.003 0.868 4.6% 4.6% 2.7% -37 351.8 -0.014 -0.015 0.9966 -0.025 0.999 1.001 0.865 4.6% 4.6% 2.7% -36 372.7 -0.013 -0.013 1.0026 -0.023 1.005 1.007 0.871 4.7% 4.8% 2.8%

-35 394.1 -0.012 -0.010 1.0046 -0.021 1.007 1.009 0.873 4.8% 4.8% 2.9% -34 416.2 -0.011 -0.012 0.9996 -0.022 1.002 1.005 0.869 4.7% 4.7% 2.8% -33 438.7 -0.009 -0.009 1.0050 -0.019 1.008 1.010 0.873 4.9% 4.9% 2.9% -32 461.8 -0.008 -0.008 1.0036 -0.018 1.006 1.009 0.872 4.8% 4.9% 2.8% -31 485.2 -0.007 -0.007 0.9991 -0.016 1.002 1.005 0.869 4.7% 4.8% 2.8% -30 509.1 -0.006 -0.007 0.9998 -0.016 1.003 1.005 0.869 4.8% 4.8% 2.8% -29 533.3 -0.005 -0.003 1.0007 -0.012 1.004 1.007 0.870 4.8% 4.9% 2.8% -28 557.7 -0.003 -0.004 1.0036 -0.013 1.007 1.010 0.873 4.9% 5.0% 3.0% -27 582.3 -0.002 -0.002 1.0045 -0.010 1.008 1.011 0.874 5.0% 5.0% 3.0% -26 607.1 -0.001 -0.002 1.0034 -0.010 1.007 1.010 0.873 5.0% 5.0% 3.0% -25 631.8 0.001 0.002 0.9998 -0.006 1.003 1.007 0.870 4.9% 5.0% 2.9% -24 656.6 0.002 0.001 0.9975 -0.007 1.001 1.004 0.868 4.9% 4.9% 2.9% -23 681.2 0.003 0.005 0.9963 -0.003 1.000 1.004 0.867 4.9% 5.0% 2.9% -22 705.6 0.005 0.003 0.9980 -0.005 1.002 1.006 0.869 4.9% 5.0% 2.9% -21 729.7 0.006 0.006 0.9984 -0.002 1.003 1.006 0.870 4.9% 5.0% 2.9% -20 753.4 0.007 0.008 0.9965 0.000 1.001 1.005 0.868 4.9% 5.0% 2.9% -19 776.7 0.008 0.008 1.0000 0.000 1.004 1.008 0.871 5.0% 5.1% 3.0% -18 799.3 0.009 0.010 0.9996 0.003 1.004 1.008 0.871 5.0% 5.1% 3.0% -17 821.4 0.011 0.010 1.0026 0.002 1.007 1.011 0.874 5.1% 5.1% 3.0% -16 842.7 0.012 0.014 1.0048 0.007 1.009 1.014 0.876 5.2% 5.3% 3.1% -15 863.1 0.013 0.010 1.0043 0.003 1.009 1.014 0.876 5.1% 5.2% 3.1% -14 882.7 0.014 0.016 1.0077 0.009 1.012 1.017 0.879 5.2% 5.2% 3.1% -13 901.2 0.015 0.013 1.0063 0.006 1.011 1.016 0.878 5.2% 5.3% 3.1% -12 918.7 0.016 0.017 0.9947 0.010 1.000 1.004 0.868 5.0% 5.1% 3.0% -11 935.0 0.016 0.015 0.9954 0.008 1.000 1.005 0.868 5.0% 5.1% 3.0% -10 950.1 0.017 0.019 0.9961 0.013 1.001 1.006 0.869 5.1% 5.1% 3.0%

-9 963.9 0.018 0.016 0.9994 0.010 1.004 1.010 0.872 5.1% 5.2% 3.0% -8 976.3 0.019 0.020 0.9997 0.013 1.005 1.010 0.872 5.1% 5.2% 3.1% -7 987.3 0.019 0.019 0.9956 0.012 1.001 1.006 0.869 5.1% 5.2% 3.0% -6 996.8 0.020 0.020 0.9968 0.014 1.002 1.007 0.870 5.2% 5.3% 3.1% -5 1004.8 0.020 0.020 0.9959 0.014 1.001 1.006 0.869 5.1% 5.2% 3.0%

-4 1011.3 0.020 0.021 0.9952 0.014 1.001 1.006 0.869 5.0% 5.1% 3.0% -3 1016.1 0.021 0.020 0.9978 0.014 1.003 1.008 0.871 5.2% 5.3% 3.1% -2 1019.4 0.021 0.021 0.9991 0.014 1.005 1.010 0.872 5.1% 5.2% 3.0% -1 1021.0 0.021 0.021 1.0006 0.015 1.006 1.011 0.874 5.2% 5.3% 3.1% 1 1021.0 0.021 0.021 0.9995 0.014 1.005 1.010 0.873 5.1% 5.2% 3.0% 2 1019.4 0.021 0.021 1.0013 0.014 1.007 1.012 0.874 5.3% 5.4% 3.2% 3 1016.1 0.021 0.020 1.0065 0.014 1.012 1.017 0.879 5.3% 5.4% 3.2% 4 1011.3 0.020 0.021 1.0003 0.015 1.006 1.011 0.873 5.2% 5.3% 3.1% 5 1004.8 0.020 0.020 0.9991 0.013 1.004 1.010 0.872 5.1% 5.2% 3.0% 6 996.8 0.020 0.019 0.9966 0.013 1.002 1.007 0.870 5.0% 5.1% 3.0% 7 987.3 0.019 0.020 1.0001 0.013 1.005 1.011 0.873 5.1% 5.2% 3.1% 8 976.3 0.019 0.018 0.9987 0.012 1.004 1.009 0.872 5.1% 5.2% 3.0% 9 963.9 0.018 0.018 0.9988 0.011 1.004 1.009 0.872 5.1% 5.2% 3.0%

10 950.1 0.017 0.017 0.9963 0.011 1.001 1.006 0.869 5.1% 5.2% 3.0% 11 935.0 0.016 0.015 0.9964 0.009 1.001 1.006 0.869 5.0% 5.1% 3.0% 12 918.7 0.016 0.017 0.9973 0.010 1.002 1.007 0.870 5.1% 5.2% 3.1% 13 901.2 0.015 0.014 1.0070 0.008 1.012 1.017 0.878 5.3% 5.3% 3.2% 14 882.7 0.014 0.014 1.0038 0.007 1.009 1.013 0.875 5.0% 5.1% 3.0% 15 863.1 0.013 0.013 1.0013 0.006 1.006 1.010 0.873 5.1% 5.2% 3.0% 16 842.7 0.012 0.010 1.0041 0.003 1.009 1.013 0.875 5.1% 5.2% 3.0% 17 821.4 0.011 0.012 0.9984 0.005 1.003 1.007 0.870 4.9% 5.0% 2.9% 18 799.3 0.009 0.008 0.9986 0.001 1.003 1.007 0.870 5.0% 5.1% 3.0% 19 776.7 0.008 0.009 1.0040 0.001 1.008 1.012 0.875 5.1% 5.2% 3.0% 20 753.4 0.007 0.007 1.0005 -0.001 1.005 1.009 0.872 4.9% 5.0% 3.0% 21 729.7 0.006 0.006 0.9988 -0.002 1.003 1.007 0.870 5.0% 5.1% 3.0% 22 705.6 0.005 0.005 1.0028 -0.002 1.007 1.010 0.873 5.0% 5.1% 3.0% 23 681.2 0.003 0.003 1.0022 -0.005 1.006 1.010 0.873 5.0% 5.0% 3.0% 24 656.6 0.002 0.001 0.9982 -0.006 1.002 1.005 0.869 4.8% 4.9% 2.9% 25 631.8 0.001 0.002 0.9947 -0.007 0.998 1.002 0.866 4.8% 4.9% 2.8% 26 607.1 -0.001 -0.001 0.9984 -0.009 1.002 1.005 0.869 4.8% 4.8% 2.8% 27 582.3 -0.002 -0.003 1.0005 -0.012 1.004 1.007 0.870 4.8% 4.9% 2.9%

28 557.7 -0.003 -0.003 0.9989 -0.011 1.002 1.005 0.869 4.8% 4.9% 2.9% 29 533.3 -0.005 -0.004 0.9946 -0.012 0.998 1.000 0.865 4.7% 4.7% 2.7% 30 509.1 -0.006 -0.006 0.9976 -0.015 1.001 1.003 0.867 4.7% 4.8% 2.8% 31 485.2 -0.007 -0.008 0.9992 -0.017 1.002 1.005 0.869 4.7% 4.8% 2.8% 32 461.8 -0.008 -0.008 1.0025 -0.017 1.005 1.008 0.871 4.8% 4.9% 2.9% 33 438.7 -0.009 -0.010 1.0050 -0.019 1.008 1.010 0.873 4.8% 4.8% 2.8% 34 416.2 -0.011 -0.010 1.0082 -0.021 1.011 1.013 0.876 4.9% 4.9% 2.9% 35 394.1 -0.012 -0.013 1.0028 -0.023 1.005 1.007 0.871 4.8% 4.8% 2.9% 36 372.7 -0.013 -0.011 0.9981 -0.022 1.000 1.002 0.867 4.7% 4.7% 2.8% 37 351.8 -0.014 -0.015 1.0041 -0.026 1.007 1.009 0.872 4.7% 4.7% 2.7% 38 331.6 -0.015 -0.014 1.0037 -0.026 1.006 1.008 0.872 4.7% 4.7% 2.8% 39 312.0 -0.016 -0.015 0.9972 -0.027 1.000 1.001 0.866 4.5% 4.6% 2.7% 40 293.2 -0.017 -0.017 0.9962 -0.029 0.999 1.000 0.865 4.5% 4.5% 2.6% 41 275.0 -0.018 -0.018 0.9961 -0.030 0.998 1.000 0.865 4.5% 4.5% 2.6% 42 257.5 -0.018 -0.018 0.9997 -0.031 1.001 1.003 0.867 4.6% 4.7% 2.7% 43 240.8 -0.019 -0.018 0.9997 -0.032 1.002 1.003 0.867 4.6% 4.6% 2.7% 44 224.8 -0.020 -0.021 1.0012 -0.034 1.004 1.005 0.869 4.5% 4.5% 2.7% 45 209.5 -0.020 -0.021 1.0017 -0.035 1.004 1.005 0.869 4.5% 4.6% 2.7% 46 195.0 -0.021 -0.021 1.0035 -0.035 1.005 1.006 0.870 4.6% 4.6% 2.8% 47 181.1 -0.022 -0.021 1.0025 -0.037 1.004 1.005 0.869 4.6% 4.6% 2.8% 48 168.0 -0.022 -0.022 1.0018 -0.038 1.004 1.005 0.869 4.5% 4.5% 2.7% 49 155.6 -0.022 -0.023 0.9996 -0.039 1.002 1.003 0.867 4.4% 4.4% 2.6% 50 143.9 -0.023 -0.024 1.0014 -0.041 1.003 1.004 0.868 4.5% 4.5% 2.7%

       

Table  1  (continued)    

Panel  B:  Sample  Size  generated  for  each  simulation  trial  =  4,000    

Std. Diff. with Variance Based on

Known Probabilities

Std. Diff. with Variance Based on

Estimated Probabilities

Std. Diff. Using BD

Std. Diff. Using BMN

Variance Effective Level of 5%

Significance Test

Interval

Interval Expected

n

Theoretical Expectation

of Std. Diff. Mean

Standard Dev. Mean

Standard Dev.

Standard Dev.

Standard Dev.

Based on Estimated

Probs. Using BD

Using BMN

Variance (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)

-50 9.0 -0.006 -0.007 0.9996 -0.078 1.018 1.019 0.868 3.6% 3.6% 2.5% -49 9.7 -0.006 -0.005 0.9991 -0.073 1.016 1.017 0.868 3.6% 3.6% 2.5% -48 10.5 -0.006 -0.006 0.9989 -0.071 1.014 1.015 0.867 3.7% 3.7% 2.4% -47 11.3 -0.005 -0.004 0.9987 -0.067 1.013 1.014 0.867 3.7% 3.7% 2.4% -46 12.2 -0.005 -0.006 0.9971 -0.066 1.011 1.012 0.866 3.8% 3.8% 2.4% -45 13.1 -0.005 -0.005 0.9987 -0.063 1.012 1.013 0.868 3.8% 3.8% 2.5% -44 14.0 -0.005 -0.005 0.9993 -0.061 1.012 1.013 0.868 3.9% 3.9% 2.5% -43 15.0 -0.005 -0.005 1.0007 -0.059 1.012 1.014 0.869 4.0% 4.0% 2.5% -42 16.1 -0.005 -0.004 0.9993 -0.056 1.011 1.012 0.868 4.0% 4.0% 2.5% -41 17.2 -0.004 -0.004 0.9998 -0.054 1.011 1.012 0.869 4.0% 4.0% 2.5% -40 18.3 -0.004 -0.005 1.0011 -0.054 1.012 1.013 0.870 4.1% 4.1% 2.5% -39 19.5 -0.004 -0.004 0.9992 -0.050 1.009 1.010 0.868 4.1% 4.1% 2.5% -38 20.7 -0.004 -0.003 0.9998 -0.049 1.009 1.011 0.869 4.1% 4.1% 2.6% -37 22.0 -0.003 -0.004 1.0017 -0.048 1.011 1.013 0.871 4.2% 4.2% 2.6% -36 23.3 -0.003 -0.003 1.0012 -0.046 1.010 1.012 0.870 4.2% 4.2% 2.6% -35 24.6 -0.003 -0.003 1.0014 -0.045 1.010 1.012 0.871 4.2% 4.2% 2.6% -34 26.0 -0.003 -0.002 1.0018 -0.043 1.010 1.012 0.871 4.2% 4.2% 2.6% -33 27.4 -0.002 -0.001 1.0015 -0.041 1.010 1.012 0.871 4.2% 4.2% 2.7% -32 28.9 -0.002 -0.004 1.0009 -0.043 1.009 1.011 0.871 4.2% 4.2% 2.6%

-31 30.3 -0.002 -0.001 1.0004 -0.038 1.008 1.011 0.870 4.2% 4.3% 2.7% -30 31.8 -0.001 -0.001 1.0003 -0.038 1.008 1.010 0.870 4.2% 4.3% 2.7% -29 33.3 -0.001 -0.001 1.0008 -0.036 1.009 1.011 0.871 4.3% 4.3% 2.7% -28 34.9 -0.001 -0.002 1.0017 -0.037 1.009 1.012 0.872 4.3% 4.4% 2.7% -27 36.4 0.000 0.000 1.0011 -0.034 1.009 1.012 0.872 4.4% 4.4% 2.7% -26 37.9 0.000 -0.001 1.0005 -0.035 1.008 1.011 0.871 4.3% 4.4% 2.7% -25 39.5 0.000 0.002 0.9988 -0.031 1.006 1.009 0.870 4.4% 4.5% 2.7% -24 41.0 0.000 -0.001 0.9982 -0.033 1.006 1.009 0.869 4.3% 4.4% 2.7% -23 42.6 0.001 0.002 0.9993 -0.030 1.006 1.010 0.870 4.4% 4.5% 2.7% -22 44.1 0.001 0.000 1.0005 -0.031 1.008 1.011 0.872 4.4% 4.5% 2.8% -21 45.6 0.001 0.003 1.0002 -0.028 1.007 1.011 0.871 4.5% 4.6% 2.8% -20 47.1 0.002 0.001 0.9985 -0.029 1.006 1.010 0.870 4.4% 4.5% 2.8% -19 48.5 0.002 0.002 0.9996 -0.028 1.007 1.011 0.871 4.5% 4.5% 2.8% -18 50.0 0.002 0.004 1.0026 -0.026 1.010 1.014 0.874 4.5% 4.6% 2.8% -17 51.3 0.003 0.001 1.0012 -0.027 1.009 1.013 0.873 4.5% 4.6% 2.8% -16 52.7 0.003 0.004 1.0011 -0.025 1.008 1.013 0.873 4.5% 4.6% 2.8% -15 53.9 0.003 0.003 1.0002 -0.025 1.007 1.012 0.872 4.5% 4.6% 2.8% -14 55.2 0.003 0.003 1.0018 -0.025 1.009 1.014 0.874 4.5% 4.6% 2.8% -13 56.3 0.004 0.004 1.0014 -0.024 1.009 1.014 0.874 4.6% 4.6% 2.8% -12 57.4 0.004 0.005 0.9999 -0.022 1.007 1.012 0.873 4.5% 4.6% 2.8% -11 58.4 0.004 0.003 0.9989 -0.024 1.007 1.011 0.872 4.6% 4.6% 2.8% -10 59.4 0.004 0.004 0.9985 -0.022 1.006 1.011 0.872 4.6% 4.6% 2.8%

-9 60.2 0.004 0.005 0.9982 -0.021 1.006 1.011 0.871 4.5% 4.6% 2.8% -8 61.0 0.005 0.003 0.9997 -0.023 1.008 1.013 0.873 4.6% 4.7% 2.9% -7 61.7 0.005 0.006 0.9999 -0.020 1.008 1.013 0.873 4.6% 4.7% 2.9% -6 62.3 0.005 0.004 0.9990 -0.022 1.007 1.012 0.872 4.6% 4.7% 2.8% -5 62.8 0.005 0.006 1.0009 -0.020 1.009 1.014 0.874 4.6% 4.7% 2.8% -4 63.2 0.005 0.003 1.0007 -0.023 1.009 1.014 0.874 4.6% 4.7% 2.8% -3 63.5 0.005 0.006 0.9985 -0.020 1.006 1.011 0.872 4.6% 4.7% 2.9% -2 63.7 0.005 0.006 0.9984 -0.019 1.006 1.012 0.872 4.6% 4.7% 2.8% -1 63.8 0.005 0.004 1.0003 -0.022 1.008 1.014 0.874 4.6% 4.7% 2.8%

1 63.8 0.005 0.005 1.0004 -0.021 1.008 1.014 0.874 4.6% 4.7% 2.9% 2 63.7 0.005 0.006 1.0004 -0.020 1.008 1.014 0.874 4.6% 4.7% 2.9% 3 63.5 0.005 0.006 1.0017 -0.020 1.009 1.015 0.875 4.6% 4.7% 2.9% 4 63.2 0.005 0.005 1.0008 -0.021 1.009 1.014 0.874 4.6% 4.7% 2.9% 5 62.8 0.005 0.004 0.9999 -0.022 1.008 1.013 0.873 4.6% 4.7% 2.9% 6 62.3 0.005 0.005 1.0002 -0.021 1.008 1.013 0.873 4.6% 4.7% 2.9% 7 61.7 0.005 0.006 1.0001 -0.021 1.008 1.013 0.873 4.6% 4.7% 2.8% 8 61.0 0.005 0.004 0.9997 -0.022 1.007 1.013 0.873 4.6% 4.7% 2.9% 9 60.2 0.004 0.004 1.0008 -0.023 1.009 1.014 0.874 4.6% 4.7% 2.8%

10 59.4 0.004 0.005 1.0005 -0.022 1.008 1.013 0.873 4.6% 4.7% 2.9% 11 58.4 0.004 0.003 0.9999 -0.024 1.007 1.012 0.873 4.6% 4.6% 2.8% 12 57.4 0.004 0.005 0.9991 -0.022 1.007 1.012 0.872 4.6% 4.6% 2.8% 13 56.3 0.004 0.003 0.9995 -0.024 1.007 1.012 0.872 4.5% 4.6% 2.8% 14 55.2 0.003 0.003 0.9990 -0.024 1.007 1.011 0.872 4.5% 4.6% 2.8% 15 53.9 0.003 0.003 0.9989 -0.025 1.006 1.011 0.871 4.5% 4.6% 2.8% 16 52.7 0.003 0.002 0.9994 -0.026 1.007 1.011 0.872 4.5% 4.6% 2.8% 17 51.3 0.003 0.004 0.9988 -0.025 1.006 1.010 0.871 4.5% 4.6% 2.8% 18 50.0 0.002 0.002 0.9986 -0.027 1.006 1.010 0.871 4.4% 4.5% 2.8% 19 48.5 0.002 0.001 1.0003 -0.028 1.008 1.012 0.872 4.5% 4.5% 2.8% 20 47.1 0.002 0.003 0.9997 -0.027 1.007 1.011 0.871 4.5% 4.6% 2.8% 21 45.6 0.001 0.001 1.0005 -0.030 1.008 1.012 0.872 4.5% 4.5% 2.8% 22 44.1 0.001 0.001 1.0000 -0.030 1.007 1.011 0.871 4.5% 4.5% 2.8% 23 42.6 0.001 0.001 1.0006 -0.030 1.008 1.011 0.872 4.4% 4.5% 2.8% 24 41.0 0.000 0.000 1.0019 -0.032 1.009 1.013 0.873 4.4% 4.5% 2.8% 25 39.5 0.000 0.000 1.0006 -0.033 1.008 1.011 0.871 4.4% 4.5% 2.7% 26 37.9 0.000 0.000 0.9998 -0.034 1.007 1.010 0.871 4.3% 4.4% 2.7% 27 36.4 0.000 0.000 1.0009 -0.035 1.008 1.011 0.871 4.3% 4.4% 2.7% 28 34.9 -0.001 0.000 1.0003 -0.035 1.008 1.011 0.871 4.3% 4.4% 2.7% 29 33.3 -0.001 -0.003 1.0005 -0.039 1.008 1.011 0.871 4.2% 4.3% 2.7% 30 31.8 -0.001 -0.001 1.0001 -0.038 1.008 1.010 0.870 4.3% 4.3% 2.7% 31 30.3 -0.002 -0.001 1.0002 -0.038 1.008 1.010 0.870 4.2% 4.2% 2.7%

32 28.9 -0.002 -0.001 1.0023 -0.040 1.010 1.013 0.872 4.2% 4.2% 2.7% 33 27.4 -0.002 -0.003 1.0006 -0.043 1.009 1.011 0.870 4.2% 4.2% 2.6% 34 26.0 -0.003 -0.003 0.9994 -0.044 1.008 1.010 0.869 4.1% 4.1% 2.6% 35 24.6 -0.003 -0.003 0.9998 -0.045 1.008 1.010 0.869 4.1% 4.1% 2.6% 36 23.3 -0.003 -0.002 1.0011 -0.045 1.010 1.012 0.870 4.1% 4.1% 2.6% 37 22.0 -0.003 -0.005 1.0001 -0.049 1.009 1.011 0.869 4.1% 4.1% 2.6% 38 20.7 -0.004 -0.003 0.9995 -0.049 1.009 1.011 0.869 4.1% 4.1% 2.5% 39 19.5 -0.004 -0.004 1.0000 -0.051 1.010 1.011 0.869 4.1% 4.1% 2.5% 40 18.3 -0.004 -0.004 1.0005 -0.052 1.011 1.012 0.869 4.1% 4.1% 2.5% 41 17.2 -0.004 -0.006 1.0005 -0.056 1.011 1.013 0.869 4.1% 4.1% 2.5% 42 16.1 -0.005 -0.004 1.0002 -0.056 1.012 1.013 0.869 4.0% 4.0% 2.5% 43 15.0 -0.005 -0.004 0.9983 -0.057 1.010 1.011 0.867 3.9% 3.9% 2.5% 44 14.0 -0.005 -0.006 1.0001 -0.062 1.013 1.014 0.869 3.9% 3.9% 2.5% 45 13.1 -0.005 -0.005 1.0022 -0.063 1.015 1.016 0.870 3.9% 3.9% 2.5% 46 12.2 -0.005 -0.005 1.0021 -0.065 1.016 1.017 0.870 3.9% 3.9% 2.5% 47 11.3 -0.005 -0.005 1.0020 -0.068 1.017 1.018 0.870 3.8% 3.8% 2.5% 48 10.5 -0.006 -0.006 0.9999 -0.071 1.016 1.017 0.868 3.7% 3.7% 2.4% 49 9.7 -0.006 -0.006 0.9992 -0.074 1.017 1.017 0.868 3.6% 3.6% 2.4% 50 9.0 -0.006 -0.005 0.9994 -0.076 1.018 1.019 0.868 3.6% 3.6% 2.4%

 

Table  2  Power  of  Standardized  Difference  Tests  

For  Various  Sample  Sizes  and  Rates  of  Earnings  Management  With  Earnings  Management  Concentrated  Entirely  in  Intervals  –1  and  +1  

 Panel A : Power of Left Standardized Differences

Sample Size

  1,000 2,000 4,000 8,000 16,000 32,000 64,000 128,000

Rate 0.025 0.061 0.066 0.074 0.085 0.105 0.137 0.193 0.294

0.050 0.074 0.086 0.105 0.138 0.195 0.296 0.470 0.717 0.075 0.089 0.110 0.146 0.210 0.323 0.512 0.765 0.956 0.100 0.106 0.139 0.198 0.301 0.478 0.727 0.938 0.998 0.200 0.204 0.312 0.495 0.747 0.948 0.999 1.000 1.000 0.300 0.351 0.555 0.810 0.973 1.000 1.000 1.000 1.000 0.400 0.536 0.791 0.966 0.999 1.000 1.000 1.000 1.000 0.500 0.725 0.936 0.998 1.000 1.000 1.000 1.000 1.000 0.600 0.874 0.989 1.000 1.000 1.000 1.000 1.000 1.000 0.700 0.959 0.999 1.000 1.000 1.000 1.000 1.000 1.000 0.800 0.992 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.900 0.999 1.000 1.000 1.000 1.000 1.000 1.000 1.000

Panel B : Power of Right Standardized Differences 0.025 0.061 0.066 0.073 0.085 0.104 0.136 0.191 0.289

0.050 0.073 0.085 0.103 0.135 0.189 0.287 0.454 0.698 0.075 0.087 0.107 0.141 0.201 0.307 0.487 0.738 0.943 0.100 0.103 0.133 0.187 0.282 0.447 0.689 0.916 0.996 0.200 0.182 0.274 0.434 0.672 0.905 0.994 1.000 1.000 0.300 0.287 0.456 0.700 0.922 0.996 1.000 1.000 1.000 0.400 0.409 0.640 0.882 0.991 1.000 1.000 1.000 1.000 0.500 0.536 0.791 0.966 0.999 1.000 1.000 1.000 1.000 0.600 0.654 0.893 0.993 1.000 1.000 1.000 1.000 1.000 0.700 0.756 0.952 0.999 1.000 1.000 1.000 1.000 1.000 0.800 0.837 0.981 1.000 1.000 1.000 1.000 1.000 1.000 0.900 0.897 0.993 1.000 1.000 1.000 1.000 1.000 1.000

 Power  represents  the  probability  (rounded  to  three  decimal  places  of  accuracy)  of  significant  left-­‐  or  right-­‐standardized  differences  for  tests  of  level  α=.05.        

 Table  3  

Power  and  Relative  Power  of  Standardized  Difference  Tests  For  Various  Sample  Sizes  and  Rates  of  Earnings  Management  

With  Interval  Widths  One-­‐Half  of  Concentration  Width    Panel A : Power of Left Standardized Differences

Sample Size

  1,000 2,000 4,000 8,000 16,000 32,000 64,000 128,000

Rate 0.025 0.055 0.057 0.060 0.065 0.072 0.083 0.101 0.130

0.050 0.060 0.065 0.072 0.083 0.101 0.131 0.184 0.276 0.075 0.066 0.074 0.086 0.106 0.139 0.198 0.301 0.478 0.100 0.073 0.084 0.103 0.133 0.187 0.282 0.447 0.689 0.200 0.105 0.138 0.195 0.296 0.470 0.717 0.932 0.997 0.300 0.151 0.218 0.337 0.535 0.790 0.966 0.999 1.000 0.400 0.214 0.331 0.524 0.779 0.961 0.999 1.000 1.000 0.500 0.299 0.475 0.723 0.936 0.998 1.000 1.000 1.000 0.600 0.409 0.640 0.882 0.991 1.000 1.000 1.000 1.000 0.700 0.543 0.798 0.969 1.000 1.000 1.000 1.000 1.000 0.800 0.691 0.917 0.996 1.000 1.000 1.000 1.000 1.000 0.900 0.832 0.979 1.000 1.000 1.000 1.000 1.000 1.000

Panel B : Power of Right Standardized Differences

0.025 0.055 0.057 0.060 0.065 0.072 0.082 0.100 0.129 0.050 0.060 0.065 0.071 0.082 0.099 0.128 0.177 0.265 0.075 0.065 0.073 0.084 0.103 0.133 0.187 0.282 0.447 0.100 0.071 0.082 0.098 0.126 0.174 0.260 0.409 0.639 0.200 0.096 0.123 0.169 0.250 0.393 0.616 0.864 0.987 0.300 0.126 0.174 0.260 0.409 0.639 0.882 0.991 1.000 0.400 0.160 0.234 0.364 0.576 0.829 0.979 1.000 1.000 0.500 0.197 0.299 0.475 0.723 0.936 0.998 1.000 1.000 0.600 0.236 0.369 0.582 0.835 0.980 1.000 1.000 1.000 0.700 0.278 0.440 0.680 0.910 0.995 1.000 1.000 1.000 0.800 0.321 0.510 0.763 0.955 0.999 1.000 1.000 1.000 0.900 0.365 0.577 0.830 0.979 1.000 1.000 1.000 1.000

 Power  represents  the  probability  (rounded  to  three  decimal  places  of  accuracy)  of  significant  left-­‐  or  right-­‐standardized  differences  for  tests  of  level  α=.05.        

Table  3  (continued)    Panel C : Relative Power of Left Standardized Differences

Sample Size

1,000 2,000 4,000 8,000 16,000 32,000 64,000 128,000

Rate 0.025 90% 87% 82% 76% 69% 61% 52% 44%

0.050 82% 76% 69% 61% 52% 44% 39% 39% 0.075 75% 67% 59% 51% 43% 39% 39% 50% 0.100 69% 60% 52% 44% 39% 39% 48% 69% 0.200 52% 44% 39% 40% 50% 72% 93% 100% 0.300 43% 39% 42% 55% 79% 97% 100% 100% 0.400 40% 42% 54% 78% 96% 100% 100% 100% 0.500 41% 51% 72% 94% 100% 100% 100% 100% 0.600 47% 65% 88% 99% 100% 100% 100% 100% 0.700 57% 80% 97% 100% 100% 100% 100% 100% 0.800 70% 92% 100% 100% 100% 100% 100% 100% 0.900 83% 98% 100% 100% 100% 100% 100% 100%

Panel D : Relative Power of Right Standardized Differences

0.025 90% 87% 82% 76% 69% 61% 52% 45% 0.050 82% 76% 69% 61% 52% 45% 39% 38% 0.075 75% 68% 60% 51% 43% 38% 38% 47% 0.100 69% 61% 53% 45% 39% 38% 45% 64% 0.200 53% 45% 39% 37% 43% 62% 86% 99% 0.300 44% 38% 37% 44% 64% 88% 99% 100% 0.400 39% 37% 41% 58% 83% 98% 100% 100% 0.500 37% 38% 49% 72% 94% 100% 100% 100% 0.600 36% 41% 59% 84% 98% 100% 100% 100% 0.700 37% 46% 68% 91% 100% 100% 100% 100% 0.800 38% 52% 76% 95% 100% 100% 100% 100% 0.900 41% 58% 83% 98% 100% 100% 100% 100%

     

Table  4  Power  and  Relative  Power  of  Standardized  Difference  Tests  For  Various  Sample  Sizes  and  Rates  of  Earnings  Management  

With  Interval  Widths  Twice  Concentration  Width    Panel A : Power of Left Standardized Differences

Sample Size

  1,000 2,000 4,000 8,000 16,000 32,000 64,000 128,000

Rate 0.025 0.057 0.061 0.066 0.073 0.085 0.104 0.137 0.193

0.050 0.066 0.074 0.085 0.105 0.137 0.193 0.294 0.466 0.075 0.075 0.088 0.109 0.145 0.207 0.319 0.506 0.758 0.100 0.086 0.105 0.138 0.195 0.296 0.470 0.717 0.932 0.200 0.139 0.198 0.301 0.478 0.727 0.938 0.998 1.000 0.300 0.215 0.331 0.526 0.780 0.962 0.999 1.000 1.000 0.400 0.312 0.496 0.747 0.948 0.999 1.000 1.000 1.000 0.500 0.429 0.666 0.901 0.994 1.000 1.000 1.000 1.000 0.600 0.556 0.810 0.973 1.000 1.000 1.000 1.000 1.000 0.700 0.681 0.911 0.995 1.000 1.000 1.000 1.000 1.000 0.800 0.792 0.966 0.999 1.000 1.000 1.000 1.000 1.000 0.900 0.878 0.990 1.000 1.000 1.000 1.000 1.000 1.000

Panel B : Power of Right Standardized Differences 0.025 0.057 0.061 0.066 0.073 0.085 0.104 0.136 0.191

0.050 0.066 0.073 0.085 0.104 0.136 0.191 0.289 0.458 0.075 0.075 0.087 0.108 0.142 0.203 0.311 0.493 0.745 0.100 0.085 0.103 0.135 0.190 0.287 0.454 0.698 0.922 0.200 0.133 0.187 0.282 0.447 0.689 0.916 0.996 1.000 0.300 0.197 0.300 0.476 0.725 0.936 0.998 1.000 1.000 0.400 0.274 0.434 0.672 0.905 0.994 1.000 1.000 1.000 0.500 0.362 0.572 0.826 0.978 1.000 1.000 1.000 1.000 0.600 0.456 0.700 0.923 0.996 1.000 1.000 1.000 1.000 0.700 0.550 0.805 0.971 1.000 1.000 1.000 1.000 1.000 0.800 0.640 0.883 0.991 1.000 1.000 1.000 1.000 1.000 0.900 0.722 0.935 0.998 1.000 1.000 1.000 1.000 1.000

 Power  represents  the  probability  (rounded  to  three  decimal  places  of  accuracy)  of  significant  left-­‐  or  right-­‐standardized  differences  for  tests  of  level  α=.05.        

Table  4  (continued)    Panel C : Relative Power of Left Standardized Differences

Sample Size

1,000 2,000 4,000 8,000 16,000 32,000 64,000 128,000

Rate 0.025 94% 92% 90% 86% 81% 76% 71% 66%

0.050 89% 86% 81% 76% 70% 65% 63% 65% 0.075 85% 80% 75% 69% 64% 62% 66% 79% 0.100 81% 75% 70% 65% 62% 65% 76% 93% 0.200 68% 63% 61% 64% 77% 94% 100% 100% 0.300 61% 60% 65% 80% 96% 100% 100% 100% 0.400 58% 63% 77% 95% 100% 100% 100% 100% 0.500 59% 71% 90% 99% 100% 100% 100% 100% 0.600 64% 82% 97% 100% 100% 100% 100% 100% 0.700 71% 91% 100% 100% 100% 100% 100% 100% 0.800 80% 97% 100% 100% 100% 100% 100% 100% 0.900 88% 99% 100% 100% 100% 100% 100% 100%

Panel D : Relative Power of Right Standardized Differences 0.025 95% 93% 90% 86% 82% 77% 71% 66%

0.050 90% 86% 82% 77% 72% 67% 64% 66% 0.075 86% 82% 76% 71% 66% 64% 67% 79% 0.100 83% 78% 72% 67% 64% 66% 76% 93% 0.200 73% 68% 65% 67% 76% 92% 100% 100% 0.300 69% 66% 68% 79% 94% 100% 100% 100% 0.400 67% 68% 76% 91% 99% 100% 100% 100% 0.500 68% 72% 86% 98% 100% 100% 100% 100% 0.600 70% 78% 93% 100% 100% 100% 100% 100% 0.700 73% 85% 97% 100% 100% 100% 100% 100% 0.800 76% 90% 99% 100% 100% 100% 100% 100% 0.900 80% 94% 100% 100% 100% 100% 100% 100%