Elementary Statistics on Trial

(the case of Lucia de Berk)

Richard D. Gill (Leiden University), Piet Groeneboom (Delft University of Technology) and Peter de Jong (W.S.O.)

Preprint · August 2018

The trial of the Dutch nurse Lucia de Berk, suspected of several murders and attempts of murder, was a very high profile case in the Netherlands. The initial suspicion rested mainly on quasi-statistical considerations, which produced (based partly on incorrect calculations) extremely small probabilities. Since the outcomes proved controversial, the court claimed to have dropped the statistical calculations from the verdict. But the verdict still rested on intuitive notions such as “very improbable”. So statistics were at center stage.

In the conviction of Lucia de Berk an important role was played by a simple (so-called) hypergeometric model, used by the law psychologist (H. Elffers) consulted as statistician by the court, which produced very small probabilities of occurrences of certain numbers of incidents.

In this article we want to draw attention to the fact that, if we take into account the variation among nurses in incidents they experience during their shifts, these probabilities become considerably larger. This points to the danger of using an oversimplified discrete probability model in these circumstances.

The outcomes of applying our alternative model to this case are in striking contrast with those of the first calculations, which led to the initial suspicions and were instrumental in determining the atmosphere surrounding the trial and subsequent hysteria. The main result is that under the assumption of heterogeneity, the probability of experiencing a number of incidents (14) that led to Lucia’s conviction is about 0.021864 or one in 46 if the calculations are based on the same data as used by the law psychologist of the court. In his calculation, however, this probability was equal to one in 342 million.

Figure 1: Lucia de Berk before her imprisonment


The data

We use data from the unpublished reports of Elffers [1] and [2]. But before going into this, we want to make some general remarks on the data collection.

One of the key features of the data was the flawed data collection. Here different disciplines came into conflict: criminal investigation and scientific data gathering are very different. Their objectives, methods and results are not compatible. Criminal investigation is started when there is (suspicion of) a crime, hence one is looking for or hunting down a suspect. If there is need for meaningful statistics another approach is needed, guaranteeing clear definitions and uniformity of the data collection. In the case of Lucia de Berk this clash of cultures proved disastrous. Incidents outside shifts of Lucia were discarded and some initially reported incidents were later relabeled without clear reasons. Extra shifts without incidents and incidents outside shifts of Lucia were subsequently brought to light. Moreover, the data collection rested for a large part on memory.

Clearly, the context of a criminal investigation produces a specific mindset: on the one hand the witnesses know what is looked for (and some of them may already be convinced of the guilt of the suspect), on the other hand fear of implicating one’s self and friends can considerably distort memory. The data on shifts and incidents for the period which was singled out in Elffers’ reports are given in the following table (see also [6]).

Table 1: Data on shifts and incidents

Hospital name (and ward number)              JCH   RCH-41   RCH-42   Total
Total number of shifts                      1029      336      339    1704
Lucia’s number of shifts                     142        1       58     201
Total number of incidents                      8        5       14      27
Number of incidents during Lucia’s shifts      8        1        5      14

JCH and RCH denote the “Juliana Children’s Hospital” and “Red Cross Hospital”, respectively, and 41 and 42 were different ward numbers of the Red Cross Hospital.

Elffers’ method

We first discuss the analysis of the law psychologist H. Elffers, the statistician consulted by the court. This analysis was based on Table 1. As was noticed later, Lucia de Berk had actually done three shifts in RCH-41 instead of just one, but we will argue from the data used by H. Elffers. As explained in [6], Elffers argued by conditioning on part of the data and used two fundamental assumptions:

1. There is a fixed probability p for the occurrence of an incident during a shift (for example, p does not depend on whether the shift is a day shift or a night shift or on the nurse involved, etc.),

2. Incidents occur independently of one another.

On the basis of these assumptions, one can compute the probability that L incidents occur during Lucia’s shifts, given the total number I of incidents and the total number N of shifts considered in the period of study. This is a hypergeometric probability given by

\[ \frac{\binom{S}{L}\binom{N-S}{I-L}}{\binom{N}{I}} \tag{1} \]

where S is the number of shifts of Lucia and I is the total number of incidents, and where $\binom{S}{L}$, etc. denote binomial coefficients. If we just take all the data of Table 1 together, we have a total number of N = 1704 shifts, Lucia had S = 201 shifts, there was a total number I = 27 of incidents, and L = 14 incidents during shifts of Lucia. If we evaluate (1) with these values for N, S, I and L, we get the very small probability of about one in 3.4 million.

Figure 2: A Fokke & Sukke cartoon from 10-30-2007 in the Dutch newspaper NRC Next. The text was kindly translated into English for us by the creators of the cartoon: Reid, Geleijnse and Van Tol. Lucia de Berk was still in prison at that time. The two ducks are defending a family guardian, accused of being responsible for the death of the girl Savanna, who died by suffocation. The accused woman was in fact acquitted (by another defense!). What counselor Sukke is saying corresponds to what the law psychologist H. Elffers told the court: “Honored court, this is no coincidence. The rest is up to you.”

If we want to compute the probability (p-value) that a nurse is present with 14 or more incidents in Elffers’ method of testing a null hypothesis of no systematic effects on these combined data (but he actually did not test it in this way on the combined data, see below), we have to sum the probabilities for L = 14, 15, ..., 27, and then we get the probability of about one in 3 million. For the model we introduce in the next section, however, we get, using the same data, a probability of one in 46.
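The two hypergeometric figures quoted above (about one in 3.4 million for exactly 14 incidents, about one in 3 million for 14 or more) can be reproduced from the combined column of Table 1. The following is a minimal sketch using exact rational arithmetic; the function names are illustrative and not taken from the paper or from Elffers’ reports.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N: int, S: int, I: int, L: int) -> Fraction:
    """Formula (1): exactly L of the I incidents fall in the S shifts of one nurse."""
    return Fraction(comb(S, L) * comb(N - S, I - L), comb(N, I))


def hypergeom_tail(N: int, S: int, I: int, L: int) -> Fraction:
    """P(at least L incidents in the nurse's shifts): formula (1) summed over L, ..., min(I, S)."""
    return sum((hypergeom_pmf(N, S, I, l) for l in range(L, min(I, S) + 1)), Fraction(0))


# Combined totals from Table 1.
N, S, I, L = 1704, 201, 27, 14
p_exact = hypergeom_pmf(N, S, I, L)
p_tail = hypergeom_tail(N, S, I, L)

print(f"P(exactly {L}) is about one in {float(1 / p_exact):,.0f}")
print(f"P({L} or more) is about one in {float(1 / p_tail):,.0f}")
```

Exact fractions are used here only to avoid any rounding questions with such small probabilities; floating-point arithmetic would serve equally well at this scale.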

However, Elffers proceeded somewhat differently, not combining the data of the different hospitals. The details of what he actually did are described in [6]. The most important mistake he made in his calculation was to take the three hospitals separately and to multiply the probabilities he got for each of them. This has the absurd consequence that a nurse working in several different hospitals gets a higher chance of being accused of inexplicably being present at incidents than a nurse working in just one hospital. In this way he arrived at his estimate that the probability that Lucia de Berk was present at the given numbers of incidents at the Juliana Children’s Hospital and the Red Cross Hospital was equal to one in 342 million. We refer the interested reader to [6] and to Chapter 7, “Math error number 7: the incredible coincidence”, of the book [7].
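The effect of treating the columns of Table 1 separately can be sketched as follows. This is not a reconstruction of Elffers’ actual procedure, which is described in [6] and is not reproduced here; it only applies formula (1), summed over the upper tail, to each ward and compares the product of the three tail probabilities with the tail probability for the pooled data. The function name and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(N, S, I, L):
    """P(at least L of the I incidents fall in the nurse's S shifts out of N)."""
    hits = sum(comb(S, l) * comb(N - S, I - l) for l in range(L, min(I, S) + 1))
    return Fraction(hits, comb(N, I))


# (N, S, I, L) per ward, taken from Table 1.
wards = {
    "JCH": (1029, 142, 8, 8),
    "RCH-41": (336, 1, 5, 1),
    "RCH-42": (339, 58, 14, 5),
}

tails = {name: hypergeom_tail(*row) for name, row in wards.items()}

# Naive product of the per-ward tail probabilities.
product = Fraction(1)
for p in tails.values():
    product *= p

# Tail probability for the pooled data (last column of Table 1).
combined = hypergeom_tail(1704, 201, 27, 14)

for name, p in tails.items():
    print(f"{name}: {float(p):.3g}")
print(f"product of the three: {float(product):.3g}")
print(f"combined data:        {float(combined):.3g}")
```

Every additional ward contributes a factor below one, however unremarkable its own data, so the product can only shrink as the number of workplaces grows. In this sketch it comes out well below the pooled tail probability.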

Post-hoc testing

A reviewer of this paper has asked us to comment on the issue of the danger of post-hoc testing: testing a hypothesis using the same data which suggested that hypothesis. Elffers actually tried to take account of this problem in the following way. He started from the assumption that the number of incidents in the data from JCH was much larger than expected, and that the purpose of his analysis was to discover whether there was an association with any of the nurses who worked on the ward. He multiplied his p-value for the association with Lucia’s shifts by 26, the number of nurses in that period who worked on the same ward. By the time he came to look at the data from RCH, Lucia was a prime suspect and he judged that no further Bonferroni-type correction was required. Finally, he proposed to take a very small probability for the significance level of his test.

In fact, his starting assumption was false: in the previous year there had been no incidents in the ward, but the year before that, an even larger number. The hospital director had not revealed the information from two years ago to the investigators since the ward previously had had a different name (he had changed it).

One could try to use a Bayesian approach to deal with the post-hoc problem. There would be good arguments for a rather low prior probability of an arbitrary nurse being a serial killer. The difficult task for the Bayesian would be determining a reasonable model for the number of incidents if Lucia is a murderer, since one has to take into account that some proportion of the incidents are not murders at all. Heterogeneity would also remain an issue for a Bayesian analysis. Explaining the methodology in a court of law could well be the biggest barrier.

Alternative model

We can model the incidents that a nurse experiences by a so-called Poisson process, with a nurse-dependent intensity A, where we use A for “accident proneness”. A Poisson process is used to model incoming phone calls during non-busy hours, fires in a big city, etc. Since we believe the incidents to be rare, a Poisson process is an obvious choice for modeling the incidents that a nurse experiences.

This approach models two separate phenomena. Firstly, the intensity of nurses seeing or reporting incidents is modeled by introducing the random variable A. We assume that A has an exponential distribution, but other choices are also possible.

Note that we move away here from a simple discrete model, as used by Elffers, but use instead a continuous distribution for the “accident proneness” A of the nurse. Statistical models with continuously varying random variables are perhaps more difficult to explain to the judges, but are often much more realistic, which should be the only important consideration here.

Secondly, the number of incidents happening to a nurse on duty depends on A and the time interval she is working, and follows (conditionally on A) a Poisson distribution. The time interval is measured by the number of shifts the nurse has had.

Assuming that A is exponentially distributed implies, among other things, that it can easily happen that one nurse has twice the incident rate of another nurse. The probability of this event is 2/3; in fact the probability that, of two nurses, one has an incident rate at least k times that of the other is 2/(k+1).
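The 2/(k+1) figure can be checked by simulation. The following minimal sketch assumes two nurses whose accident proneness values are independent draws from the same exponential distribution; the function name, the sample size and the seed are illustrative choices, not taken from the paper.

```python
import random


def prob_ratio_at_least(k: float, n_pairs: int = 200_000,
                        mean: float = 27 / 1704, seed: int = 0) -> float:
    """Monte Carlo estimate of P(the larger of two rates is at least k times the smaller)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        # random.expovariate takes the rate, i.e. 1 / mean.
        a1 = rng.expovariate(1 / mean)
        a2 = rng.expovariate(1 / mean)
        if max(a1, a2) >= k * min(a1, a2):
            hits += 1
    return hits / n_pairs


estimate_k2 = prob_ratio_at_least(2)
estimate_k5 = prob_ratio_at_least(5)
print(f"k = 2: simulated {estimate_k2:.4f}, formula {2 / (2 + 1):.4f}")
print(f"k = 5: simulated {estimate_k5:.4f}, formula {2 / (5 + 1):.4f}")
```

The result does not depend on the mean of the exponential distribution, because only the ratio of the two rates matters; the value 27/1704 is used purely for consistency with the rest of the paper.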

The statistical problem boils down to the estimation of the parameter characterizing the mixture of Poisson processes for the different nurses. Combining the Juliana Children’s Hospital and the two wards of the Red Cross Hospital, Lucia had 201 shifts and 14 incidents.

A major flaw in the investigation is that the data collection is irreproducible and lacks rigorous methods and definitions. It crucially depended on the memory of people who knew what was sought after. But we will argue from the data in Table 1 above, which also was used in the computations of Elffers.

We’ll take the overall probability of an incident per shift to be the ratio of total number of incidents to total number of shifts, µ = 27/1704. If we take a shift to be our unit time interval, then this would be a so-called moment estimate of the mean intensity of incidents.

This means that, conditionally on the time interval T = 201, the number of incidents follows a mixture of Poisson random variables with parameter 201A, where the intensity A has an exponential distribution with first moment µ. Thus on average, an innocent Lucia would experience 201 · µ = 201 · 27/1704 ≈ 3.18486 incidents. A picture of the probabilities that the number of incidents is at least k = 1, 2, ... is shown in Figure 3, which is based on the calculations given at the end of this section.

Heterogeneity of any kind increases the variation in the number of incidents experienced by a randomly chosen nurse over a given period of time (given number of shifts). From the well-known relations for conditional expectation (E) and variances (var)

\[ E(X) = E(E(X \mid Y)), \]

\[ \operatorname{var}(X) = E(\operatorname{var}(X \mid Y)) + \operatorname{var}(E(X \mid Y)), \]

it follows that whereas for a Poisson distributed random variable variance and mean are equal, for a mixture of Poissons (with different conditional means), the variance is larger than the mean. So if some nurses experience more or fewer incidents than others, in all cases the end result is overdispersion caused by heterogeneity.

Applied to the current model, which is geometric with parameter $(1 + t\mu)^{-1}$ (see the computation at the end of this section):

\[ \operatorname{var}(N) = (1 + t\mu)^2 - (1 + t\mu) = t\mu + (t\mu)^2, \]

where the latter expression neatly splits into the expected variance of the Poisson process plus the variance of the conditional parameter of the Poisson process, which we assumed to be exponential.
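The variance decomposition can be checked numerically. The sketch below assumes the geometric distribution lives on 0, 1, 2, ... with P(N = n) = p(1 − p)^n and p = 1/(1 + tµ); the truncation point of 2000 terms is an arbitrary choice that leaves a negligible tail for the values of t and µ used in this paper.

```python
from math import isclose

t = 201                # Lucia's number of shifts (Table 1)
mu = 27 / 1704         # moment estimate of the mean incident rate per shift
p = 1 / (1 + t * mu)   # parameter of the geometric distribution

# Truncated pmf of the geometric distribution on 0, 1, 2, ...
pmf = [p * (1 - p) ** n for n in range(2000)]

mean = sum(n * q for n, q in enumerate(pmf))
variance = sum((n - mean) ** 2 * q for n, q in enumerate(pmf))

poisson_part = t * mu          # E(var(N | A)) = E(tA) = t * mu
mixing_part = (t * mu) ** 2    # var(E(N | A)) = var(tA) = (t * mu)**2 for exponential A

print(f"mean     = {mean:.5f}")
print(f"variance = {variance:.5f}")
print(f"t*mu + (t*mu)^2 = {poisson_part + mixing_part:.5f}")
```

The variance is roughly four times the mean here, whereas a pure Poisson model would force the two to be equal.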

The fact that a modest amount of heterogeneity turns an almost impossible occurrence into something merely mildly unusual is strong support for further empirical research into whether, and if so in what forms, heterogeneity plays a role in healthcare. It can have major implications in different areas, such as medical research (representing an extra source of variation) and training of medical staff.

Computation of the probabilities in the mixed Poisson model

If N is a Poisson random variable with parameter λ, the probability that the number of incidents is at least k, k = 1, 2, ..., is given by an integral, namely

\[ \frac{1}{(k-1)!} \int_0^{\lambda} e^{-x} x^{k-1}\,dx, \tag{2} \]

see, e.g., [3], Exercise 46, p. 173. This means that if we assume that the “accident proneness” of the nurses has an exponential distribution with expectation µ (in our case estimated by 27/1704) and the parameter of the Poisson distribution for the nurse is given by ta, where t is the time interval (in our case t = 201) and a the accident proneness, we have to integrate (2) with respect to the density of the exponential distribution with expectation µ, taking λ = ta. So we get for the probability that a nurse experiences at least k incidents:

\[ \int_0^\infty P\{I \ge k \mid A = a, T = t\}\, \frac{e^{-a/\mu}}{\mu}\,da = \int_0^\infty \left\{ \frac{1}{(k-1)!} \int_0^{ta} e^{-x} x^{k-1}\,dx \right\} \frac{e^{-a/\mu}}{\mu}\,da = \left( \frac{t\mu}{1+t\mu} \right)^{k}. \]
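The final equality is stated without intermediate steps. One way to fill them in, offered here as a sketch rather than as the authors’ own derivation, is to interchange the order of integration and then recognise a Gamma integral:

```latex
\begin{align*}
\int_0^\infty \left\{ \frac{1}{(k-1)!} \int_0^{ta} e^{-x} x^{k-1}\,dx \right\} \frac{e^{-a/\mu}}{\mu}\,da
&= \frac{1}{(k-1)!} \int_0^\infty e^{-x} x^{k-1}
   \left\{ \int_{x/t}^{\infty} \frac{e^{-a/\mu}}{\mu}\,da \right\} dx \\
&= \frac{1}{(k-1)!} \int_0^\infty x^{k-1}\, e^{-x\left(1 + \frac{1}{t\mu}\right)}\,dx \\
&= \left(1 + \frac{1}{t\mu}\right)^{-k}
 = \left(\frac{t\mu}{1+t\mu}\right)^{k}.
\end{align*}
```

The inner integral in the first line equals $e^{-x/(t\mu)}$, and the last step uses $\int_0^\infty x^{k-1} e^{-cx}\,dx = (k-1)!/c^{k}$ with $c = 1 + 1/(t\mu)$.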


[Bar chart: horizontal axis 1 to 14, vertical axis 0.0 to 0.7]

Figure 3: Probabilities (in the Poisson model) that the number of incidents in 201 shifts for one nurse is at least 1, 2, 3, ..., if µ = 27/1704. The probabilities are given by the heights of the columns above 1, 2, 3, ..., respectively.

This is the geometric distribution with parameter 1/(1 + tµ). With k = 14 and tµ = 3.18486 this yields 0.0218641 or about one in 46.
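The closed form makes the column heights of Figure 3 and the headline figure easy to recompute. The following is a minimal sketch; the function name is illustrative.

```python
t = 201           # shifts worked by the nurse (Table 1)
mu = 27 / 1704    # moment estimate of the mean incident rate per shift


def prob_at_least(k: int, t: float = t, mu: float = mu) -> float:
    """P(at least k incidents) in the exponential mixture of Poissons: (t*mu / (1 + t*mu))**k."""
    ratio = t * mu / (1 + t * mu)
    return ratio ** k


# Column heights of Figure 3, k = 1, ..., 14.
heights = {k: prob_at_least(k) for k in range(1, 15)}
for k, prob in heights.items():
    print(f"k = {k:2d}: {prob:.6f}")

p14 = heights[14]
print(f"P(at least 14 incidents) = {p14:.7f}, about one in {1 / p14:.0f}")
```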

An early version of this paper used a revision of Elffers’ data set proposed by Professor Ton Derksen, philosopher of science, who together with his sister, medical doctor Metta de Noo, was the first to actively contest the court’s reasoning in the case of Lucia de Berk.

Our model then led us to a right tail probability of one in nine. We later noticed that Derksen had also removed all incidents which the court finally decided not to count as provenly caused by Lucia; he used the legal argument that Elffers had previously been instructed by the judges to do the same for the data from the Juliana Children’s Hospital. This does not make any statistical sense.

Going back to original medical records, Derksen and de Noo also found inconsistencies in the classification and timing of several incidents, which underlines the unreliability of the data. Correcting the data for apparent errors would also improve the results from the defence point of view.

We decided in the present paper to stick with Elffers’ numbers in order to focus on our main point concerning the impact of heterogeneity.

Extended discussion of heterogeneity

We showed that a modest amount of heterogeneity leads to very different orders of magnitude in the outcomes of crucial calculations. Here we address some of the underlying mechanisms which may lead to the postulated heterogeneity.

Clearly, the data in this case show heterogeneity. The data stem from two hospitals with very different patients, young children in the JCH and elderly adult patients in the RCH. The data come from three wards and the rates of incidents per shift vary considerably from ward to ward.
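The variation between wards can be read directly off Table 1. A minimal sketch that computes the incident rate per shift for each ward and for the pooled data (the variable names are illustrative):

```python
# (total shifts, total incidents) per ward, from Table 1.
wards = {
    "JCH": (1029, 8),
    "RCH-41": (336, 5),
    "RCH-42": (339, 14),
}

# Incidents per shift for each ward.
rates = {name: incidents / shifts for name, (shifts, incidents) in wards.items()}

# Pooled rate, the moment estimate 27/1704 used in the alternative model.
pooled = sum(i for _, i in wards.values()) / sum(s for s, _ in wards.values())

for name, rate in rates.items():
    print(f"{name}: {rate:.4f} incidents per shift")
print(f"pooled: {pooled:.4f} incidents per shift")
```

On these numbers the rate in RCH-42 is more than five times the rate in JCH.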

We describe two general mechanisms causing heterogeneity. The first one concerns properties of subjects directly related to the intensity of the rate of incidents. The other mechanism is more indirect and results from “spurious correlations”, in which properties not related to the underlying intensity influence the measurement via unexpected dependencies and systematic variations in variables assumed to be independent and uniform.

Related to this is another aspect of the data: the degree to which a specific model or null-hypothesis is susceptible to small variations in the data. We will show this to be the case in the original calculations. Although our example is tuned to this very specific case, it refers to a much more general caveat. It should be established how stable certain models are under small perturbations of the data.

Are nurses interchangeable?

According to medical specialists we have spoken to, nurses are completely interchangeable with respect to the occurrence of medical emergencies among their patients. However, according to nursing staff we have consulted, this is not the case at all. Different nurses have different styles and different personalities, and this can and does have a medical impact on the state of their patients. Especially regarding care of the dying, it is folk knowledge that terminally ill persons tend to die preferentially on the shifts of those nurses with whom they feel more comfortable. As far as we know there has been no statistical research on this phenomenon.

There is another obvious way in which the intensity of incidents depends on characteristics that vary over the population. Any event that can turn out to be an “incident” starts with the call of a doctor. And in all cases it is the nurse who decides to call a doctor. This decision is influenced by professional and personal attitude, past experience and personality traits like self-confidence. It seems obvious to us that these characteristics vary greatly in any population. Hence we assume that the intensity of experiencing incidents varies accordingly.

Inadequacy of the hypergeometric distribution as a model and spurious correlations

The model underlying the null-hypothesis (which led to the hypergeometric distribution) depends on two assumptions: both the incidents and the nurses are assigned to shifts uniformly and independently of each other.

Above we have established two ways in which characteristics of individual subjects may lead to variation in the intensity of experiencing an incident. This variation is in contrast with one of the assumptions underlying the hypergeometric distribution: uniformity.

Next we discuss sources of correlation which correspond to indirect rather than direct causation: we speak then of spurious correlation, correlation which can be explained by confounding factors, by common causes.

There are serious reasons to doubt the uniformity of incidents over shifts. There may occur periodical differences. The population of a hospital ward may vary over the seasons. The patients may differ in character and severity of illness due to seasonal influences. There are differences between day and night shifts and between weekend shifts and shifts on weekdays. An extensive study of Dutch Intensive Care Unit admissions shows a marked increase in deaths when the admission falls outside “office hours” [5]. Recall that there have to be nurses on duty throughout the night and throughout the weekends, while the medical specialists tend to have “normal working hours”. Finally there is the periodical cycle of the circadian rhythm, influencing the condition of the patients and the attention of the medical staff [4].

Notice that circadian variation in e.g. mortality and the resulting variation of incident rate between different shifts over the day interacts with the variation in the number of nurses on a shift, with more personnel on the day shifts. This can result in a higher number of nurses with an incident on their shift if the incident rate is higher during day time shifts and conversely, a lower number in the opposite case.

There may be other, non-periodical variations that affect the uniformity of incidents. In the case of the Juliana Children’s Hospital there has been a rather sensitive matter of policy: whether very ill children, who are not going to live for very long, should die at home or in the hospital wards. We understand that this policy did change at least once at the JCH in the period of interest. Presumably a change in policy concerning where the hospital wants children to die will have an impact on the rate of incidents. Further, incidents may be clustered, since one patient can give rise to several incidents.

On the other hand the way nurses are assigned to shifts is certainly not uniform and ‘random’. Nurses take shifts in patterns, for example several night shifts in a row, alternated by rows of evening or day shifts. Nurses are assigned to shifts according to skills, qualification and other characteristics. Maybe some nurses take relatively more weekend shifts than others, because of personal circumstances.

Although both the assignment of nurses to shifts and the assignment of incidents to shifts are not uniform processes, one could hope that there might be some ‘mixing’ condition that makes the ultimate result indistinguishable from the postulated independence and uniformity. Certainly one may hope, but this magical mechanism should at least be made plausible.

Taken together, even if we consider both the shifts of a given nurse as a random process, and the incidents on a ward as a random process, and even if we consider the two processes as stochastically independent of one another, the assumption of constant intensities of either is a guess, not based on any evidence or argument. There may be patterns in the risk of incidents and there are certainly patterns in the shifts of nurses. These patterns may be correlated, through the process by which shifts are shared over the different nurses according to their different personal situations, their different wishes for particular kinds of shifts, their different qualifications, and the changing situation on the ward.

How stable are the hypergeometric probabilities under small changes in the data?

Consider the data of the ward at JCH. These numbers and their interpretation are at the root of what turned out to be one of the gravest miscarriages of justice of the Dutch juridical system. Under the assumption of the hypergeometric distribution the probability of this configuration is very small, less than one in nine million. The configuration is in some respects extreme: eight out of eight incidents occur in the shifts of one nurse. However the data are in another respect also conspicuous: no incidents occur in the 887 shifts where this nurse was not present. The data collection had been far from flawless, with no formal definition of incident, no or incomplete documentation, and rested at least in part on recollection of witnesses who were aware of which facts were looked for.

Assuming the possibility of tiny flaws in the process of data acquisition, it is legitimate to investigate the effect of 1, 2, ..., 8 incidents that could have been forgotten or overlooked. This amounts to allowing a maximal error of less than one percent. The results are quite remarkable; in Table 2 we give the probabilities.

Table 2: The effect of perturbations on the probabilities

Shifts with incidents outside Lucia’s (postulated)   Probability
0                                                    1/9043864
1                                                    1/1137586
2                                                    1/257538
3                                                    1/79497
4                                                    1/29989
5                                                    1/13051
6                                                    1/6329
7                                                    1/3341
8                                                    1/1889
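The text does not spell out the formula behind Table 2. The sketch below assumes each entry is an upper-tail hypergeometric probability for the JCH column of Table 1: the probability that at least 8 incidents fall in Lucia’s 142 shifts out of 1029, when the total number of incidents is 8 plus the postulated extra ones. The function and constant names are illustrative.

```python
from fractions import Fraction
from math import comb

N_SHIFTS = 1029       # total shifts at JCH (Table 1)
LUCIA_SHIFTS = 142    # Lucia's shifts at JCH (Table 1)
LUCIA_INCIDENTS = 8   # incidents during Lucia's shifts at JCH (Table 1)


def tail_probability(extra: int) -> Fraction:
    """P(at least 8 incidents fall in Lucia's shifts) when `extra` further
    incidents are postulated outside her shifts, so that I = 8 + extra."""
    total_incidents = LUCIA_INCIDENTS + extra
    favourable = sum(
        comb(LUCIA_SHIFTS, l) * comb(N_SHIFTS - LUCIA_SHIFTS, total_incidents - l)
        for l in range(LUCIA_INCIDENTS, total_incidents + 1)
    )
    return Fraction(favourable, comb(N_SHIFTS, total_incidents))


table = {extra: tail_probability(extra) for extra in range(9)}
for extra, prob in table.items():
    print(f"{extra}: about 1/{float(1 / prob):,.0f}")
```

Under this reading, the printed values can be compared row by row with Table 2; the probability grows by several orders of magnitude as a handful of incidents are added outside Lucia’s shifts.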


The very small numbers vanish easily. Six or more incidents not remembered, not reported, or just defined away make the difference between astronomically small on the one hand and very unusual on the other. This shows that the probabilities are very sensitive to small errors in the data.

A judgement on data quality is not only the concern of a statistician. Judges are used to inconsistent and incomplete data (statements), psychologists are very well aware of the possible fallacies of memory. Both groups have their own professional standards of how to deal with these phenomena. A statistician, however, should point out what the effects of these phenomena can be on the outcome of his models.

If this model is used to corroborate evidence this sensitivity should be made explicit, just as adverse effects of a medicine are mentioned explicitly for the users.

Concluding remarks

In the body of this paper we have shown the considerable effect that a modest amount of heterogeneity can have on tail probabilities. The broader impact of allowing heterogeneity in the analysis of (medical) research has interesting consequences outside the case of Lucia de Berk.

What remains is a very short description of how the case ended in acquittal. Lucia was arrested in December 2001. As indicated in the introduction, the court (of appeal) stated that it did not include statistical considerations as basis for its verdict. This may be true for formal statistical considerations, but the essential step in the construction of the guilty verdict was that only one or two cases of murder had to be proven convincingly, the rest of the murders could be considered proven based on the “very improbable” occurrence of incidents during the shifts of Lucia. In this way statistical considerations were crucial, but the verdict was immunized against formal statistics. In this way Lucia was convicted in 2004 for seven murders and three attempts of murder.

What followed was a long legal struggle where the emphasis was on the validity of the medical arguments and increasingly intricate juridical matters. The Lucia case was fiercely debated in public and the statistical notions remained here an important issue. Statisticians, now banned from the courtrooms, continued to play a role, for example by mobilizing the scientific community. Gradually the notion emerged that a gross miscarriage of justice had taken place. A complicating factor remained that, since the juridical path had been followed till the end, a new “fact”, a so-called novum, had to be found. In 2008, Lucia was allowed to wait for the end of the legal proceedings outside prison, and two years later she was finally acquitted of all murder accusations.


References

[1] H. Elffers. Distribution of incidents of resuscitation and death in the Juliana Kinderziekenhuis [Juliana Children’s Hospital] and the Rode Kruisziekenhuis [Red Cross Hospital]. Unpublished report to the Court, May 29 2002. URL http://www.math.leidenuniv.nl/~gill/Elffers2eng.pdf.

[2] H. Elffers. Distribution of incidents of resuscitation and death in the Juliana Kinderziekenhuis [Juliana Children’s Hospital] and the Rode Kruisziekenhuis [Red Cross Hospital]. Unpublished report to the Court, May 8 2002. URL http://www.math.leidenuniv.nl/~gill/Elffers1eng.pdf.

[3] William Feller. An introduction to probability theory and its applications. Vol. I. Third edition. John Wiley & Sons, Inc., New York-London-Sydney, 1968.

[4] G. Kuhn. Circadian rhythm, shift work, and emergency medicine. Ann Emerg Med, 37:88–98, 2000. doi: 10.1067/mem.2001.111571.

[5] Hans A. J. M. Kuijsten, Sylvia Brinkman, Iwan A. Meynaar, Peter E. Spronk, Johan I. van der Spoel, Rob J. Bosman, Nicolette F. de Keizer, Ameen Abu-Hanna, and Dylan W. de Lange. Hospital mortality is associated with ICU admission time. Intensive Care Med, online first, 2010. doi: 10.1007/s00134-010-1918-1.

[6] R. Meester, M. Collins, R.D. Gill, and M. van Lambalgen. On the (ab)use of statistics in the legal case against the nurse Lucia de B. Law, Probability and Risk, 5:233–250, 2007. With discussion by David Lucy.

[7] Leila Schneps and Coralie Colmez. Math on trial. How numbers get used and abused in the courtroom. Basic Books, New York, 2013. ISBN 978-0-465-03292-1.
