The fundamental problem of Forensic Statistics

The fundamental problem of Forensic Statistics How to assess the evidential value of a rare type match Giulia Cereda, Université de Lausanne Richard D. Gill, University of Leiden

Transcript of The fundamental problem of Forensic Statistics

Page 1: The fundamental problem of Forensic Statistics

The fundamental problem of Forensic Statistics

How to assess the evidential value of a rare type match

Giulia Cereda, Université de Lausanne

Richard D. Gill, University of Leiden

Page 2: The fundamental problem of Forensic Statistics

The problem

• A crime

• A piece of evidence found at the crime scene (DNA, fingerprint, footprint, handwriting, etc.)

• A suspect (identified independently)

• A match between the suspect’s characteristic and the evidence’s characteristic

• A database which counts the frequency of each characteristic

• The database frequency of the crime (and the suspect) characteristic is 0

Page 3: The fundamental problem of Forensic Statistics

Example

• A DNA stain is found on the victim’s body.

• Y-STR profile of type h.

• A suspect is identified who is also of Y-STR type h.

• The Y-STR reference database does not contain type h.

Small databases

Page 4: The fundamental problem of Forensic Statistics

Generalized-Good: a nonparametric Good-type estimator based on Good (1953).

DiscLap-method (Andersen et al. 2013)

Explore other methods (Brenner 2010, Roewer 2000, …)

How to evaluate this kind of evidence?

Page 5: The fundamental problem of Forensic Statistics

The Likelihood Ratio

E is the evidence to be evaluated

B is the background information

Hp: the suspect left the stain

Hd: someone else left the stain

Many possible choices

THE likelihood ratio does not exist
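For reference, the likelihood ratio for evidence E under the two hypotheses, given background information B, is conventionally written as below. The symbols are the ones defined on this slide; the point is that each choice of E and B gives a different ratio, so there is no single LR.

```latex
\mathrm{LR} \;=\; \frac{\Pr(E \mid H_p, B)}{\Pr(E \mid H_d, B)}
```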

Page 6: The fundamental problem of Forensic Statistics

Typical choice

• E = the particular haplotype of the suspect and of the crime stain

• B = the list of haplotypes in the database

e.g. Discrete Laplace Method

Page 7: The fundamental problem of Forensic Statistics

This frequency is not known. It can only be estimated.

Uncertainty

e.g. DiscLap method

Page 8: The fundamental problem of Forensic Statistics

A different choice

• E = the number of times the haplotype of the suspect (hs) and the haplotype of the crime stain (hc) are in the database, and whether or not they are the same haplotype.

• B = the frequencies of the frequencies of the database.

Ignore information about the particular haplotype

Page 9: The fundamental problem of Forensic Statistics

• D database

Gotham City, 12,13,30,24,10,11,13

Gotham City, 12,13,30,24,10,11,14

Gotham City, 13,12,30,24,10,11,13

Gotham City, 13,13,29,23,10,11,13

Gotham City, 13,13,29,24,10,11,14

Gotham City, 13,13,29,24,11,13,13

Gotham City, 13,13,29,24,11,13,13

Gotham City, 13,13,30,24,10,11,13

Gotham City, 13,13,30,24,10,11,13

Gotham City, 13,13,30,24,10,11,13

Gotham City, 13,13,30,24,10,11,13

D’ database count

Gotham City, 12,13,30,24,10,11,13 1

Gotham City, 12,13,30,24,10,11,14 1

Gotham City, 13,12,30,24,10,11,13 1

Gotham City, 13,13,29,23,10,11,13 1

Gotham City, 13,13,29,24,10,11,14 1

Gotham City, 13,13,29,24,11,13,13 2

Gotham City, 13,13,30,24,10,11,13 4

The frequencies of frequencies

N1 5

N2 1

N3 0

N4 1

Df frequencies of frequencies

Information is discarded

N1 is the number of haplotypes which occur once in D (singletons).

N2 is the number of haplotypes which occur twice in D (doublets), etc.
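The reduction from D to the database count and then to the frequencies of frequencies can be sketched in a few lines of Python. The data are the rows of the toy database on this slide; the function names are illustrative choices, not part of the original talk.

```python
from collections import Counter

# The toy database D from the slide: one Y-STR haplotype per row.
D = [
    (12, 13, 30, 24, 10, 11, 13),
    (12, 13, 30, 24, 10, 11, 14),
    (13, 12, 30, 24, 10, 11, 13),
    (13, 13, 29, 23, 10, 11, 13),
    (13, 13, 29, 24, 10, 11, 14),
    (13, 13, 29, 24, 11, 13, 13),
    (13, 13, 29, 24, 11, 13, 13),
    (13, 13, 30, 24, 10, 11, 13),
    (13, 13, 30, 24, 10, 11, 13),
    (13, 13, 30, 24, 10, 11, 13),
    (13, 13, 30, 24, 10, 11, 13),
]


def database_counts(database):
    """Map each distinct haplotype to the number of times it occurs (D')."""
    return Counter(database)


def frequencies_of_frequencies(database):
    """Map each count j to N_j, the number of haplotypes seen exactly j times (Df)."""
    return Counter(database_counts(database).values())


Df = frequencies_of_frequencies(D)
print({j: Df[j] for j in range(1, 5)})  # {1: 5, 2: 1, 3: 0, 4: 1}
```

Note that Df no longer records which haplotype has which count; this is the sense in which information is discarded.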

Page 10: The fundamental problem of Forensic Statistics

A database D of size N

Gotham City, 12,13,30,24,10,11,13

Gotham City, 12,13,30,24,10,11,14

Gotham City, 13,12,30,24,10,11,13

Gotham City, 13,13,29,23,10,11,13

Gotham City, 13,13,29,24,10,11,14

Gotham City, 13,13,29,24,11,13,13

Gotham City, 13,13,29,24,11,13,13

Gotham City, 13,13,30,24,10,11,13

Gotham City, 13,13,30,24,10,11,13

Gotham City, 13,13,30,24,10,11,13

Gotham City, 13,13,30,24,10,11,13

can be considered as an i.i.d. sample (Y1, Y2, …, YN) from species {1, 2, …, s} with probabilities (p1, p2, …, ps).

The database count

Gotham City, 12,13,30,24,10,11,13 1

Gotham City, 12,13,30,24,10,11,14 1

Gotham City, 13,12,30,24,10,11,13 1

Gotham City, 13,13,29,23,10,11,13 1

Gotham City, 13,13,29,24,10,11,14 1

Gotham City, 13,13,29,24,11,13,13 2

Gotham City, 13,13,30,24,10,11,13 4

is a realization of the r.v. (X1, X2, …, Xs), defined by Xj=#{i|Yi=j}.

The frequencies of frequencies

is made of (N1, N2, …), where Nj=#{i|Xi=j}

N1 5

N2 1

N3 0

N4 1

Page 11: The fundamental problem of Forensic Statistics

• E = the number of times the haplotype of the suspect (hs) and the haplotype of the crime stain (hc) are in the database, and whether or not they are the same haplotype.

• B = the frequencies of the frequencies of the database (Df)

Page 12: The fundamental problem of Forensic Statistics
Page 13: The fundamental problem of Forensic Statistics

unbiased estimator for the numerator

unbiased estimator for the denominator

It is more sensible to estimate instead of .

is approximately unbiased for .

This suggests using

as an estimator for
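As a rough illustration of how estimators of this kind can be built from the frequencies of frequencies, the sketch below uses Good-Turing-type identities (Good 1953): N1/N estimates the probability that the next observation is of a type not yet in the database, and 2·N2/(N(N−1)) estimates the probability that the next two observations share one such unseen type. These specific expressions, the exact indexing in N, and the helper name `good_type_log10_lr` are assumptions made for illustration, not necessarily the expressions shown on the slide.

```python
import math


def good_type_log10_lr(n1, n2, n):
    """Hypothetical helper: log10 of a Good-type ratio estimate.

    n1, n2: number of singletons and doublets in the database.
    n: database size. The indexing (n versus n+1 or n+2) is an assumption.
    """
    if n < 2:
        raise ValueError("database must contain at least two observations")
    if n1 == 0 or n2 == 0:
        raise ValueError("N1 and N2 must both be positive for a finite log ratio")
    # Estimates Pr(next observation is of an unseen type).
    numerator = n1 / n
    # Estimates Pr(next two observations are of the same unseen type).
    denominator = 2 * n2 / (n * (n - 1))
    return math.log10(numerator / denominator)


# Toy database from the earlier slides: N = 11, N1 = 5, N2 = 1.
print(good_type_log10_lr(n1=5, n2=1, n=11))  # about 1.398, i.e. a ratio of about 25
```

Working on the log10 scale matches the slide's remark that it is more sensible to estimate a transformed quantity; the choice of log10 here follows the axis labels of the later plots.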

Page 14: The fundamental problem of Forensic Statistics

How well estimates the true (unknown) ?

Take a big database of size 12,727.

Consider it as the world population. Set C1=0, C2=0.

Then:

1. Sample a small database of size N=100+1+1.

2. If the 101st type is new in the small database, increase C1=C1+1.

3. If the 101st is a new type and is equal to the 102nd, increase C2=C2+1.

4. Repeat steps 1-3 M=10,000 times.

P1=C1/M, P2=C2/M, from which we obtain the value 2.603.

Distribution over many replications of small databases (size N=100) sampled from a bigger one (size N=12,727), which we pretend is the population.
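The sampling experiment on this slide can be sketched as follows. The real 12,727-profile database is not available here, so the sketch uses a synthetic heavy-tailed population as a stand-in; sampling without replacement, the function names, and combining the two proportions as log10(P1/P2) are all assumptions, so the output is not expected to reproduce the value quoted above.

```python
import math
import random


def simulate_p1_p2(population, n=100, m=10_000, seed=0):
    """Estimate P1 and P2 by repeated sampling from `population`.

    Each replication draws n + 2 items without replacement (an assumption):
    the first n form the small database, the next two play the roles of the
    101st and 102nd observations.
    """
    rng = random.Random(seed)
    c1 = c2 = 0
    for _ in range(m):
        draw = rng.sample(population, n + 2)
        database = set(draw[:n])
        first_new, second_new = draw[n], draw[n + 1]
        if first_new not in database:      # step 2: a type unseen in the database
            c1 += 1
            if second_new == first_new:    # step 3: unseen type, and the two match
                c2 += 1
    return c1 / m, c2 / m


def log10_ratio(p1, p2):
    """log10(P1 / P2), or None when either proportion is zero."""
    if p1 > 0 and p2 > 0:
        return math.log10(p1 / p2)
    return None


# Synthetic stand-in for the big database: integer type labels with a
# heavy-tailed frequency distribution (purely illustrative).
_rng = random.Random(1)
toy_population = [int(_rng.paretovariate(1.0)) for _ in range(12_727)]

p1, p2 = simulate_p1_p2(toy_population, n=100, m=2_000)
print(p1, p2, log10_ratio(p1, p2))
```

Because P2 counts a rare event, M needs to be large for a stable ratio; the smaller `m` in the usage lines only keeps the example fast.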

Page 15: The fundamental problem of Forensic Statistics

We sample 1000 databases of size 100 from the big one, and for each we calculate the estimate :

Performance of the GG-method

We know .

Page 16: The fundamental problem of Forensic Statistics


Page 17: The fundamental problem of Forensic Statistics

How well estimates the true (unknown) ?

distribution over many replications of small databases (size N=100) and new haplotype sampled from a bigger one (size N=12,727).

For each database sampled, the true frequency of the new haplotype h is taken equal to its frequency in the big database.

The estimated frequency is calculated using the Discrete Laplace method with default options (iterations, init_y …).

We calculate the distribution of and for each

database and new haplotype sampled.

Page 18: The fundamental problem of Forensic Statistics

Performance of the DiscLap-method

Comparing the distribution of

Page 19: The fundamental problem of Forensic Statistics

Comparing the errors of the two methods

[Figure: two panels plotted against Index (0 to 1000), y-axis range 0 to 6. DiscLap-method panel: log10(Ratio_Andersen). GG-method panel: log10(Ratio_Gill).]

Page 20: The fundamental problem of Forensic Statistics

Comparing the errors of the two methods

[Figure: two panels, y-axis range −1 to 6. DiscLap-method panel: log10(Ratio_Andersen). GG-method panel: log10(Ratio_Gill).]

Page 21: The fundamental problem of Forensic Statistics

Remarks

Two more levels of uncertainty:

• whether or not the model M that we are assuming for Pr is “correct enough”

• whether or not the parameters of Pr in the model M are “correct enough”

Basic uncertainty:

• whether or not the trace comes from the suspect

Page 22: The fundamental problem of Forensic Statistics

Maybe DiscLap was never intended to be used for such small databases.

Maybe DiscLap does better when used in cleverer ways, targeted to our purpose.

The error in the DiscLap method is given by two levels of uncertainty:

• Population vs DiscLap

• Parameter estimation (within DiscLap)

The GG is a “model-free” method which thus has only one level of uncertainty.

Page 23: The fundamental problem of Forensic Statistics

Conclusions

• The situation is more complex than it appears.

• Using more information gives a less accurate LR.

• Assuming less gives a more reliable LR.

Page 24: The fundamental problem of Forensic Statistics

References

Page 25: The fundamental problem of Forensic Statistics

You want to discuss? Know more? Collaborate? Give suggestions?

You are welcome!
