Linköping University Medical Dissertations No. 1175 Populations and Statistics in Forensic Genetics Andreas Tillmar Department of Clinical and Experimental Medicine, Faculty of Health Sciences, Linköping University, SE-581 85 Linköping Sweden Linköping 2010


Linköping University Medical Dissertations No. 1175

Populations and Statistics in Forensic Genetics

Andreas Tillmar

Department of Clinical and Experimental Medicine, Faculty of Health Sciences, Linköping University, SE-581 85 Linköping, Sweden

Linköping 2010

Supervisors

Bertil Lindblom, Professor, Department of Clinical and Experimental Medicine, Faculty of Health Science, Linköping University, Sweden

Gunilla Holmlund, Associate professor, Department of Clinical and Experimental Medicine, Faculty of Health Science, Linköping University, Sweden

Petter Mostad, Associate professor, Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Sweden

Faculty opponent

Peter de Knijff, Professor, Department of Human and Clinical Genetics, Leiden University Medical Center, The Netherlands

Examination board

Xiao-Feng Sun, Professor, Department of Clinical and Experimental Medicine, Faculty of Health Science, Linköping University, Sweden

Peter Söderkvist, Professor, Department of Clinical and Experimental Medicine, Faculty of Health Science, Linköping University, Sweden

Elja Arjas, Professor, Department of Mathematics and Statistics, University of Helsinki, Finland

John Carstensen, Professor (as an alternative), Department of Medicine and Health Sciences, Faculty of Health Science, Linköping University, Sweden

© Andreas Tillmar 2010. Andreas Tillmar was formerly known as Andreas Karlsson (changed July 2008).

Printed by LiU-Tryck, Linköping 2010. ISBN 978-91-7393-420-6. ISSN 0345-0082.

To Ida and Signe

Abstract

DNA has become a powerful forensic tool for solving cases such as linking a suspect to a crime scene, resolving biological relationship issues and identifying disaster victims. Traditionally, DNA investigations mainly involve two steps: the establishment of DNA profiles from biological samples, and the interpretation of the evidential weight given by these DNA profiles. This thesis deals with the latter, with a focus on models for assessing the weight of evidence and the study of parameters affecting these probability figures.

In order to calculate a correct and representative weight of DNA evidence, prior knowledge about the DNA markers in a relevant population sample is required. Important properties that should be studied are, for example, how frequently certain DNA variants (i.e. alleles) occur in the population, the differences in such frequencies between subpopulations, the expected inheritance patterns of the DNA markers within a family, and the forensic efficiency of the DNA markers in casework.

In this thesis, we aimed to study important population genetic parameters that influence the weight of evidence given by a DNA analysis, as well as models for the proper consideration of such parameters when calculating the weight of evidence in relationship testing.

We have established a Swedish frequency database for mitochondrial DNA (mtDNA) haplotypes and a haplotype frequency database for markers located on the X-chromosome. Furthermore, mtDNA haplotype frequencies were used to study the genetic variation within Sweden and between Swedish and other European populations. No genetic substructure was found within Sweden, but strong similarities with other western European populations were observed.

Genetic properties such as linkage and linkage disequilibrium can be important when using X-chromosomal markers in relationship testing, and this was true for the set of markers that we studied. In order to account for these properties, we proposed a model for taking linkage and linkage disequilibrium into account when calculating the weight of evidence provided by X-chromosomal analysis.

Finally, we investigated the risk of erroneous decisions when using DNA investigations for family reunification. We showed that the risk is increased by uncertainties regarding population allele frequencies, consanguinity, and competing close relationships between the tested individuals. Additional information and the use of a refined model for the alternative hypotheses reduced the risk of making erroneous decisions.

In summary, as a result of the work in this thesis, we can use mitochondrial DNA and X-chromosome markers to resolve complex relationship investigations. Moreover, the reliability of likelihood estimates has been increased through the development of models and the study of relevant parameters affecting probability calculations.

Popular science summary

DNA has become an important tool in the legal system for resolving questions such as linking a suspected perpetrator to a crime scene, investigating questions concerning biological relationships, or identifying victims of mass disasters. A DNA investigation can be divided into two steps: the generation of DNA profiles, and an evaluation of the significance of the obtained DNA profiles with respect to a given question. More specifically, one may want to know, for example, the probability that someone other than the suspect left DNA at the crime scene, or the probability that the alleged man is the child's father compared with the probability that he matches by chance. The calculation of such probabilities is based, among other things, on how frequently different DNA variants (alleles) occur in the population and on how the genetic markers are inherited from one generation to the next.

This thesis has aimed to study relevant background data that affect the probability calculations, and to study mathematical models for correctly taking the studied parameters into account in case-specific probability calculations. The thesis work focuses on the genetic variation in a Swedish population and on DNA investigations concerning biological relationships between individuals. Despite this, most of the results and discussions are also valid for the use of DNA profiling in crime scene investigations.

Probability calculations in relationship investigations are best performed by setting two hypotheses against each other. As an example, consider a paternity investigation in which DNA profiles are available from a child, the child's mother and an alleged man. In this case, hypothesis 1, "The alleged man is the father of the child", can be set against hypothesis 2, "The alleged man is unrelated to the child". For each hypothesis, the probability of observing the obtained DNA results, given that the hypothesis is true, is then calculated: for example, the probability of the mother's, the child's and the man's DNA profiles when the alleged man is the child's father, and when he is not, respectively. The final value of the investigation is obtained by weighing the two probabilities against each other. A decision on whether or not the man is the child's father can be based on the result of the probability calculation compared with a threshold for inclusion or, alternatively, exclusion.

There is always a risk of drawing the wrong conclusion, for example wrongly excluding a biological father or wrongly including a non-father as the biological father. In our first paper, we investigated the risk of drawing the wrong conclusion and studied the significance and impact of various factors that can affect it. We focused on DNA investigations in family reunification cases, which can be complex since they involve uncertainties regarding population affiliation, differences in family constellations, etc. Through simulations, we showed that errors can be minimized by increasing the information content of the investigation, e.g. by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. Furthermore, we showed that the risk of error can be reduced further by using a refined method that takes into account alternative, closely related relationships between the tested individuals.

In standard investigations, DNA markers located on the so-called autosomal chromosomes are used. In special cases, DNA variation found in the mitochondria (mtDNA) or on the sex chromosomes (the X-chromosome and the Y-chromosome) can also be examined. mtDNA is inherited maternally and is especially useful for investigating presumed maternal relationships. In Paper II, we examined mtDNA variation in a Swedish population in order to establish a frequency database. By analysing blood samples from approximately 300 Swedes from seven geographically distinct regions, we were able to show that the information content for use in a Swedish population is comparable to that of other European populations. Furthermore, the study showed that there are no significant differences in mtDNA variation between the different Swedish regions.

Papers III and IV focused on the DNA variation found on the X-chromosome. Thanks to the special inheritance pattern of the X-chromosome, an X-chromosomal analysis can provide a solution in complex relationship investigations in which analysis of standard DNA markers is insufficient. The use of the X-chromosome in relationship investigations, however, requires special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage means that the probability of inheriting a certain variant of one DNA marker is affected by which DNA variant has been inherited at another, nearby DNA marker. In Paper III, we examined the genetic polymorphism of eight DNA markers, all located on the X-chromosome. We showed that the information content of the markers for relationship investigations is high, and that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.

Finally, in Paper IV, we developed a mathematical model for correctly taking both linkage and linkage disequilibrium into account in probability calculations in relationship investigations based on X-chromosomal data. We applied this model in a simulation study of a number of typical cases and demonstrated the degree of error that results from using a simpler model in which neither linkage nor linkage disequilibrium is taken into account.

In summary, through the work in this thesis, we can use mitochondrial DNA and X-chromosomal DNA markers to resolve more complex relationship investigations. Through the development of models and the study of relevant parameters that affect the calculation of relationship probabilities, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007;172(2-3):144-149.

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010;124(2):91-98.

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008;3(1):37-41.

IV. Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier (Forensic Science International); Paper II © 2010 Springer (International Journal of Legal Medicine); Paper III © 2008 Elsevier (Forensic Science International: Genetics).

Contents

Abstract

Popular science summary

List of Papers

Contents

Abbreviations

Introduction
History of DNA and forensic genetics
Population genetics
Genetic polymorphisms
DNA inheritance
Population Genetics
The Swedish population and its genetic appearance
Forensic mathematics/statistics
Framework for interpretation and presentation of evidential weight
Paternity index calculation
Mathematical model for automatic likelihood computation for relationship testing

Aim of the thesis
Specific aims
Paper I
Paper II
Paper III
Paper IV

Investigations
Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
Materials and methods
Results and discussion
Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
Materials and methods
Results and discussion
Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
Materials and methods
Results and discussion
Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account
Materials and methods
Results and discussion

Concluding remarks

Future perspectives

Acknowledgements

References

Abbreviations

θ: Theta, recombination frequency/fraction
AF: Alleged father
DNA: Deoxyribonucleic acid
FST: Measure of population genetic subdivision
GD: Gene diversity
HWE: Hardy-Weinberg equilibrium
ISFG: International Society of Forensic Genetics
LD: Linkage disequilibrium
LR: Likelihood ratio
MEC: Mean exclusion chance
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PD: Power of discrimination
PE: Power of exclusion
PI: Paternity index
PIC: Polymorphism information content
PM: Match probability
Pr: Probability
SNP: Single nucleotide polymorphism
STR: Short tandem repeat
TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in different areas, such as medical genetics, the food processing industry and forensic situations, both when solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics has been to provide a statement about the identity of a human being, based on a biological sample, by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al. 2009, Pulker et al. 2007) and wildlife forensics (Alacs et al. 2010, Budowle et al. 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al. 2004b).

DNA from a crime scene sample was first used in court in 1986 in the UK (Gill et al. 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al. 1985). Since then, the techniques and methodologies for employing the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process: from the handling of biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation in court of the weight of evidence provided by the DNA results. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example: Is the donor of a crime scene sample the same individual as the suspect? Is the alleged father (AF) the biological father of the child? Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some form of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.

These figures are usually based on how genetically unique the information in the DNA profile is in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. The first is population genetics, including allele frequencies, population substructure, and dependence within and between markers. The second is models for calculating and presenting the weight of evidence that take this population genetic information properly into account.
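To make the coincidence probability concrete, the Python sketch below computes a random match probability for a multi-locus profile. It is a minimal sketch assuming Hardy-Weinberg equilibrium within loci and independence between loci, so genotype frequencies are $p^2$ (homozygote) or $2pq$ (heterozygote) and are multiplied across loci; no substructure correction is applied. The function names and allele frequencies are hypothetical.

```python
from math import prod
from typing import Optional, Sequence


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Expected genotype frequency under Hardy-Weinberg equilibrium."""
    if q is None:        # homozygote a/a
        return p * p
    return 2 * p * q     # heterozygote a/b


def random_match_probability(profile: Sequence[tuple]) -> float:
    """Product of per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in profile)


# Hypothetical two-locus profile: homozygous (p = 0.1), heterozygous (0.2, 0.1)
profile = [(0.1,), (0.2, 0.1)]
print(random_match_probability(profile))  # approximately 4e-4
```

Population substructure and relatedness both increase the true coincidence probability, which is why the population genetic parameters discussed below matter.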

Population genetics

Genetic polymorphisms

Three types of DNA marker, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they have good characteristics for forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al. 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers for routine practice. Recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al. 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that the number of alleles per locus is limited, so the information content per locus is low: one STR marker provides roughly as much information as approximately four SNPs (Sobrino et al. 2005, Brenner (www.dna-view.com)). No commercial forensic SNP multiplex kit is available, although some work has been done to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al. 2009, Philips et al. 2008).
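The one-STR-versus-four-SNPs comparison can be illustrated with match probabilities. Under Hardy-Weinberg equilibrium, the match probability $PM = \sum_i G_i^2$ (defined later in this chapter) reduces to $2\left(\sum_i p_i^2\right)^2 - \sum_i p_i^4$. The Python sketch below compares an idealised SNP (two alleles at 0.5) with an idealised STR (ten equally frequent alleles). Both marker configurations are hypothetical simplifications, not estimates for real loci, and loci are assumed independent.

```python
import math


def match_probability_hwe(freqs):
    """PM = sum of squared HWE genotype frequencies = 2*(sum p^2)^2 - sum p^4."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 2 * s2 ** 2 - s4


snp = [0.5, 0.5]          # most informative biallelic SNP
str_marker = [0.1] * 10   # idealised STR with ten equally frequent alleles

pm_snp = match_probability_hwe(snp)         # 0.375
pm_str = match_probability_hwe(str_marker)  # ~0.019
# Number of independent SNPs giving the same combined PM as one STR
snps_per_str = math.log(pm_str) / math.log(pm_snp)
print(round(snps_per_str, 2))  # about 4
```

Real STR alleles are not equally frequent and real SNPs rarely sit at 0.5, so the ratio varies from marker to marker; the sketch only shows where a figure of roughly four can come from.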

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main forensic use of sequence data involves the analysis of variation in mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. Analysis of mtDNA SNPs in addition to the sequence data has, however, been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), an SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the marker types described above, there are different "types" of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with testing complex pedigrees (Egeland et al. 2008, Skare et al. 2009).

The X-chromosome has a different inheritance pattern from autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their single X-chromosome from their mother, as one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case in which two sisters are tested to establish whether or not they have the same father, and DNA profiles are only available for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since full sisters can inherit different alleles at any autosomal locus. X-chromosomal markers can, however, exclude a common father, since two sisters with the same father must share the same paternal allele at every X-chromosomal locus (barring mutation). There are several other types of relationship for which analysis of X-chromosomal markers is superior to autosomal markers (Szibor et al. 2003, Pinto et al. 2010).
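The sister example can be expressed as a simple consistency check. In this Python sketch, each sister's X-STR genotype is a dictionary from locus name to an allele pair, and two sisters are excluded from sharing a father if any typed locus shows no common allele. Locus names and allele values are hypothetical, mutation is ignored, and passing the check does not establish a common father; it only fails to exclude one.

```python
def can_share_father(sister1: dict, sister2: dict) -> bool:
    """Return False if any commonly typed X locus shows no shared allele."""
    for locus in sister1.keys() & sister2.keys():
        if not set(sister1[locus]) & set(sister2[locus]):
            # a common father would have passed the same allele to both
            return False
    return True


s1 = {"X1": (14, 15), "X2": (9, 10)}
s2 = {"X1": (15, 16), "X2": (10, 10)}
s3 = {"X1": (12, 13), "X2": (9, 11)}
print(can_share_father(s1, s2))  # True: alleles 15 and 10 are shared
print(can_share_father(s1, s3))  # False: no common allele at X1
```

In practice, a non-excluded pair would still be evaluated with a likelihood ratio that accounts for allele frequencies and, as discussed below, linkage and linkage disequilibrium.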

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in different PCR multiplexes (Becker et al. 2008, Hundertmark et al. 2008, Gomez et al. 2007, Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). The definitions of these two genetic properties and their impact on calculated likelihoods are discussed further below.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son; thus, all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003, Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used to infer the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST values correspond to small allele frequency differences, and large values to large differences. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when calculating the strength of the DNA profile evidence (Balding & Nichols 1994).
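As an illustration of how FST relates to allele frequency differences, the Python sketch below computes a simple single-locus estimate, $F_{ST} = (H_T - H_S)/H_T$, where $H_S$ is the mean expected heterozygosity within subpopulations and $H_T$ is the expected heterozygosity computed from pooled allele frequencies. This is a descriptive, equal-weight version that ignores sample size; the sample-size-corrected estimators used in practice differ, and the function name and frequencies are hypothetical.

```python
def fst(subpop_freqs):
    """Single-locus F_ST = (H_T - H_S) / H_T with equally weighted subpopulations.

    subpop_freqs: one list of allele frequencies per subpopulation,
    with alleles listed in the same order in every list.
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # Mean expected heterozygosity within subpopulations
    h_s = sum(1 - sum(p * p for p in freqs) for freqs in subpop_freqs) / k
    # Expected heterozygosity from pooled (averaged) allele frequencies
    pooled = [sum(freqs[a] for freqs in subpop_freqs) / k for a in range(n_alleles)]
    h_t = 1 - sum(p * p for p in pooled)
    if h_t == 0:  # monomorphic locus: no variation to partition
        return 0.0
    return (h_t - h_s) / h_t


print(fst([[0.6, 0.4], [0.4, 0.6]]))  # about 0.04: small differentiation
print(fst([[1.0, 0.0], [0.0, 1.0]]))  # 1.0: fixed for different alleles
```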

Linkage and Linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci, and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) has changed compared with its parental constitution. The distance between two loci can be measured and expressed as the recombination frequency θ, which is estimated from family data. The recombination frequency is related to the genetic distance between the loci (Ott 1999).
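The segregation probabilities shown in Figure 2 follow directly from θ. The Python sketch below lists the probability of each haplotype transmitted by a mother whose two X haplotypes (phase) are known: each parental haplotype is passed on with probability $(1-\theta)/2$ and each recombinant with probability $\theta/2$, while a father passes his single X haplotype to a daughter with probability 1. The function name and allele labels are hypothetical, and θ is assumed to lie between 0 and 0.5.

```python
def maternal_haplotype_probs(hap1, hap2, theta):
    """Transmission probabilities for two linked loci from a phase-known mother.

    hap1, hap2: the mother's two haplotypes, each (allele_locus1, allele_locus2).
    theta: recombination fraction between the loci (0 <= theta <= 0.5).
    """
    (a1, b1), (a2, b2) = hap1, hap2
    outcomes = [
        ((a1, b1), (1 - theta) / 2),  # parental haplotype 1
        ((a2, b2), (1 - theta) / 2),  # parental haplotype 2
        ((a1, b2), theta / 2),        # recombinant
        ((a2, b1), theta / 2),        # recombinant
    ]
    probs = {}
    for haplotype, p in outcomes:
        # accumulate, so that homozygous loci merge identical haplotypes
        probs[haplotype] = probs.get(haplotype, 0.0) + p
    return probs


print(maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), theta=0.1))
```

With θ = 0.5 all four haplotypes become equally likely, which is the unlinked case that standard (independent-locus) calculations implicitly assume.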

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci, and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the allele frequencies of alleles a and b, respectively. If the loci are in linkage equilibrium, then $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
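Δ can be estimated directly from a sample of phase-known haplotypes, for example X-chromosomal haplotypes observed in males, who carry a single X-chromosome. The Python sketch below returns $\Delta = f(ab) - f(a)f(b)$ for every allele pair at two loci; the function name and haplotype counts are hypothetical, and no significance test is included.

```python
from collections import Counter


def ld_deltas(haplotypes):
    """Delta = f(ab) - f(a) * f(b) for all allele pairs at two loci.

    haplotypes: list of (allele_locus1, allele_locus2) tuples.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    counts1 = Counter(a for a, _ in haplotypes)
    counts2 = Counter(b for _, b in haplotypes)
    return {
        (a, b): hap_counts[(a, b)] / n - (counts1[a] / n) * (counts2[b] / n)
        for a in counts1
        for b in counts2
    }


# Hypothetical sample of 100 haplotypes
sample = [("A", "B")] * 40 + [("a", "b")] * 40 + [("A", "b")] * 10 + [("a", "B")] * 10
print(ld_deltas(sample)[("A", "B")])  # approximately 0.15
```

For multiallelic STRs most allele pairs are rare, so individual Δ estimates become noisy, which is one reason the text above recommends using observed haplotype frequencies directly.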

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, tests must be conducted concerning the independent segregation of alleles: tests for Hardy-Weinberg equilibrium (HWE) (Hardy 1908) and for LD deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw firm conclusions from the outcome of either test (Buckleton et al. 2001).
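For a biallelic marker such as an SNP, an exact HWE test can be written compactly. Conditional on the allele counts $n_A$ and $n_B$, the probability of observing $n_{AB}$ heterozygotes among $N$ individuals is $\frac{N!}{n_{AA}!\,n_{AB}!\,n_{BB}!}\,2^{n_{AB}}\,\frac{n_A!\,n_B!}{(2N)!}$, and the p-value sums the probabilities of all outcomes that are no more likely than the observed one. The Python sketch below is a minimal illustration of that formulation only; multiallelic STRs need a generalised test (typically evaluated by permutation or Markov chain sampling), and the function name and genotype counts are hypothetical.

```python
import math


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact HWE test for a biallelic locus, conditional on the allele counts."""
    n = n_aa + n_ab + n_bb        # individuals
    n_a = 2 * n_aa + n_ab         # copies of allele A
    n_b = 2 * n_bb + n_ab         # copies of allele B

    def log_prob(het: int) -> float:
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Heterozygote counts compatible with n_a and n_b share the parity of n_a
    probs = [math.exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    p_obs = math.exp(log_prob(n_ab))
    # Sum outcomes at most as likely as the observed one (small rounding tolerance)
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


print(hwe_exact_pvalue(25, 50, 25))  # consistent with HWE
print(hwe_exact_pvalue(50, 0, 50))   # strong heterozygote deficit
```

Working in log space via `math.lgamma` avoids overflow of the factorials for realistic sample sizes.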

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations, and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The descriptions and mathematical formulations of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_{i} G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the match probabilities for all individual genotypes. PM can also be estimated from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals and is thus directly related to PM:

$$PD = 1 - PM$$
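The two ways of obtaining PM described above can be compared with a short Python sketch: one version uses observed genotype frequencies $G_i$, the other derives genotype frequencies from allele frequencies under Hardy-Weinberg proportions. The function names and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def match_probability(genotypes):
    """PM = sum of squared genotype frequencies, from observed genotypes."""
    n = len(genotypes)
    # Treat genotypes as unordered pairs, so (12, 13) and (13, 12) are the same
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return sum((c / n) ** 2 for c in counts.values())


def match_probability_hwe(allele_freqs):
    """PM from allele frequencies, assuming Hardy-Weinberg proportions."""
    p = list(allele_freqs.values())
    homozygotes = sum(x ** 4 for x in p)  # (p_i^2)^2
    heterozygotes = sum((2 * a * b) ** 2 for a, b in combinations(p, 2))  # (2 p_i p_j)^2
    return homozygotes + heterozygotes


def power_of_discrimination(pm):
    return 1 - pm


genotypes = [(12, 13), (13, 12), (14, 14), (12, 13)]
pm = match_probability(genotypes)
print(pm, power_of_discrimination(pm))  # 0.625 0.375
print(match_probability_hwe({"a": 0.5, "b": 0.5}))  # 0.375
```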

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
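The PIC formula can be evaluated directly from a list of allele frequencies; the sketch below is a minimal illustration with a hypothetical function name and toy frequencies.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content (Botstein et al. 1980)."""
    p = list(allele_freqs)
    sum_p2 = sum(x * x for x in p)
    # Correction for heterozygous parent x heterozygous parent x same-genotype child
    pair_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - sum_p2 - pair_term


print(pic([0.5, 0.5]))  # 0.375
print(pic([0.25] * 4))  # 0.703125
```

The first example is the maximum for a biallelic marker, which is one reason multi-allelic STRs are preferred in relationship testing.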

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j(p_i+p_j)(1-p_i-p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$; and second, the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\left(1 - 2hH^2\right)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
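Both exclusion measures are simple to evaluate. The sketch below implements Q from allele frequencies and PE from the observed heterozygosity, taking H = 1 - h because every sampled individual is either heterozygous or homozygous. The function names and example values are hypothetical.

```python
from itertools import combinations


def exclusion_probability_q(allele_freqs):
    """Trio exclusion probability Q (Ohno et al. 1982) from allele frequencies."""
    p = list(allele_freqs)
    # Paternal allele i unambiguous: exclusion factor (1 - p_i)^2
    single = sum(x * (1 - x) ** 2 * (1 - x + x * x) for x in p)
    # Mother and child share the heterozygote i/j: exclusion factor (1 - p_i - p_j)^2
    shared = sum(a * b * (a + b) * (1 - a - b) ** 2 for a, b in combinations(p, 2))
    return single + shared


def power_of_exclusion(h):
    """PE (Brenner & Morris 1990); h = observed heterozygosity, H = 1 - h."""
    H = 1 - h
    return h ** 2 * (1 - 2 * h * H ** 2)


print(exclusion_probability_q([0.5, 0.5]))  # 0.1875
print(round(power_of_exclusion(0.8), 3))  # 0.599
```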

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since the alleged father carries only one X chromosome. Thus, the mean exclusion chance when mother and daughter are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
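The two X-chromosomal formulas translate directly into code. This is a minimal sketch with hypothetical function names; the allele frequencies may equally be haplotype frequencies, as noted above.

```python
def mec_trio(allele_freqs):
    """Mean exclusion chance for mother-daughter-man trios (X-chromosomal)."""
    s2 = sum(p ** 2 for p in allele_freqs)
    s4 = sum(p ** 4 for p in allele_freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(allele_freqs):
    """Mean exclusion chance for man-daughter duos (X-chromosomal)."""
    s2 = sum(p ** 2 for p in allele_freqs)
    s3 = sum(p ** 3 for p in allele_freqs)
    return 1 - 2 * s2 + s3


freqs = [0.5, 0.5]  # hypothetical biallelic X-STR
print(mec_trio(freqs), mec_duo(freqs))  # 0.375 0.25
```

In this example the duo value is lower, since without the mother the daughter's paternal allele cannot be singled out.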

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may have shaped the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both of these studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These latter studies confirm earlier findings of high similarity with other Western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, including a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E|H)$, which means the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be “the DNA comes from an individual not related to the suspect”. If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1|E,I)}{\Pr(H_0|E,I)} = \frac{\Pr(E|H_1,I)}{\Pr(E|H_0,I)} \cdot \frac{\Pr(H_1|I)}{\Pr(H_0|I)}$$

posterior odds = likelihood ratio · prior odds

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the hypothesis of the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E|H_1,I)/\Pr(E|H_0,I)$ is the LR, and it is within this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
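To illustrate the odds form of Bayes's theorem, the sketch below multiplies the prior odds by one or more LRs and converts the posterior odds to a probability. Multiplying several LRs assumes that the pieces of evidence are conditionally independent given each hypothesis, an assumption that has to be justified case by case; the function names and numbers are purely illustrative.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Posterior odds = prior odds x product of (conditionally independent) LRs."""
    return prior_odds * math.prod(likelihood_ratios)


def odds_to_probability(odds):
    return odds / (1 + odds)


# Hypothetical: DNA LR of 1000 and a second, independent piece of evidence with LR 5
odds = posterior_odds(prior_odds=1 / 1000, likelihood_ratios=[1000, 5])
print(round(odds, 6), round(odds_to_probability(odds), 4))  # 5.0 0.8333
```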

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF}|H_p)}{\Pr(G_C, G_M, G_{AF}|H_d)}$$

which compares the probability of observing the child's ($G_C$), mother's ($G_M$) and alleged father's ($G_{AF}$) DNA profiles when the AF is the father with the probability of observing the same DNA profiles when the AF is not the father.


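Using the third law of probability (the multiplication rule), we can simplify

$$PI = \frac{\Pr(G_C, G_M, G_{AF}|H_p)}{\Pr(G_C, G_M, G_{AF}|H_d)} = \frac{\Pr(G_C|G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF}|H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF}|H_d)}$$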

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification,

$$\Pr(G_M, G_{AF}|H_p) = \Pr(G_M, G_{AF}|H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C|G_M, G_{AF}, H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (the mother and the AF cannot transmit any other alleles). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability will be 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C|G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C|G_M, G_{AF}, H_p) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$


and

$$\Pr(G_C|G_M, G_{AF}, H_d) = \frac{1}{2}\cdot p_c$$

thus

$$PI = \frac{\Pr(G_C|G_M, G_{AF}, H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)} = \frac{1/4}{(1/2)\,p_c} = \frac{1}{2p_c}$$

In other words, the rarer allele c is in the population, the higher the evidential weight in favour of the AF being the biological father of the child.
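The reasoning above generalises to any single-locus trio by summing over the possible maternal/paternal origins of the child's alleles. The following Python sketch does this under stated assumptions: no mutations, no silent alleles, and a mother who is the true mother. The function names and allele frequencies are hypothetical.

```python
def transmission(parent, allele):
    """Probability that a parent with genotype `parent` passes on `allele`."""
    return parent.count(allele) / 2


def child_probability(child, mother, paternal_prob):
    """Pr(child genotype | mother, paternal allele probabilities)."""
    a, b = child
    # Each distinct (maternal, paternal) ordering of the child's alleles
    orderings = {(a, b), (b, a)}
    return sum(transmission(mother, m) * paternal_prob(p) for m, p in orderings)


def paternity_index(child, mother, alleged_father, freqs):
    numerator = child_probability(child, mother, lambda x: transmission(alleged_father, x))
    # Under H_d the paternal allele comes from a random man: probability = allele frequency
    denominator = child_probability(child, mother, lambda x: freqs[x])
    return numerator / denominator


# The worked example above: mother [a,b], child [b,c], AF [c,d], with p_c = 0.1
freqs = {"a": 0.3, "b": 0.4, "c": 0.1, "d": 0.2}
print(round(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs), 6))  # 5.0
```

The result equals $1/(2p_c)$. For a multi-locus profile the per-locus PIs would be multiplied, which presumes unlinked loci in linkage equilibrium.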

Decision

How should the PI value be interpreted? Bayes's theorem is used to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, so that the posterior odds of paternity equal the PI:

$$PI = \frac{\Pr(H_p|E)}{\Pr(H_d|E)}$$

hence

$$PI = \frac{\Pr(H_p|E)}{1 - \Pr(H_p|E)}$$

resulting in the posterior probability of paternity

$$\Pr(H_p|E) = \frac{PI}{PI + 1}$$
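The formula above assumes prior odds of 1. The sketch below shows how the posterior probability follows from a combined PI, taken as the product of per-locus PIs for independent loci. It also accepts a general prior probability of paternity; that `prior` parameter is a hypothetical extension, not part of the traditional convention described here.

```python
import math


def probability_of_paternity(per_locus_pi, prior=0.5):
    """Posterior probability of paternity from per-locus PIs (independent loci assumed)."""
    combined_pi = math.prod(per_locus_pi)
    prior_odds = prior / (1 - prior)  # prior = 0.5 gives prior odds of 1
    posterior_odds = combined_pi * prior_odds
    return posterior_odds / (1 + posterior_odds)


pis = [5.0, 2.5, 8.0]  # hypothetical per-locus PIs; product = 100
print(round(probability_of_paternity(pis), 6))  # 0.990099
print(round(probability_of_paternity(pis, prior=0.1), 4))  # 0.9174
```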

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit or cut-off for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, and vice versa.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, the use of a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

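$$L(Ped) = \sum_{G_1}\cdots\sum_{G_n}\prod_{i=1}^{n} Pen(X_i|G_i)\prod_{founder}\Pr(G_{founder})\prod_{\{o,f,m\}}\Pr(G_o|G_f, G_m)$$

where $G_i$ and $X_i$ are the genotype and phenotype of individual i, and the last product runs over all offspring o with father f and mother m.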

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the calculation of the summation for his parents, and thus this individual needs no further consideration. This procedure represents a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
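A minimal Python sketch of one peeling step, assuming a nuclear family in which the mother and children are typed at one locus, the father is an untyped founder, founder genotypes follow Hardy-Weinberg proportions, there are no mutations, and the penetrance factor is dropped (non-trait locus). For each candidate father genotype the children are folded into a single factor, after which they need no further consideration. The function names are hypothetical, and this is an illustration of the principle rather than a general peeling implementation.

```python
from itertools import combinations_with_replacement


def genotype_prior(g, freqs):
    """Founder genotype probability under Hardy-Weinberg proportions."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(parent, allele):
    return parent.count(allele) / 2


def child_given_parents(child, mother, father):
    """Mendelian Pr(G_o | G_m, G_f) for one offspring."""
    a, b = child
    return sum(transmission(mother, m) * transmission(father, p)
               for m, p in {(a, b), (b, a)})


def nuclear_family_likelihood(mother, children, freqs):
    """Likelihood of a typed mother, an untyped father and typed children."""
    total = 0.0
    # Sum over every possible genotype of the untyped father (a founder)
    for father in combinations_with_replacement(sorted(freqs), 2):
        term = genotype_prior(father, freqs)
        # Peel the children: their probabilities become one factor for this father genotype
        for child in children:
            term *= child_given_parents(child, mother, father)
        total += term
    return genotype_prior(mother, freqs) * total


freqs = {"a": 0.5, "b": 0.5}
print(nuclear_family_likelihood(("a", "a"), [("a", "b")], freqs))  # 0.125
print(nuclear_family_likelihood(("a", "a"), [("a", "b"), ("a", "b")], freqs))  # 0.09375
```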

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort grows exponentially with the number of markers included. Increased access to large datasets has created a need for a fast computational model able to consider thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987); it permits simultaneous consideration of thousands of loci, and its computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (i.e. transmission probabilities). By the use of a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
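The HMM step can be sketched with the forward algorithm. In this minimal Python version, an inheritance vector is a tuple of 0/1 bits (one per meiosis), all vectors are equally likely at the first marker, and between adjacent markers each meiosis recombines independently with recombination fraction $\theta$, so the transition probability between vectors differing in d of m positions is $\theta^d(1-\theta)^{m-d}$. The per-marker emission probabilities, Pr(observed genotypes | inheritance vector), are assumed to be supplied by the caller; computing them requires summing over founder alleles and is omitted. The function name is hypothetical, and the naive double loop costs on the order of $4^m$ operations per marker interval, whereas practical implementations use faster algorithms.

```python
from itertools import product


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions: list (one per marker) of dicts mapping inheritance vector -> Pr(data | vector)
    thetas: recombination fractions between adjacent markers (len(emissions) - 1 values)
    """
    states = list(product((0, 1), repeat=n_meioses))
    # Uniform prior over the 2^m inheritance vectors at the first marker
    alpha = {v: emissions[0][v] / len(states) for v in states}
    for k, theta in enumerate(thetas, start=1):
        new_alpha = {}
        for w in states:
            total = 0.0
            for v in states:
                d = sum(x != y for x, y in zip(v, w))  # meioses that recombined
                total += alpha[v] * theta ** d * (1 - theta) ** (n_meioses - d)
            new_alpha[w] = total * emissions[k][w]
        alpha = new_alpha
    return sum(alpha.values())


# Hypothetical toy input: one meiosis, two markers that both indicate vector (0,)
emissions = [{(0,): 1.0, (1,): 0.0}, {(0,): 1.0, (1,): 0.0}]
print(round(lander_green_likelihood(emissions, [0.1], n_meioses=1), 6))  # 0.45
```

With $\theta = 0.5$ the same input gives 0.25, the product of two independent markers, which is how unlinked loci are recovered as a special case.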

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample concerning allele and haplotype frequencies and forensic efficiency parameters. Furthermore, to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). An assessment of the weight of the DNA result can be performed, and a decision can be made as to whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect due to an exclusion or an inclusion error (meaning falsely excluding the AF as TF or falsely including the AF as TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypothesis model and the five-hypothesis model for comparison. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
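How the number of hypotheses changes a posterior probability can be sketched in a few lines of Python. The example normalises prior × likelihood over a set of mutually exclusive hypotheses with equal priors and applies a decision rule at the 99.99% cut-off mentioned in the results. The three-way rule (inclusion, exclusion, inconclusive), the likelihood values and the function names are hypothetical; the labels H0 and H1a-H1d follow the text, but which relationship each alternative represents is not needed here.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Normalise prior x likelihood over a set of mutually exclusive hypotheses."""
    if priors is None:
        priors = {h: 1 / len(likelihoods) for h in likelihoods}  # equal priors
    weighted = {h: priors[h] * lik for h, lik in likelihoods.items()}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


def decide(posterior_h0, cutoff=0.9999):
    """Hypothetical decision rule: include, exclude or leave inconclusive."""
    if posterior_h0 >= cutoff:
        return "inclusion"
    if posterior_h0 <= 1 - cutoff:
        return "exclusion"
    return "inconclusive"


# Hypothetical likelihoods of the same DNA data (relative to H0)
two_hyp = {"H0": 1.0, "H1": 1e-6}
five_hyp = {"H0": 1.0, "H1a": 1e-6, "H1b": 0.03, "H1c": 0.01, "H1d": 0.002}
for model in (two_hyp, five_hyp):
    post = posterior_probabilities(model)
    print(round(post["H0"], 4), decide(post["H0"]))
# prints "1.0 inclusion", then "0.9597 inconclusive"
```

With these invented numbers, data that look conclusive under two hypotheses no longer pass the cut-off once close relatives of the true father are admitted as alternatives.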


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk of having a relative of the TF as the AF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some labs still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

| Scenario | Change in error rate compared with the standard case (%): total error (inclusion error, exclusion error) |
|---|---|
| Consanguinity: mother and father simulated as first cousins | 3 (10, -1) |
| Additional information: 20-marker DNA profiles | -68 (-29, -89) |
| Additional information: 25-marker DNA profiles | -83 (-56, -98) |
| Additional information: 2 children | -88 (-73, -96) |
| Mutation model: limit of 1 inconsistency instead of mutation model for LR calculation | 16 (217, -95) |
| Mutation model: limit of 2 inconsistencies instead of mutation model for LR calculation | 320 (1079, -100) |
| Inappropriate allele frequencies: Rwandan frequencies for data generation, Swedish frequencies for LR calculation | 19 (190, -76) |
| Inappropriate allele frequencies: Somali frequencies for data generation, Swedish frequencies for LR calculation | 2 (106, -55) |
| Inappropriate allele frequencies: Iranian frequencies for data generation, Swedish frequencies for LR calculation | -13 (25, -34) |
| Prior information: five-hypothesis model for LR calculation | -24 (-8, -31) |

A standard case was considered, with 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. In the case of haploid DNA markers, it is particularly relevant to set up and study regional frequency databases, due to an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (the Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved estimation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other Western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other Western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

In total, 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, perform the LD test and estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
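Because males carry a single X chromosome, two-locus haplotypes can be read directly from male samples. The sketch below tests association between two such loci with a Monte Carlo permutation test: the alleles at the second locus are shuffled to break any association, and the observed chi-square statistic is compared with the permutation distribution. This is a stand-in for an exact test, not necessarily the procedure used in Paper III; the function names and the toy haplotypes are hypothetical.

```python
import random
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for the two-locus haplotype contingency table."""
    n = len(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    joint = Counter(pairs)
    stat = 0.0
    for a, n_a in left.items():
        for b, n_b in right.items():
            expected = n_a * n_b / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_perm=10_000, seed=1):
    """Monte Carlo p-value for association between two loci."""
    rng = random.Random(seed)
    observed = chi_square(pairs)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(right)  # breaks any allelic association between the loci
        if chi_square(list(zip(left, right))) >= observed - 1e-9:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)


# Hypothetical male haplotypes: (allele at locus 1, allele at locus 2)
associated = [("12", "20")] * 50 + [("13", "21")] * 50
print(ld_permutation_test(associated, n_perm=2000))  # about 0.0005: no permutation as extreme
```

The small tolerance in the comparison keeps tied statistics from being missed through floating-point rounding.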

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%) and that there is also a tendency for linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
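A simple way to summarise such family data is the counting estimator $\hat\theta = R/N$, where R is the number of observed recombinants among N informative meioses, together with a binomial confidence interval. The sketch below uses the Wilson score interval; it is a generic illustration, not the estimation model presented in Paper III, and the function name and counts are hypothetical.

```python
import math


def recombination_estimate(recombinants, informative_meioses, z=1.96):
    """Counting estimate of theta with an approximate 95% Wilson score interval."""
    n = informative_meioses
    theta = recombinants / n
    denom = 1 + z * z / n
    centre = (theta + z * z / (2 * n)) / denom
    half_width = z / denom * math.sqrt(theta * (1 - theta) / n + z * z / (4 * n * n))
    return theta, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical: no recombinants observed among 100 informative meioses
theta, low, high = recombination_estimate(0, 100)
print(theta, round(low, 4), round(high, 4))  # 0.0 0.0 0.037
```

Even with no recombinants among 100 meioses, the upper limit is close to 4%, which shows how wide the uncertainty around small recombination frequencies remains with family data of this size.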


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing for X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking and not taking linkage and LD into account by means of a simulation approach on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all loci of a haplotype.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases involved DNA profiles available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed in which the tested relationship was either true or not true. From these, LR distributions were studied, and the LR calculated using our proposed model was compared with the LR calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~$10^6$) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~$10^2$-$10^3$). Furthermore, LRs favouring the tested relationship (LR > 1) were obtained to various degrees when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

| | log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max) |
|---|---|---|---|
| Rel | 2.0 [-1, 6.2] (-1100) | 1.2 [0.4, 7.3] (0001418) | 0.4 [0.036, 9.3] (00001438) |
| NoRel | -1 [-1, 0.5] (-140) | 1.0 [0.9, 1.3] (000263) | 0.2 [0.0036, 9.1] (0026433) |

Rel denotes simulations where the brothers have the same mother, and NoRel simulations where the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that, in the Swedish population, the markers located within each of the four linkage groups were in linkage disequilibrium. This, together with the linkage within and between the linkage groups, highlights the need to consider such parameters when calculating the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
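The Paper III and Paper IV summaries above state that the product rule fails for markers in linkage disequilibrium. This can be illustrated with two linked loci. Under linkage equilibrium, the frequency of a two-locus haplotype equals the product of its allele frequencies. The disequilibrium coefficient D = p_AB - p_A * p_B measures the departure from that. The sketch below uses a hypothetical haplotype sample and is not the estimation procedure used in the papers.

```python
from collections import Counter
from typing import List, Tuple

Haplotype = Tuple[str, str]  # (allele at locus 1, allele at locus 2)


def ld_summary(haplotypes: List[Haplotype], a: str, b: str) -> Tuple[float, float, float]:
    """Return (observed p_AB, product-rule p_A * p_B, D = p_AB - p_A * p_B)."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    counts = Counter(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n   # allele frequency, locus 1
    p_b = sum(1 for _, y in haplotypes if y == b) / n   # allele frequency, locus 2
    p_ab = counts[(a, b)] / n                           # observed haplotype frequency
    return p_ab, p_a * p_b, p_ab - p_a * p_b


if __name__ == "__main__":
    # Hypothetical sample of 100 two-locus haplotypes (repeat numbers as labels).
    sample = ([("12", "20")] * 40 + [("12", "21")] * 10
              + [("13", "20")] * 10 + [("13", "21")] * 40)
    observed, product, d = ld_summary(sample, "12", "20")
    print(f"observed {observed:.2f}, product rule {product:.2f}, D = {d:.2f}")
```

In this sample the haplotype 12-20 has frequency 0.40, whereas the product rule gives 0.25 (D = 0.15). Treating the two loci as independent would therefore overstate the rarity of this haplotype.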


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses about the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by increasingly powerful computers.

If we go back to the time just before these discoveries and adopt a client's point of view, the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was therefore a major driving force behind this rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g., CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which holds over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas are:

• methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009);
• resolving mixtures (Homer et al. 2008);
• predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009).

To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think these developments will generally be smaller than those of the last 25 years.

Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Some relationship questions, however, still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the key requirement is statistical methods that accurately estimate the evidential value of the DNA profiles while taking properties such as linkage and linkage disequilibrium into account.
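The difficulty of separating full siblings from half siblings can be illustrated at a single autosomal locus. The sketch uses the standard identity-by-descent (IBD) coefficients (k0, k1, k2), which are the probabilities of sharing 0, 1 or 2 alleles IBD:

• full siblings: (1/4, 1/2, 1/4);
• half siblings: (1/2, 1/2, 0);
• unrelated individuals: (1, 0, 0).

It assumes Hardy-Weinberg equilibrium, no mutation and no substructure. Allele labels and frequencies are hypothetical, and this is not the method of the cited studies. Multiplying such single-locus LRs across loci is valid only for unlinked markers.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]
Freqs = Dict[str, float]

# IBD coefficients (k0, k1, k2): probabilities of sharing 0, 1 or 2 alleles IBD.
KAPPA = {
    "full_sibs": (0.25, 0.5, 0.25),
    "half_sibs": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def p_genotype(g: Genotype, p: Freqs) -> float:
    """Hardy-Weinberg genotype probability."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def p_given_shared(g: Genotype, shared: str, p: Freqs) -> float:
    """P(genotype g | one allele is `shared`, the other drawn from the population)."""
    x, y = g
    if x == y:
        return p[x] if shared == x else 0.0
    return (p[y] if shared == x else 0.0) + (p[x] if shared == y else 0.0)


def joint_prob(g1: Genotype, g2: Genotype, p: Freqs, kappa: Tuple[float, float, float]) -> float:
    """P(g1, g2) for two individuals with IBD coefficients kappa."""
    k0, k1, k2 = kappa
    pg1 = p_genotype(g1, p)
    ibd0 = pg1 * p_genotype(g2, p)
    # One allele IBD: each of g1's two alleles is equally likely to be the shared one.
    ibd1 = pg1 * 0.5 * (p_given_shared(g2, g1[0], p) + p_given_shared(g2, g1[1], p))
    ibd2 = pg1 if sorted(g1) == sorted(g2) else 0.0
    return k0 * ibd0 + k1 * ibd1 + k2 * ibd2


def kinship_lr(g1: Genotype, g2: Genotype, p: Freqs, h1: str, h2: str) -> float:
    """Single-locus LR for relationship h1 versus relationship h2."""
    return joint_prob(g1, g2, p, KAPPA[h1]) / joint_prob(g1, g2, p, KAPPA[h2])


if __name__ == "__main__":
    freqs = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}  # hypothetical allele frequencies
    g1, g2 = ("A", "B"), ("A", "C")
    for h1, h2 in [("full_sibs", "unrelated"), ("half_sibs", "unrelated"), ("full_sibs", "half_sibs")]:
        print(f"{h1} vs {h2}: LR = {kinship_lr(g1, g2, freqs, h1, h2):.3f}")
```

Consider the pair (A, B) and (A, C) with the frequencies above. The full-sibling hypothesis is slightly less supported than the half-sibling hypothesis (LR ≈ 0.86), a small example of why many markers are needed for such comparisons.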

When it comes to continued efforts on the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
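The link between database size and uncertainty can be made concrete for the extreme case of a haplotype never observed among N database haplotypes. Assuming independent (binomial) sampling, the exact one-sided 95% upper confidence bound for its frequency is 1 - 0.05^(1/N). This is close to the familiar "rule of three" approximation 3/N. It is a generic statistical illustration, not an estimator proposed in this thesis.

```python
def upper_bound_unseen(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper confidence bound on a haplotype frequency
    when the haplotype was observed 0 times among n sampled haplotypes.

    Solves (1 - p) ** n = alpha for p (binomial sampling assumed).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"N = {n:>6}: exact bound {upper_bound_unseen(n):.5f}, rule of three {3 / n:.5f}")
```

For N = 100 the bound is about 0.0295, versus 3/N = 0.03. Pushing the bound for an unseen haplotype below 0.001 requires a database of roughly 3,000 haplotypes.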

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this isolation is traceable genetically and could thus have an impact on the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream work on data management and statistical inference still requires considerable effort (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

• Magnus and Erik, for always being such good friends;
• Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala;
• my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life;
• my relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed;
• my family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care, and thanks for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for The International Symposium on Human Identification 1989, Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, U.S. Caucasians, Hispanics, Bahamians, Jamaicans, and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, U.S. Caucasians, Hispanics, Bahamians, Jamaicans, and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York.



Printed by LiU-Tryck, Linköping 2010. ISBN 978-91-7393-420-6, ISSN 0345-0082

To Ida and Signe

Abstract

DNA has become a powerful forensic tool for solving cases such as linking a suspect to a crime scene, resolving biological relationship issues and identifying disaster victims. Traditionally, DNA investigations involve two main steps: establishing DNA profiles from biological samples, and interpreting the evidential weight given by these DNA profiles. This thesis deals with the latter, with a focus on models for assessing the weight of evidence and on the parameters that affect these probability figures.

In order to calculate a correct and representative weight of DNA evidence, prior knowledge about the DNA markers in a relevant population sample is required. Important properties that should be studied include:

• how frequently certain DNA variants (i.e., alleles) occur in the population;
• the differences in such frequencies between subpopulations;
• the expected inheritance patterns of the DNA markers within a family;
• the forensic efficiency of the DNA markers in casework.

In this thesis, we aimed to study important population genetic parameters that influence the weight of evidence given by a DNA analysis, as well as models for properly considering such parameters when calculating the weight of evidence in relationship testing.

We have established a Swedish frequency database for mitochondrial DNA (mtDNA) haplotypes and a haplotype frequency database for markers located on the X chromosome. Furthermore, mtDNA haplotype frequencies were used to study the genetic variation within Sweden and between Swedish and other European populations. No genetic substructure was found within Sweden, but strong similarities with other western European populations were observed.
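The absence of genetic substructure can be expressed with a simple differentiation statistic. The sketch below computes a Nei-style G_ST, an analogue of F_ST (Wright 1951; Holsinger & Weir 2009), from haplotype frequencies in regional samples. It weights regions equally and applies no sample-size correction. It is not necessarily the test used in Paper II. Values near 0 indicate homogeneous regions and values near 1 indicate strong differentiation. The haplotype labels and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def frequencies(sample: List[str]) -> Dict[str, float]:
    """Relative haplotype frequencies in one regional sample."""
    counts = Counter(sample)
    n = sum(counts.values())
    return {h: c / n for h, c in counts.items()}


def gene_diversity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def g_st(regions: List[List[str]]) -> float:
    """Nei-style G_ST = (H_T - H_S) / H_T with equal weight per region."""
    region_freqs = [frequencies(r) for r in regions]
    h_s = sum(gene_diversity(f) for f in region_freqs) / len(region_freqs)
    haplotypes = set().union(*region_freqs)  # all haplotypes seen in any region
    pooled = {h: sum(f.get(h, 0.0) for f in region_freqs) / len(region_freqs)
              for h in haplotypes}
    h_t = gene_diversity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical haplotype samples from three regions (labels are placeholders).
    regions = [
        ["hap1"] * 6 + ["hap2"] * 3 + ["hap3"],
        ["hap1"] * 5 + ["hap2"] * 4 + ["hap4"],
        ["hap1"] * 6 + ["hap2"] * 2 + ["hap3", "hap5"],
    ]
    print(f"G_ST = {g_st(regions):.4f}")
```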

Genetic properties such as linkage and linkage disequilibrium can be important when using X-chromosomal markers in relationship testing, and this was true for the set of markers that we studied. To account for these properties, we proposed a model for taking linkage and linkage disequilibrium into account when calculating the weight of evidence provided by X-chromosomal analysis.

Finally, we investigated the risk of erroneous decisions when using DNA investigations for family reunification. We showed that the risk is increased by uncertainties regarding population allele frequencies, consanguinity, and competing close relationships between the tested individuals. Additional information and the use of a refined model for the alternative hypotheses reduced the risk of making erroneous decisions.
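The sketch below is a simplified simulation of this kind of error risk for a trio (mother, child, alleged father) typed at unlinked autosomal loci. All settings are assumptions for illustration, not the Paper I setup:

• 8 equally frequent alleles per locus;
• Hardy-Weinberg equilibrium and no mutation;
• no relatives of the true father among the alternative men;
• a hypothetical decision threshold of 1,000 on the combined paternity index (PI).

The sketch reports how often an unrelated man reaches the threshold and how often a true father falls below it.

```python
import random
from typing import Dict, Tuple

Genotype = Tuple[int, int]


def draw_genotype(rng: random.Random, freqs: Dict[int, float]) -> Genotype:
    """Two alleles drawn independently (Hardy-Weinberg equilibrium assumed)."""
    a, b = rng.choices(list(freqs), weights=list(freqs.values()), k=2)
    return (a, b)


def p_child(child: Genotype, mother: Genotype, paternal: Dict[int, float]) -> float:
    """P(child genotype | mother, distribution of the paternal allele), no mutation."""
    total = 0.0
    for m in mother:  # each maternal allele is transmitted with probability 1/2
        for f, pf in paternal.items():
            if sorted((m, f)) == sorted(child):
                total += 0.5 * pf
    return total


def paternity_index(child: Genotype, mother: Genotype, alleged: Genotype,
                    freqs: Dict[int, float]) -> float:
    """LR for 'alleged man is the father' vs 'a random unrelated man is the father'."""
    father_dist: Dict[int, float] = {}
    for a in alleged:
        father_dist[a] = father_dist.get(a, 0.0) + 0.5
    return p_child(child, mother, father_dist) / p_child(child, mother, freqs)


def simulate(n_loci: int, n_trials: int, threshold: float, seed: int = 0) -> Tuple[float, float]:
    """Return (rate unrelated men reach threshold, rate true fathers stay below it)."""
    rng = random.Random(seed)
    freqs = {allele: 1 / 8 for allele in range(8)}  # hypothetical: 8 equifrequent alleles
    false_inclusions = true_below = 0
    for _ in range(n_trials):
        pi_true = pi_unrelated = 1.0
        for _ in range(n_loci):
            mother, father, other = (draw_genotype(rng, freqs) for _ in range(3))
            child = (rng.choice(mother), rng.choice(father))
            pi_true *= paternity_index(child, mother, father, freqs)
            pi_unrelated *= paternity_index(child, mother, other, freqs)
        true_below += pi_true < threshold
        false_inclusions += pi_unrelated >= threshold
    return false_inclusions / n_trials, true_below / n_trials


if __name__ == "__main__":
    for loci in (5, 10, 15):
        fi, tb = simulate(loci, n_trials=2000, threshold=1000.0)
        print(f"{loci:2d} loci: unrelated >= threshold {fi:.4f}, true father < threshold {tb:.4f}")
```

In this toy setting, the single-locus PI of a true father lies between 2 and 8. With two loci no man can reach 1,000, while with 15 loci every true father does, so the informative cases lie in between. Relatives of the true father, which are not modelled here, would raise the rate of false inclusions.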

In summary, as a result of the work in this thesis, mitochondrial DNA and X-chromosomal markers can be used to resolve complex relationship investigations. Moreover, the reliability of likelihood estimates has been increased by the development of models and the study of relevant parameters affecting probability calculations.

Popular science summary

DNA has become an important tool in the justice system for resolving questions such as linking a suspected perpetrator to a crime scene, investigating questions of biological relationship, or identifying victims of mass disasters. A DNA investigation can be divided into two steps: producing the DNA profiles, and evaluating what the obtained DNA profiles mean with respect to a given question. More specifically, one may want to know the probability that someone other than the suspect left DNA at the crime scene. Alternatively, one may want the probability that the alleged man is the child's father, compared with the probability that he matches by chance. Such probabilities are calculated from, among other things, how common different DNA variants (alleles) are in the population and how the genetic markers are inherited from one generation to the next.

This thesis aimed to study relevant background data that affect the probability calculations, and to study mathematical models for correctly taking the studied parameters into account in case-specific probability calculations. The work focuses on the genetic variation in a Swedish population and on DNA investigations concerning biological relationships between individuals. Nevertheless, most of the results and discussions are also valid when DNA profiling is used in crime scene investigations.

Probability calculations in relationship investigations are best performed by weighing two hypotheses against each other. Take as an example a paternity investigation with DNA profiles from a child, the child's mother and an alleged man. Here one can set hypothesis 1, "The alleged man is the father of the child", against hypothesis 2, "The alleged man is unrelated to the child". For each hypothesis, one calculates the probability of observing the obtained DNA results, given that the hypothesis is true. For example, this is the probability of the mother's, the child's and the man's DNA profiles when the alleged man is the child's father, and when he is not. The final value of the investigation is obtained by weighing the two probabilities against each other. A decision on whether or not the man is the child's father can then be based on comparing the result of the calculation with a threshold for inclusion or exclusion.
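A worked single-locus version of this comparison follows, under stated assumptions: Hardy-Weinberg equilibrium, no mutation, and unlinked loci. Suppose the mother cannot have transmitted allele a to the child, so a is the obligate paternal allele, and the alleged man is heterozygous for a. Let p_a be the population frequency of a. After the factors common to both hypotheses cancel, we obtain the likelihood ratio (paternity index), its combination over L independent loci, and the posterior probability of paternity W for a prior probability π:

$$
\mathrm{LR} = \frac{P(E \mid H_1)}{P(E \mid H_2)} = \frac{1/2}{p_a} = \frac{1}{2p_a},
\qquad
\mathrm{LR}_{\mathrm{tot}} = \prod_{\ell=1}^{L} \mathrm{LR}_{\ell},
\qquad
W = \frac{\pi\,\mathrm{LR}_{\mathrm{tot}}}{\pi\,\mathrm{LR}_{\mathrm{tot}} + (1 - \pi)}.
$$

For example, with p_a = 0.1 a single locus gives LR = 5, and with a neutral prior π = 0.5 this corresponds to W = 5/6 ≈ 0.83. The product over loci holds only for unlinked markers in linkage equilibrium, which is exactly the assumption that fails for linked X-chromosomal markers.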

There is always a risk of drawing the wrong conclusion, for example by wrongly excluding a biological father or wrongly including a non-father as the biological father. In Paper I, we investigated the risk of drawing wrong conclusions and studied the importance and influence of various factors that can affect it. We focused on DNA investigations in family reunification cases, which can be complex because they involve uncertainties about population affiliation, differences in family constellations, etc. Through simulations, we showed that errors can be minimised by increasing the information content of the investigation. This can be done, for example, by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. We also showed that the risk of error can be reduced further by a refined method that takes into account alternative close relationships between the tested individuals.

In standard investigations, DNA markers located on the so-called autosomal chromosomes are used. In special cases, DNA variation in the mitochondria (mtDNA) or on the sex chromosomes (the X-chromosome and the Y-chromosome) can also be examined. mtDNA is inherited maternally and is particularly useful for investigating presumed maternal relationships. In our second paper we investigated mtDNA variation in a Swedish population with the aim of creating a frequency database. By analysing blood samples from about 300 Swedes from seven geographically distinct regions, we showed that the information content for use in a Swedish population is comparable to that of other European populations. The study also showed that there are no significant differences in mtDNA variation between the Swedish regions.

Papers three and four focused on the DNA variation on the X-chromosome. Thanks to the special inheritance pattern of the X-chromosome, an X-chromosome analysis can resolve complex kinship investigations where analysis of standard DNA markers is insufficient. Using the X-chromosome in kinship investigations does, however, require special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage means that the probability of inheriting a certain variant of one DNA marker is affected by which variant has been inherited at another, closely located DNA marker. In paper three we investigated the genetic polymorphism of eight DNA markers, all located on the X-chromosome. We showed that the markers are highly informative for kinship investigations, and that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.

Finally, in paper four we developed a mathematical model that correctly accounts for both linkage and linkage disequilibrium in probability calculations for kinship investigations based on X-chromosomal data. We applied this model in a simulation study of a number of typical cases and showed the size of the errors that arise if a simpler model, which takes neither linkage nor linkage disequilibrium into account, is used.

In summary, the work in this thesis makes it possible to use mitochondrial DNA and X-chromosomal DNA markers to resolve more complex kinship investigations. Through the development of models and the study of relevant parameters affecting kinship probability calculations, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int. 2007;172(2-3):144-149.

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med. 2010;124(2):91-98.

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet. 2008;3(1):37-41.

IV. Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med. 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier, Forensic Science International; Paper II © 2010 Springer, International Journal of Legal Medicine; Paper III © 2008 Elsevier, Forensic Science International: Genetics.

Contents

Abstract
Popular science summary (Populärvetenskaplig sammanfattning)
List of Papers
Contents
Abbreviations
Introduction
  History of DNA and forensic genetics
  Population genetics
    Genetic polymorphisms
    DNA inheritance
    Population Genetics
    The Swedish population and its genetic appearance
  Forensic mathematics/statistics
    Framework for interpretation and presentation of evidential weight
    Paternity index calculation
    Mathematical model for automatic likelihood computation for relationship testing
Aim of the thesis
  Specific aims
    Paper I
    Paper II
    Paper III
    Paper IV
Investigations
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
    Materials and methods
    Results and discussion
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
    Materials and methods
    Results and discussion
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
    Materials and methods
    Results and discussion
  Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account
    Materials and methods
    Results and discussion
Concluding remarks
Future perspectives
Acknowledgements
References

Abbreviations

θ: Theta, recombination frequency/fraction
AF: Alleged father
DNA: Deoxyribonucleic acid
FST: Measure of population genetic subdivision
GD: Gene diversity
HWE: Hardy-Weinberg equilibrium
ISFG: International Society for Forensic Genetics
LD: Linkage disequilibrium
LR: Likelihood ratio
MEC: Mean exclusion chance
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PD: Power of discrimination
PE: Power of exclusion
PI: Paternity index
PIC: Polymorphism information content
PM: Match probability
Pr: Probability
SNP: Single nucleotide polymorphism
STR: Short tandem repeat
TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays DNA is used in everyday practice in different areas, such as medical genetics, the food processing industry and forensic work, both when solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being from a biological sample by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al. 2009; Pulker et al. 2007) and wildlife forensics (Alacs et al. 2010; Budowle et al. 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al. 2004b).

DNA from a crime scene sample was first used in court in 1986 in the UK (Gill et al. 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al. 1985). Since then, the techniques and methodologies for exploiting the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process: from the handling of biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation of the weight of the DNA evidence in court. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?" Once the DNA profiles have been established, they are used to address the case-specific question. Normally, one of three statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some form of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.

These figures are usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight for a given DNA marker. First, population genetics, including allele frequencies, population substructure, and dependence within and between markers, among other things. Second, models for calculating and presenting the weight of evidence that take this population genetic information properly into account.

Population genetics

Genetic polymorphisms

Three types of DNA marker, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic in human populations (Butler 2006); in other words, they are well suited to forensic applications. More than 10 alleles exist for each of the commonly used STRs, which generally makes a multi-locus STR profile unique. In the 1990s, the FBI focused on 13 STR markers, the CODIS loci (Budowle et al. 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers in routine practice. More recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These increase the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al. 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that the limited number of alleles per locus makes the information content per locus low: one STR marker provides approximately the same amount of information as four SNPs (Sobrino et al. 2005; Brenner, www.dna-view.com). No commercial forensic SNP kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al. 2009; Phillips et al. 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main forensic use of sequence data is the analysis of variation in mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. Analysis of mtDNA SNPs in addition to the sequence data has, however, been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the marker types described above, there are different "types" of DNA with different properties in terms of their inheritance pattern and other important population genetic characteristics. The types discussed here are markers on the autosomal chromosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, reducing the limitations associated with testing complex pedigrees (Egeland et al. 2008; Skare et al. 2009).

The X-chromosome has a different inheritance pattern compared with autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, as one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude a shared father, since full sisters can inherit different alleles from him. X-chromosomal markers can, however, exclude a shared father, since two sisters with the same father must share his paternal allele. There are several other types of relationship where analysis of X-chromosomal markers is superior to autosomal markers (Szibor et al. 2003; Pinto et al. 2010).

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007; Szibor 2007) (Figure 2). These two genetic properties are discussed further below, in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome, barring mutations. Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used to infer the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is shared by all males of the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. It consists of a circular genome of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles of X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST values correspond to small differences between subpopulations, and vice versa. Variants of FST exist that additionally take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
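As a minimal sketch of how differentiation between subpopulations can be quantified, the function below computes a heterozygosity-based FST analogue, (H_T - H_S)/H_T (Nei's G_ST), from known allele frequencies in equally weighted subpopulations. This is an illustrative assumption, not the estimator used in the papers: it ignores sampling error, which practical estimators such as Weir and Cockerham's θ correct for. The function name and input format are hypothetical.

```python
def fst_from_frequencies(subpops):
    """Heterozygosity-based FST analogue (Nei's G_ST) for one locus.

    subpops: list of dicts mapping allele -> frequency, one dict per
    subpopulation; subpopulations are weighted equally (illustrative assumption).
    """
    k = len(subpops)
    alleles = set().union(*subpops)
    # H_S: mean expected heterozygosity within subpopulations
    h_s = sum(1 - sum(p ** 2 for p in sp.values()) for sp in subpops) / k
    # H_T: expected heterozygosity of the pooled (total) population
    pooled = [sum(sp.get(a, 0.0) for sp in subpops) / k for a in alleles]
    h_t = 1 - sum(p ** 2 for p in pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Two hypothetical subpopulations with slightly different frequencies at a biallelic locus
print(fst_from_frequencies([{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]))
```

Here H_S = 0.48 and H_T = 0.5, so the result is approximately 0.04, a small value indicating little differentiation; identical subpopulations give 0 and subpopulations fixed for different alleles give 1.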

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments by a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed relative to its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family data. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
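The effect of θ on transmission can be sketched for the mother in Figure 2. Assuming her phase is known (i.e. which alleles sit together on each of her two X-chromosomes), she transmits each of her two parental haplotypes with probability (1 - θ)/2 and each of the two recombinant haplotypes with probability θ/2, while the father transmits his single X-chromosome intact to a daughter. The function name and allele labels below are hypothetical.

```python
def maternal_haplotype_probs(hap1, hap2, theta):
    """Transmission probabilities for a two-locus X haplotype from a mother.

    hap1, hap2: the mother's phase-known haplotypes as (locus1_allele, locus2_allele).
    theta: recombination fraction between the two loci (0 <= theta <= 0.5).
    """
    (a1, b1), (a2, b2) = hap1, hap2
    outcomes = [
        ((a1, b1), (1 - theta) / 2),  # non-recombinant
        ((a2, b2), (1 - theta) / 2),  # non-recombinant
        ((a1, b2), theta / 2),        # recombinant
        ((a2, b1), theta / 2),        # recombinant
    ]
    probs = {}
    for hap, pr in outcomes:
        # identical haplotypes (e.g. when the mother is homozygous at a locus) are merged
        probs[hap] = probs.get(hap, 0.0) + pr
    return probs


print(maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), theta=0.1))
```

With θ = 0.1, each maternal haplotype is transmitted with probability 0.45 and each recombinant with probability 0.05; at θ = 0.5 (unlinked loci) all four are equally likely, which corresponds to treating the loci as independent.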

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a)\cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium, $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially for multiallelic loci.
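A minimal sketch of estimating Δ from observed haplotypes is shown below. It assumes phase-known two-locus haplotypes, such as those obtained from X-chromosomal data in males, who carry a single X-chromosome; the function name and the sample are hypothetical.

```python
def ld_delta(haplotypes, a, b):
    """Delta = f(ab) - f(a) * f(b), estimated from observed haplotypes.

    haplotypes: list of (locus1_allele, locus2_allele) tuples, one per chromosome.
    """
    n = len(haplotypes)
    f_ab = sum(1 for h in haplotypes if h == (a, b)) / n  # haplotype frequency
    f_a = sum(1 for h in haplotypes if h[0] == a) / n     # allele frequency, locus 1
    f_b = sum(1 for h in haplotypes if h[1] == b) / n     # allele frequency, locus 2
    return f_ab - f_a * f_b


# Hypothetical sample of 100 male X-chromosomes typed at two linked STRs
sample = ([("12", "20")] * 30 + [("12", "21")] * 20 +
          [("13", "20")] * 20 + [("13", "21")] * 30)
print(ld_delta(sample, "12", "20"))
```

In this sample, haplotype 12-20 is observed at frequency 0.30 whereas LE predicts 0.5 × 0.5 = 0.25, giving Δ ≈ 0.05. Whether such a deviation is significant is a separate question for the LD tests described below.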

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles: tests for Hardy-Weinberg equilibrium (HWE) (Hardy 1908) and for LD deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw firm conclusions from the outcome of either test (Buckleton et al. 2001).
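For a biallelic marker such as a SNP, an exact HWE test can be sketched by enumerating every heterozygote count compatible with the observed allele counts and summing the probabilities of all outcomes no more likely than the observed one. This is a minimal illustration under the biallelic assumption; multiallelic STR loci require more elaborate (e.g. Markov chain) algorithms, which this sketch does not cover. The function name is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE test p-value for one biallelic locus, given genotype counts."""
    n = n_aa + n_ab + n_bb      # number of individuals
    n_a = 2 * n_aa + n_ab       # copies of allele a
    n_b = 2 * n - n_a           # copies of allele b

    def log_prob(het):
        # log Pr(het heterozygotes | n, n_a) under HWE, conditioning on allele counts
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + het * log(2)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # attainable heterozygote counts share the parity of n_a
    probs = {het: exp(log_prob(het)) for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # sum the probabilities of all outcomes no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


print(hwe_exact_pvalue(25, 50, 25))  # genotype counts close to HWE expectations
print(hwe_exact_pvalue(50, 0, 50))   # complete absence of heterozygotes
```

The first call returns a p-value close to 1, while the second, with no heterozygotes at all, returns a vanishingly small one. As noted above, a non-significant result from such a test is weak evidence of equilibrium when the sample is small.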

Another feature to consider is the forensic efficiency of the DNA markers in casework, for both criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations and differ from case-specific values. They are most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.


The unbiased estimator is given by (Nei 1987):

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
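Nei's estimator translates directly into code. The sketch below takes allele counts from a hypothetical population sample, so that n is the number of gene copies (twice the number of individuals for an autosomal locus); the function name is hypothetical.

```python
def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity from allele counts at one locus."""
    n = sum(allele_counts.values())  # number of gene copies sampled
    sum_p2 = sum((c / n) ** 2 for c in allele_counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Hypothetical STR allele counts from 100 individuals (200 gene copies)
print(gene_diversity({"13": 60, "14": 80, "15": 40, "16": 20}))
```

The allele frequencies are 0.3, 0.4, 0.2 and 0.1, so $1 - \sum_i p_i^2 = 0.70$, and the factor n/(n - 1) = 200/199 raises this to about 0.704. The correction matters mainly for small samples.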

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951):

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus PM is the sum of the match probabilities for all genotypes. PM can also be calculated from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus directly related to PM:

$$PD = 1 - PM$$
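The sketch below computes PM from genotype frequencies and, as an assumption for illustration, derives those genotype frequencies from allele frequencies under HWE; PD then follows as 1 - PM. The function names and allele frequencies are hypothetical.

```python
def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under HWE: p^2 for homozygotes, 2pq otherwise."""
    alleles = sorted(allele_freqs)
    genotypes = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            pa, pb = allele_freqs[a], allele_freqs[b]
            genotypes[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return genotypes


def match_probability(genotype_freqs):
    """PM = sum over genotypes of G_i^2."""
    return sum(g ** 2 for g in genotype_freqs.values())


def power_of_discrimination(genotype_freqs):
    """PD = 1 - PM."""
    return 1 - match_probability(genotype_freqs)


snp = hwe_genotype_freqs({"A": 0.5, "C": 0.5})
str_locus = hwe_genotype_freqs({str(k): 0.1 for k in range(10)})
print(match_probability(snp), match_probability(str_locus))
```

For a SNP with two equally frequent alleles, PM = 0.375 (PD = 0.625). An idealised STR with ten equally frequent alleles gives PM = 0.019, roughly the same as four such SNPs combined (0.375^4 ≈ 0.0198, multiplying across loci assumed independent). This is in line with the rule of thumb that one STR carries about as much information as four SNPs, although real STR allele frequencies are far from uniform.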

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
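A direct transcription of the PIC formula is sketched below, assuming the allele frequencies of one locus are supplied as a list; the function name is hypothetical.

```python
def polymorphism_information_content(freqs):
    """PIC (Botstein et al. 1980) from a list of allele frequencies at one locus."""
    n = len(freqs)
    sum_p2 = sum(p ** 2 for p in freqs)
    # subtract 2 * p_i^2 * p_j^2 for every unordered pair of distinct alleles
    pair_term = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                    for i in range(n - 1) for j in range(i + 1, n))
    return 1 - sum_p2 - pair_term


print(polymorphism_information_content([0.5, 0.5]))
print(polymorphism_information_content([0.3, 0.4, 0.2, 0.1]))
```

The first call gives 0.375, the maximum PIC for a biallelic marker; the four-allele example gives about 0.645.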

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982):

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i+p_j)(1-p_i-p_j)^2$$

Q combines two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, where $p_i$ and $p_j$ are the frequencies of the possible paternal alleles; and second, the expected population frequency of that mother/child genotype combination. Q is then obtained by summing over all mother/child genotype combinations. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990):

$$PE = h^2\cdot(1 - 2\cdot h\cdot H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
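Both exclusion measures can be sketched in a few lines. The first function transcribes Ohno et al.'s Q from allele frequencies; the second evaluates Brenner & Morris's PE from the observed heterozygosity h, taking H = 1 - h (an assumption that holds when every sampled individual is scored as either heterozygous or homozygous). Function names are hypothetical, and mutations are ignored.

```python
def exclusion_probability_trio(freqs):
    """Q: probability of excluding a random man when mother and child are typed."""
    n = len(freqs)
    # child's paternal allele i is unambiguous: a random man lacks i w.p. (1 - p_i)^2
    q = sum(p * (1 - p) ** 2 * (1 - p + p ** 2) for p in freqs)
    # mother and child share genotype [i, j]: a random man must lack both i and j
    q += sum(freqs[i] * freqs[j] * (freqs[i] + freqs[j]) * (1 - freqs[i] - freqs[j]) ** 2
             for i in range(n - 1) for j in range(i + 1, n))
    return q


def power_of_exclusion(h):
    """PE = h^2 (1 - 2 h H^2), with H = 1 - h the proportion of homozygotes."""
    H = 1 - h
    return h ** 2 * (1 - 2 * h * H ** 2)


print(exclusion_probability_trio([0.5, 0.5]))
print(power_of_exclusion(0.8))
```

For a SNP with two equally frequent alleles, Q = 3/16 = 0.1875, which illustrates why many SNPs are needed to reach the exclusion power of an STR multiplex. A sample with 80% heterozygotes gives PE ≈ 0.599.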

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a man carries only one X-chromosomal allele. Thus the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
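The two X-chromosomal formulas can be transcribed in the same way; `freqs` may hold allele frequencies or, where LD is accounted for, haplotype frequencies. The function names and frequencies are hypothetical.

```python
def mec_trio(freqs):
    """MEC for X-chromosomal trios (mother, daughter, alleged father)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """MEC for X-chromosomal duos (alleged father and daughter, no mother)."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3


freqs = [0.3, 0.4, 0.2, 0.1]  # hypothetical allele frequencies
print(mec_trio(freqs), mec_duo(freqs))
```

For these frequencies the trio value (about 0.645) exceeds the duo value (0.5): typing the mother makes it easier to determine which allele the daughter received from her father.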

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, because of the ice that covered Northern Europe. Since then, immigration and population movements of various magnitudes, origins and directions have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated for forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases, as well as strong similarities with other European populations. A sample of the Swedish population was recently compared with other European populations using data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability among the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at setting up a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). The latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion concerns two (or perhaps three) different frameworks: a frequentist and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E|H)$, the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile, and H could be "the DNA comes from an individual unrelated to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes' theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1|E,I)}{\Pr(H_0|E,I)} = \frac{\Pr(E|H_1,I)}{\Pr(E|H_0,I)}\cdot\frac{\Pr(H_1|I)}{\Pr(H_0|I)}$$

$$\text{posterior odds} = \text{likelihood ratio}\cdot\text{prior odds}$$

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the hypothesis of the defence. E represents the DNA profiles and I other relevant background information. The quotient

$$\frac{\Pr(E|H_1,I)}{\Pr(E|H_0,I)}$$

is the LR, and it is within this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
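Numerically, the Bayesian update is a single multiplication. The sketch below also converts posterior odds into a posterior probability; the prior probability of 0.5 in the usage example is a hypothetical choice for illustration, not a recommendation, and the function names are hypothetical.

```python
def posterior_odds(lr, prior_odds):
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    return lr * prior_odds


def posterior_probability(lr, prior_prob):
    """Pr(H1 | E, I) from the LR and a prior probability Pr(H1 | I) < 1."""
    prior_odds = prior_prob / (1 - prior_prob)
    odds = posterior_odds(lr, prior_odds)
    return odds / (1 + odds)


print(posterior_probability(lr=1000, prior_prob=0.5))  # equals 1000/1001
```

With prior odds of 1, the posterior odds equal the LR, so an LR of 1000 corresponds to a posterior probability of 1000/1001 ≈ 0.999. A different prior changes this figure while the LR itself, the quantity computed from the DNA evidence, stays the same.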

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations in genetic investigations of paternity cases (Gjertson et al. 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack guidance on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:

Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C,G_M,G_{AF}|H_p)}{\Pr(G_C,G_M,G_{AF}|H_d)}$$

which is the probability of observing the child's ($G_C$), mother's ($G_M$) and alleged father's ($G_{AF}$) DNA profiles when the AF is the father, compared with the probability of observing the same DNA profiles when the AF is not the father.


Using the third law of probability, $\Pr(A,B|H) = \Pr(A|B,H)\Pr(B|H)$, we can simplify:

$$PI = \frac{\Pr(G_C,G_M,G_{AF}|H_p)}{\Pr(G_C,G_M,G_{AF}|H_d)} = \frac{\Pr(G_C|G_M,G_{AF},H_p)\Pr(G_M,G_{AF}|H_p)}{\Pr(G_C|G_M,G_{AF},H_d)\Pr(G_M,G_{AF}|H_d)}$$

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them can transmit any other allele). If one of them is heterozygous and the other homozygous, the probability is 0.5, since there is a 50:50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. Here $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

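As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d].

Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2 p_c}$$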

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
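The single-locus trio calculation described above can be sketched in Python as follows. This is an illustrative sketch, not software from the thesis: it assumes that the mother is the true mother, that there are no mutations or silent alleles, and that under $H_d$ the father is an unrelated random man; the function names and allele frequencies are hypothetical. Ambiguous maternal/paternal assignments are handled by summing the numerator and the denominator over both possible assignments.

```python
def transmission_prob(parent, allele):
    """Mendelian probability that a parent with genotype (x, y) passes on `allele`."""
    return 0.5 * (parent[0] == allele) + 0.5 * (parent[1] == allele)


def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus trio PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Assumes the mother is the true mother, no mutations or silent alleles, and
    that under Hd the father is an unrelated random man who transmits allele a
    with probability freqs[a].
    """
    numerator = denominator = 0.0
    # Sum over the possible (maternal allele A_M, paternal allele A_P)
    # assignments; a homozygous child has only one assignment.
    for a_m, a_p in {(child[0], child[1]), (child[1], child[0])}:
        from_mother = transmission_prob(mother, a_m)
        numerator += from_mother * transmission_prob(alleged_father, a_p)
        denominator += from_mother * freqs[a_p]
    return numerator / denominator


# The worked example: mother [a,b], child [b,c], AF [c,d], with a hypothetical p_c.
freqs = {"a": 0.3, "b": 0.2, "c": 0.125, "d": 0.375}
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 4.0 = 1 / (2 * p_c)
```

With an ambiguous child genotype, for instance mother [a,b], child [a,b] and AF [a,c], both assignments contribute to the denominator and the sketch returns $1 / (2(p_a + p_b))$.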

Decision

How does one interpret the PI value? Bayes's theorem is needed to obtain the posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following expressions for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al. 2007). A cut-off that is too low increases the risk of falsely including a non-father as the true father, whereas one that is too high increases the risk of failing to include the true father.
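As a hedged sketch, the posterior probability of paternity and a cut-off check can be written as below. The general `prior` argument is an illustrative extension: with a prior probability of 0.5 (prior odds 1) it reduces to PI / (PI + 1). The 0.9999 default is only an example, since each laboratory sets its own cut-off; the function names are hypothetical.

```python
def probability_of_paternity(pi, prior=0.5):
    """Posterior probability Pr(Hp | E) from the PI and a prior probability of paternity.

    With prior = 0.5 (prior odds of 1) this is PI / (PI + 1).
    """
    posterior_odds = pi * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)


def exceeds_cutoff(pi, cutoff=0.9999, prior=0.5):
    """Hypothetical inclusion rule: include the AF only if Pr(Hp | E) >= cutoff."""
    return probability_of_paternity(pi, prior) >= cutoff


print(probability_of_paternity(20000.0))  # 20000 / 20001
print(exceeds_cutoff(20000.0), exceeds_cutoff(100.0))  # True False
```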

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) is introduced, and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

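$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \mathrm{Pen}(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes $G_1, \dots, G_n$ of all $n$ individuals in the pedigree, $X_i$ is the observed phenotype of individual $i$, and the last product runs over all offspring $o$ with father $f$ and mother $m$.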

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for an individual at the bottom is computed first, it can be attached as a factor in the summation for his or her parents, and thus this individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
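To make the likelihood formula above concrete, the Python sketch below evaluates it for the smallest non-trivial pedigree: two untyped parents (the founders) and typed children at a single non-trait locus, so that Pen = 1. Founder genotypes are assumed to follow Hardy-Weinberg proportions. For a nuclear family, multiplying the children's factors for each parental genotype pair and then summing over the parents is exactly the peeling step; a general Elston-Stewart implementation would choose a peeling order through the whole pedigree instead. All names and allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement, product


def founder_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered founder genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_prob(parent, allele):
    """Mendelian probability that a parent with genotype (x, y) passes on `allele`."""
    return 0.5 * (parent[0] == allele) + 0.5 * (parent[1] == allele)


def child_prob(child, mother, father):
    """Pr(G_o | G_m, G_f) for an unordered child genotype."""
    x, y = child
    p = transmission_prob(mother, x) * transmission_prob(father, y)
    if x != y:
        p += transmission_prob(mother, y) * transmission_prob(father, x)
    return p


def nuclear_family_likelihood(children, freqs):
    """L(Ped) for two untyped parents and typed children at one marker (Pen = 1)."""
    genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    total = 0.0
    for g_m, g_f in product(genotypes, repeat=2):
        # Children are independent given the parents, so their factors are
        # multiplied before summing over the parental genotypes.
        term = founder_prob(g_m, freqs) * founder_prob(g_f, freqs)
        for child in children:
            term *= child_prob(child, g_m, g_f)
        total += term
    return total


# Hypothetical use: LR for "full siblings" versus "unrelated" for two [a,a] individuals.
freqs = {"a": 0.1, "b": 0.9}
sibs = [("a", "a"), ("a", "a")]
unrelated = founder_prob(sibs[0], freqs) * founder_prob(sibs[1], freqs)
lr = nuclear_family_likelihood(sibs, freqs) / unrelated
print(lr)  # approximately 30.25, i.e. (1 + p_a)**2 / (4 * p_a**2)
```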

The Elston-Stewart algorithm works well on large pedigrees, but its computational cost increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with a computational cost that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on each inheritance vector; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
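The HMM step can be sketched in Python as a forward algorithm over inheritance vectors, under the following assumptions: each of the `n_meioses` meioses recombines independently between adjacent markers with probability theta, all inheritance vectors are equally likely at the first marker, and the per-marker probabilities of the observed genotypes given each inheritance vector (step 2) are supplied as hypothetical inputs rather than computed here. As in the original algorithm, linkage equilibrium between markers is implicitly assumed. The function names are illustrative, and this is not a full Lander-Green implementation.

```python
from itertools import product


def transition_matrix(vectors, theta):
    """Pr(inheritance vector w at the next marker | vector v at this marker).

    Each meiosis recombines independently with probability theta, so the
    probability is theta**d * (1 - theta)**(n - d), with d the Hamming distance.
    """
    n = len(vectors[0])
    matrix = []
    for v in vectors:
        row = []
        for w in vectors:
            d = sum(a != b for a, b in zip(v, w))
            row.append(theta ** d * (1 - theta) ** (n - d))
        matrix.append(row)
    return matrix


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions[k][i]: Pr(observed genotypes at marker k | inheritance vector i),
    assumed to be computed elsewhere. thetas[k]: recombination fraction between
    markers k and k + 1.
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    # Uniform prior over all 2**n inheritance vectors at the first marker.
    alpha = [e / len(vectors) for e in emissions[0]]
    for k in range(1, len(emissions)):
        t = transition_matrix(vectors, thetas[k - 1])
        alpha = [
            sum(alpha[i] * t[i][j] for i in range(len(vectors))) * emissions[k][j]
            for j in range(len(vectors))
        ]
    return sum(alpha)


# Hypothetical emission probabilities for two markers and two meioses.
emissions = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.2, 0.2]]
print(lander_green_likelihood(emissions, [0.5], 2))  # unlinked markers
print(lander_green_likelihood(emissions, [0.0], 2))  # completely linked markers
```

With theta = 0.5 the markers behave as unlinked and the result is the product of the per-marker average emission probabilities; with theta = 0 the inheritance vector is shared by both markers.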

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although the algorithm assumes linkage equilibrium between markers when population allele frequencies are used (Abecasis et al. 2002, Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for properly considering such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (i.e. falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated; in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
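A minimal sketch of posterior probabilities over several hypotheses, followed by a decision rule, is given below. The hypothesis labels, the likelihood values and the three-way rule (inclusion, exclusion or inconclusive, with a 0.9999 cut-off) are hypothetical illustrations; the exact alternative hypotheses of Paper I are those shown in Figure 3 and are not reproduced here. The usage shows how adding alternatives such as close relatives of the TF can pull the posterior probability of paternity below an inclusion cut-off that the two-hypothesis model would pass.

```python
def posterior_probabilities(likelihoods, priors):
    """Pr(H_i | E) = prior_i * Pr(E | H_i) / sum_j prior_j * Pr(E | H_j)."""
    weights = {h: priors[h] * likelihoods[h] for h in likelihoods}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def decide(posterior_paternity, cutoff=0.9999):
    """Hypothetical three-way rule: include, exclude or report as inconclusive."""
    if posterior_paternity >= cutoff:
        return "inclusion"
    if posterior_paternity <= 1 - cutoff:
        return "exclusion"
    return "inconclusive"


# Hypothetical likelihoods (relative to a common constant) for one simulated case.
two = {"paternity": 1.0, "unrelated": 1e-8}
five = {"paternity": 1.0, "alt_1": 1e-3, "alt_2": 1e-5, "alt_3": 1e-6, "unrelated": 1e-8}
post2 = posterior_probabilities(two, {h: 1 / len(two) for h in two})
post5 = posterior_probabilities(five, {h: 1 / len(five) for h in five})
print(decide(post2["paternity"]), decide(post5["paternity"]))  # inclusion inconclusive
```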


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still apply a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins                              3 (10, -1)
Additional information
  DNA profiles from 20 markers                                            -68 (-29, -89)
  DNA profiles from 25 markers                                            -83 (-56, -98)
  2 children                                                              -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calc.          16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calc.       320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele freq. for data generation, Swedish for LR calc.            19 (190, -76)
  Somali allele freq. for data generation, Swedish for LR calc.             2 (106, -55)
  Iran allele freq. for data generation, Swedish for LR calc.             -13 (25, -34)
Prior information
  Five-hypothesis model for LR calc.                                      -24 (-8, -31)

A standard case was considered, with DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is highly relevant to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans more than 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved the calculation of forensic efficiency parameters as well as comparison of the genetic variation found among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by pairwise ΦST values calculated for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X-chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

In total, 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were considered in order to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than single markers. By means of simulations, we demonstrated that the average difference in the calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
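Because males carry a single X-chromosome, their two-locus haplotypes are observed directly, which makes a simple association test between the loci of a linkage group straightforward. The specific LD test used in Paper III is not described in this section; as an assumption, the Python sketch below computes a Pearson chi-square statistic on the haplotype contingency table. With many rare STR alleles, the chi-square approximation becomes poor, and a permutation or exact test would be more appropriate. The function name and toy data are hypothetical.

```python
from collections import Counter


def ld_chi_square(haplotypes):
    """Pearson chi-square statistic for association between two loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs, e.g. from
    males, whose single X-chromosome gives phase-known haplotypes.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(haplotypes)
    hap = Counter(haplotypes)
    p = Counter(a for a, _ in haplotypes)  # allele counts at locus 1
    q = Counter(b for _, b in haplotypes)  # allele counts at locus 2
    stat = 0.0
    for a in p:
        for b in q:
            expected = p[a] * q[b] / n  # expected haplotype count under no association
            stat += (hap[(a, b)] - expected) ** 2 / expected
    df = (len(p) - 1) * (len(q) - 1)
    return stat, df


# Toy data: complete association versus no association between two biallelic loci.
print(ld_chi_square([(1, 1)] * 50 + [(2, 2)] * 50))  # (100.0, 1)
print(ld_chi_square([(1, 1), (1, 2), (2, 1), (2, 2)] * 25))  # (0.0, 1)
```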

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%) and that there is also a tendency for linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when evaluating the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III led us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971, Abecasis et al. 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present here a model that fully accounts for linkage and linkage disequilibrium, and we study the impact of taking and not taking linkage and LD into account by means of simulations on typical pedigrees with X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all loci of a haplotype.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three were cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed in which the tested relationship was either true or not true. From these, LR distributions were studied, and the LR calculated using our proposed model was compared with the LR calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2 to 10^3). Furthermore, LRs favouring the relationship (LR > 1), of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Each cell gives Median [95% cred] (min/max).

         Log10 LR                 Difference LR(m_2)/LR(m_1)     Difference LR(m_3)/LR(m_1)
Rel      20 [-1 -62] (-1100)      12 [04-73] (0001418)           04 [0036-93] (00001438)
NoRel    -1 [-1-05] (-140)        10 [09-13] (000263)            02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when evaluating the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver behind the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet this demand.

So, is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods, and major investments have already been made in forensic laboratories in instrumentation and in large national DNA-profile databases, which consist of a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which consists of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from a larger number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly solved with sufficiently high probability and certainty; thus, the community will not continue to invest the same level of time and money in research. However, relationship questions remain that cannot be resolved in a satisfactory manner with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009, Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could therefore have an impact when evaluating the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank: My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Also Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues. The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small and for making Artillerigatan 12 a fun place to work. Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss. My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways. Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue. As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (“bollplank”) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring, and thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis. Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden--a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies--the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


To Ida and Signe

Abstract

DNA has become a powerful forensic tool for solving cases, such as linking a suspect to a crime scene, resolving questions of biological relationship and identifying disaster victims. Traditionally, DNA investigations involve two main steps: the establishment of DNA profiles from biological samples, and the interpretation of the evidential weight given by these DNA profiles. This thesis deals with the latter, with a focus on models for assessing the weight of evidence and on the study of parameters affecting these probability figures.

In order to calculate a correct and representative weight of DNA evidence, prior knowledge about the DNA markers in a relevant population sample is required. Important properties that should be studied include, for example, how frequently certain DNA variants (i.e. alleles) occur in the population, the differences in such frequencies between subpopulations, the expected inheritance patterns of the DNA markers within a family, and the forensic efficiency of the DNA markers in casework.

In this thesis, we aimed to study important population genetic parameters that influence the weight of evidence given by a DNA analysis, as well as models for the proper consideration of such parameters when calculating the weight of evidence in relationship testing.

We have established a Swedish frequency database for mitochondrial DNA (mtDNA) haplotypes and a haplotype frequency database for markers located on the X chromosome. Furthermore, mtDNA haplotype frequencies were used to study the genetic variation within Sweden and between Swedish and other European populations. No genetic substructure was found within Sweden, but strong similarities with other western European populations were observed.

Genetic properties such as linkage and linkage disequilibrium can be important when X-chromosomal markers are used in relationship testing, and this was true for the set of markers that we studied. To account for these properties, we proposed a model for taking linkage and linkage disequilibrium into account when calculating the weight of evidence provided by an X-chromosomal analysis.
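Linkage disequilibrium means that a two-marker haplotype does not occur with the frequency predicted by multiplying the allele frequencies. A minimal sketch of the classical disequilibrium coefficient D = p(AB) - p(A)p(B) follows. It assumes that haplotypes are observed directly, as they are for X-chromosomal markers in males, who carry a single X chromosome. The function name, marker alleles and counts are hypothetical illustrations, not data or code from Paper III.

```python
def linkage_disequilibrium(haplotypes, allele_1, allele_2):
    """Return (D, observed haplotype frequency, expected frequency under linkage equilibrium).

    haplotypes: list of (allele at marker 1, allele at marker 2) tuples.
    D = p(allele_1, allele_2) - p(allele_1) * p(allele_2).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_hap = sum(1 for h in haplotypes if h == (allele_1, allele_2)) / n
    p_1 = sum(1 for h in haplotypes if h[0] == allele_1) / n
    p_2 = sum(1 for h in haplotypes if h[1] == allele_2) / n
    expected = p_1 * p_2  # what a model ignoring LD would use
    return p_hap - expected, p_hap, expected


# Hypothetical male X haplotypes for two linked STR markers (repeat numbers)
sample = [(10, 14)] * 3 + [(10, 15)] + [(11, 15)] * 4
d, observed, expected = linkage_disequilibrium(sample, 10, 14)
print(f"observed {observed:.3f}, expected {expected:.4f}, D = {d:.4f}")
# observed 0.375, expected 0.1875, D = 0.1875
```

In this made-up sample, using the product of allele frequencies would halve the frequency of the 10-14 haplotype. This is why haplotype frequencies, rather than allele frequency products, matter for markers in linkage disequilibrium.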

Finally, we investigated the risk of erroneous decisions when DNA investigations are used for family reunification. We showed that the risk is increased by uncertainties regarding population allele frequencies, consanguinity, and competing close relationships between the tested individuals. Additional information and the use of a refined model for the alternative hypotheses reduced the risk of making erroneous decisions.
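Error rates of this kind are typically estimated by simulation. A minimal Monte Carlo sketch follows. It simulates pairs of people under a given true relationship and records how often the parent-child likelihood ratio reaches a decision threshold. The sketch assumes a parent-child duo without the other parent, 15 hypothetical unlinked loci each with eight equally frequent alleles, Hardy-Weinberg equilibrium, no mutation and a hypothetical threshold of LR ≥ 10,000. All function names are illustrative, and this is not the model or data used in Paper I.

```python
import random


def transmit_prob(genotype, allele):
    """Probability that a parent with `genotype` passes on `allele` (Mendelian, no mutation)."""
    return genotype.count(allele) / 2


def parent_child_lr(parent, child, freqs):
    """LR for 'parent-child' vs 'unrelated' at one locus (duo, HWE, no mutation)."""
    a, b = child
    if a == b:
        num = transmit_prob(parent, a) * freqs[a]
        den = freqs[a] ** 2
    else:
        num = transmit_prob(parent, a) * freqs[b] + transmit_prob(parent, b) * freqs[a]
        den = 2 * freqs[a] * freqs[b]
    return num / den


def random_allele(freqs, rng):
    alleles = list(freqs)
    return rng.choices(alleles, weights=[freqs[x] for x in alleles])[0]


def random_genotype(freqs, rng):
    return (random_allele(freqs, rng), random_allele(freqs, rng))


def simulate_pair(relation, freqs, rng):
    """Simulate the genotypes of two persons at one unlinked locus under the true relation."""
    if relation == "parent-child":
        parent = random_genotype(freqs, rng)
        return parent, (rng.choice(parent), random_allele(freqs, rng))
    if relation == "full-siblings":
        mother, father = random_genotype(freqs, rng), random_genotype(freqs, rng)
        sib1 = (rng.choice(mother), rng.choice(father))
        sib2 = (rng.choice(mother), rng.choice(father))
        return sib1, sib2
    if relation == "unrelated":
        return random_genotype(freqs, rng), random_genotype(freqs, rng)
    raise ValueError(f"unknown relation: {relation}")


def inclusion_rate(relation, loci, threshold, n_sim=2000, seed=1):
    """Fraction of simulated cases in which the parent-child LR reaches the threshold."""
    rng = random.Random(seed)
    included = 0
    for _ in range(n_sim):
        lr = 1.0
        for freqs in loci:
            person1, person2 = simulate_pair(relation, freqs, rng)
            lr *= parent_child_lr(person1, person2, freqs)
        included += lr >= threshold
    return included / n_sim


# Hypothetical marker set: 15 unlinked loci, each with 8 equally frequent alleles
loci = [{allele: 1 / 8 for allele in range(8)}] * 15
for relation in ("parent-child", "unrelated", "full-siblings"):
    print(relation, inclusion_rate(relation, loci, threshold=10_000))
```

Under these idealised settings, every true parent-child pair clears the threshold, because each locus contributes a factor of at least 2 and 2^15 = 32,768. In this sketch, wrong inclusions therefore come from the alternative scenarios. Comparing the "unrelated" and "full-siblings" rates shows why a competing close relationship is a greater risk than an unrelated alternative.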

In summary, as a result of the work in this thesis, mitochondrial DNA and X-chromosomal markers can be used to resolve complex relationship investigations. Moreover, the reliability of likelihood estimates has been increased through the development of models and the study of relevant parameters affecting probability calculations.

Popular science summary (Populärvetenskaplig sammanfattning)

DNA has become an important tool in the legal system for answering questions such as linking a suspected perpetrator to a crime scene, investigating questions of biological relationship, or identifying victims of mass disasters. A DNA investigation can be divided into two steps: the generation of DNA profiles, and an assessment of the significance of the obtained DNA profiles for a given question. More specifically, one may want to know, for example, the probability that someone other than the suspect left DNA at the crime scene, or the probability that the alleged man is the child's father compared with the probability that he matches by chance. The calculation of such probabilities is based, among other things, on how frequently different DNA variants (alleles) occur in the population and on how the genetic markers are inherited from one generation to the next.

This thesis has aimed to study relevant background data that affect the probability calculations, and to study mathematical models for correctly taking the studied parameters into account in case-specific probability calculations. The thesis work focuses on the genetic variation in a Swedish population and on DNA investigations concerning biological relationships between individuals. Nevertheless, most of the results and discussions are also valid for the use of DNA profiling in crime scene investigations.

Probability calculations in relationship investigations are best carried out by setting two hypotheses against each other. As an example, consider a paternity investigation with DNA profiles from a child, the child's mother and an alleged man. In this case, hypothesis 1, "The alleged man is the father of the child", can be set against hypothesis 2, "The alleged man is unrelated to the child". For each hypothesis, the probability of observing the obtained DNA results, given that the hypothesis is true, is then calculated: for example, the probability of the mother's, the child's and the man's DNA profiles when the alleged man is the child's father, and when he is not. The final value of the investigation is obtained by weighing the two probabilities against each other. A decision as to whether or not the man is the child's father can then be based on the result of the probability calculation, compared with a threshold for inclusion or, alternatively, exclusion.
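The comparison of hypothesis 1 and hypothesis 2 for a mother-child-alleged father trio can be sketched for a single locus as follows. The ratio is the paternity index (PI). The sketch assumes Mendelian inheritance without mutation and, under hypothesis 2, a random unrelated man who transmits each allele with its population frequency (Hardy-Weinberg assumptions). The function names and allele frequencies are hypothetical.

```python
def transmission(genotype):
    """Mendelian transmission probabilities {allele: probability} for one parent."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_probability(child, maternal, paternal):
    """P(child genotype) given maternal and paternal allele-transmission probabilities."""
    a, b = child
    p = maternal.get(a, 0.0) * paternal.get(b, 0.0)
    if a != b:
        p += maternal.get(b, 0.0) * paternal.get(a, 0.0)
    return p


def paternity_index(mother, child, alleged_father, freqs):
    """PI = Pr(child | mother, H1: AF is the father)
          / Pr(child | mother, H2: AF is unrelated, the father is a random man)."""
    from_mother = transmission(mother)
    h1 = child_probability(child, from_mother, transmission(alleged_father))
    # Under H2 a random man transmits allele x with its population frequency p_x
    h2 = child_probability(child, from_mother, freqs)
    if h2 == 0.0:
        raise ValueError("child's genotype is incompatible with the mother (mutation?)")
    return h1 / h2


# Hypothetical allele frequencies for one STR locus
freqs = {12: 0.20, 14: 0.15, 16: 0.10, 18: 0.05}
print(paternity_index(mother=(12, 14), child=(12, 16), alleged_father=(16, 18), freqs=freqs))
# approximately 5, i.e. 1 / (2 * p_16) with p_16 = 0.10
```

For independent loci, the single-locus indices are multiplied into a combined PI. This combined value is what is compared with the decision threshold. An alleged father who carries none of the child's possible paternal alleles gets PI = 0 at that locus, which corresponds to an exclusion when mutation is ignored.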

There is always a risk of drawing the wrong conclusion, for example wrongly excluding a biological father, or wrongly including a non-father as the biological father. In our first paper (Paper I), we investigated the risk of drawing the wrong conclusion and studied the significance and impact of various factors that may affect it. We focused on DNA investigations in family reunification cases, which can be complex because they involve uncertainties about population affiliation, differences in family constellations, etc. Through simulations, we showed that errors can be minimised by increasing the information content of the investigation, for example by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. In addition, we showed that the risk of error can be reduced further by using a refined method that takes alternative close relationships between the tested individuals into account.

In standard investigations, DNA markers located on the so-called autosomal chromosomes are used. In special cases, one can also examine DNA variation in the mitochondria (mtDNA) or on the sex chromosomes (the X chromosome and the Y chromosome). MtDNA is inherited through the maternal line and is particularly useful for investigating presumed maternal relationships. In Paper II, we examined mtDNA variation in a Swedish population with the aim of creating a frequency database. By analysing blood samples from approximately 300 Swedes from seven geographically distinct regions, we showed that the information content for use in a Swedish population is comparable to that of other European populations. The study also showed that there are no significant differences in mtDNA variation between the different Swedish regions.
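The information content of a haplotype frequency database is commonly summarised by the gene diversity (GD) and the match probability (PM), both of which appear among the abbreviations. A minimal sketch follows, assuming Nei's unbiased gene diversity estimator n/(n-1)·(1 - Σp_i²) and the simple match probability estimate Σp_i². The estimators used in Paper II may differ in detail. The haplotype labels are hypothetical stand-ins for mtDNA sequences.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Return (gene diversity, random match probability) for a haplotype sample.

    Gene diversity uses Nei's unbiased estimator n/(n-1) * (1 - sum(p_i^2));
    the random match probability is estimated as sum(p_i^2).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2), sum_p2


# Hypothetical sample of 8 mtDNA haplotypes (labels stand in for sequences)
sample = ["H1"] * 4 + ["H2"] * 2 + ["H3", "H4"]
gd, pm = haplotype_diversity(sample)
print(gd, pm)
```

A database dominated by one common haplotype gives a low GD and a high PM, so a matching haplotype carries little evidential weight. Many rare haplotypes give the opposite.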

Papers III and IV focused on the DNA variation found on the X chromosome. Because of its special inheritance pattern, an X-chromosomal analysis can provide a solution in complex kinship investigations where the analysis of standard DNA markers is not sufficient. The use of the X chromosome in relationship investigations does, however, require special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage means that the probability of inheriting a certain variant at one DNA marker is affected by which variant was inherited at another, nearby DNA marker. In Paper III, we investigated the genetic polymorphism of eight DNA markers, all located on the X chromosome. We showed that the markers are highly informative for relationship investigations, and that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.
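Linkage between two markers is quantified by the recombination fraction θ, listed among the abbreviations. A minimal sketch of how θ alters transmission probabilities follows, for a mother whose two X chromosomes carry the phase-known haplotypes A1-B1 and A2-B2. Recombination of X markers is only relevant in transmissions from mothers, because a father passes his single X chromosome intact to his daughters. The sketch converts map distance to θ with Haldane's map function, θ = (1 - e^(-2d))/2 with d in Morgans, which assumes no crossover interference. This is not necessarily the map function used in Paper IV. The marker names and distances are hypothetical.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in centiMorgans (Haldane, no interference)."""
    d = distance_cm / 100.0  # centiMorgans -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def maternal_transmission(theta):
    """Haplotype transmission probabilities for a mother whose two X chromosomes
    carry (known phase) A1-B1 and A2-B2 at two linked markers."""
    return {
        ("A1", "B1"): (1 - theta) / 2,  # non-recombinant
        ("A2", "B2"): (1 - theta) / 2,  # non-recombinant
        ("A1", "B2"): theta / 2,        # recombinant
        ("A2", "B1"): theta / 2,        # recombinant
    }


for cm in (0, 5, 30, 200):
    theta = haldane_theta(cm)
    probs = maternal_transmission(theta)
    # Given that the child inherited A1, how likely is it that B1 came along with it?
    p_b1_given_a1 = probs[("A1", "B1")] / (probs[("A1", "B1")] + probs[("A1", "B2")])
    print(f"{cm:>3} cM: theta = {theta:.3f}, Pr(B1 | A1) = {p_b1_given_a1:.3f}")
```

The conditional probability Pr(B1 | A1) equals 1 - θ. For tightly linked markers, the allele inherited at one marker strongly predicts the allele at the other. As the distance grows, θ approaches 0.5 and the markers behave as if independent.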

Finally, in Paper IV, we developed a mathematical model for correctly taking both linkage and linkage disequilibrium into account in probability calculations for relationship investigations based on X-chromosomal data. We applied this model in a simulation study of a number of typical cases and showed the degree of error that arises when a simpler model, in which neither linkage nor linkage disequilibrium is taken into account, is used.

In summary, through the work in this thesis, mitochondrial DNA and X-chromosomal DNA markers can be used to resolve more complex relationship investigations. Through the development of models and the study of relevant parameters affecting relationship probability calculations, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007;172(2-3):144-149.

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010;124(2):91-98.

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008;3(1):37-41.

IV. Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier (Forensic Science International); Paper II © 2010 Springer (International Journal of Legal Medicine); Paper III © 2008 Elsevier (Forensic Science International: Genetics).

Contents

Abstract
Popular science summary (Populärvetenskaplig sammanfattning)
List of Papers
Contents
Abbreviations
Introduction
  History of DNA and forensic genetics
  Population genetics
    Genetic polymorphisms
    DNA inheritance
    Population Genetics
    The Swedish population and its genetic appearance
  Forensic mathematics/statistics
    Framework for interpretation and presentation of evidential weight
    Paternity index calculation
    Mathematical model for automatic likelihood computation for relationship testing
Aim of the thesis
  Specific aims
    Paper I
    Paper II
    Paper III
    Paper IV
Investigations
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
    Materials and methods
    Results and discussion
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
    Materials and methods
    Results and discussion
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
    Materials and methods
    Results and discussion
  Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account
    Materials and methods
    Results and discussion
Concluding remarks
Future perspectives
Acknowledgements
References

Abbreviations

θ      Theta, recombination frequency/fraction
AF     Alleged father
DNA    Deoxyribonucleic acid
FST    Measure of population genetic subdivision
GD     Gene diversity
HWE    Hardy-Weinberg equilibrium
ISFG   International Society for Forensic Genetics
LD     Linkage disequilibrium
LR     Likelihood ratio
MEC    Mean exclusion chance
mtDNA  Mitochondrial DNA
PCR    Polymerase chain reaction
PD     Power of discrimination
PE     Power of exclusion
PI     Paternity index
PIC    Polymorphism information content
PM     Match probability
Pr     Probability
SNP    Single nucleotide polymorphism
STR    Short tandem repeat
TF     True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in many different areas, such as medical genetics, the food processing industry and forensic situations, both when solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being, based on a biological sample, by means of a DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser & Schneider 2009, Pulker et al. 2007) and wildlife forensics (Alacs et al. 2010, Budowle et al. 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling & Gill 2004b).

The first time DNA was used in court for a crime scene sample was in 1986 in the UK (Gill & Werrett 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al. 1985). Since then, the techniques and methodologies for employing the information provided by DNA have undergone enormous improvements, making DNA an obvious tool for routine practice in forensic work. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process: from the handling of evidential biological samples at the crime scene, via storage and the establishment of DNA profiles, to the presentation in court of the weight of the evidence provided by the DNA results. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example: Is the donor of a crime scene sample the same individual as the suspect? Is the alleged father (AF) the biological father of the child? Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.
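The probability of a coincidental match between unrelated individuals is often illustrated with the product rule. A minimal sketch follows, assuming Hardy-Weinberg equilibrium within loci (p² for homozygotes, 2pq for heterozygotes) and independence between loci. It applies no correction for population substructure or relatedness, such as the one proposed by Balding & Nichols (1994). The locus names and allele frequencies are hypothetical.

```python
def genotype_frequency(genotype, freqs):
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq for heterozygotes."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def profile_match_probability(profile, allele_freqs):
    """Probability that an unrelated random person has the same multi-locus profile.

    Assumes HWE within loci and independence (linkage equilibrium) between loci,
    and applies no correction for population substructure or relatedness.
    """
    pm = 1.0
    for locus, genotype in profile.items():
        pm *= genotype_frequency(genotype, allele_freqs[locus])
    return pm


# Hypothetical two-locus example with made-up allele frequencies
allele_freqs = {"locus1": {10: 0.1, 11: 0.2}, "locus2": {8: 0.3}}
profile = {"locus1": (10, 11), "locus2": (8, 8)}
print(profile_match_probability(profile, allele_freqs))  # about 0.0036 (0.04 * 0.09)
```

With the 10 to 16 loci of a routine STR profile, the product quickly becomes very small. This is also why the independence and population assumptions behind the product rule deserve study.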

These figures are usually based on how genetically unique the information in the DNA profile is within a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. The first is population genetics, including allele frequencies, population substructure, and dependence within and between markers. The second is the models used to calculate and present the weight of evidence, taking the former information properly into account.
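Population substructure at a locus is often summarised by an F_ST-type statistic (FST in the abbreviations). A minimal sketch of one simple version, Nei's G_ST = (H_T - H_S)/H_T, follows. Here H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The sketch assumes equally weighted subpopulations and applies no sample-size correction. Other estimators exist; Holsinger & Weir (2009) discuss them. The allele frequencies are made up.

```python
def nei_gst(subpop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T at one locus for equally weighted subpopulations.

    subpop_freqs: list of {allele: frequency} dicts, one per subpopulation.
    """
    k = len(subpop_freqs)
    alleles = set().union(*subpop_freqs)
    # Mean expected heterozygosity within subpopulations
    h_s = sum(1 - sum(p ** 2 for p in f.values()) for f in subpop_freqs) / k
    # Expected heterozygosity of the pooled population (averaged allele frequencies)
    pooled = {a: sum(f.get(a, 0.0) for f in subpop_freqs) / k for a in alleles}
    h_t = 1 - sum(p ** 2 for p in pooled.values())
    if h_t == 0:
        raise ValueError("locus is monomorphic in the pooled population")
    return (h_t - h_s) / h_t


# Two hypothetical subpopulations with opposite allele frequencies at a biallelic locus
print(nei_gst([{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}]))  # about 0.36
```

A value near 0 means that the subpopulations share essentially the same allele frequencies, so a single frequency database can serve them all. Larger values indicate that the choice of reference population affects the calculated weight of evidence.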

Population genetics

Genetic polymorphisms

Three different types of DNA marker, short tandem repeats (STRs), single nucleotide polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they have good characteristics for use in forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers called the CODIS loci (Budowle et al. 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers for routine practice. Recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in triallelic SNPs for forensic applications (Westen et al. 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per marker is low: the amount of information from one STR marker is the same as from approximately four SNPs (Sobrino et al. 2005; Brenner, www.dna-view.com). Regarding SNP multiplexes, no commercial forensic kit is available, although some work has been done to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al. 2009; Philips et al. 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main use of sequence data in forensic situations involves the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are used on the mtDNA. Analysis of mtDNA SNPs in addition to the sequence data has, however, been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), an SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the types of marker described above, there are different "types" of DNA with different properties in terms of their inheritance pattern, as well as other important population genetic properties. The types discussed here are markers on the autosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning from one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with complex pedigree testing (Egeland et al. 2008; Skare et al. 2009).

The X-chromosome has a different inheritance pattern from autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their single X-chromosome from their mother, as one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances autosomal DNA markers cannot exclude a shared father, since full sisters can inherit different alleles from their father. X-chromosome markers can, however, exclude a shared father, since two sisters with the same father would share the same paternal allele. There are several other types of relationship where analysis of X-chromosomal markers is superior to autosomal markers (Szibor et al. 2003; Pinto et al. 2010).

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007; Szibor 2007) (Figure 2). These two genetic properties are discussed further below in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome apart from new mutations. Except in the recombining regions (~5%), mutation is the only force that leads to new variation on the Y-chromosome. Because of this, and because the Y-chromosome has an effective population size one-quarter that of autosomal loci, Y-chromosomal variation has been found to be fairly population-specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used to infer the paternal geographic origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). It is, however, crucial to bear in mind that the Y-chromosome haplotype is shared by all males of the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population-specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles of X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations: small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
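As a rough illustration of the variance-based definition, the Python sketch below computes Wright's FST for one allele of a biallelic locus as the variance of its frequency across subpopulations divided by p̄(1 − p̄), where p̄ is the mean frequency. It assumes equally weighted subpopulations whose allele frequencies are known without sampling error; on real samples an estimator that corrects for sample size (such as Weir & Cockerham's θ) would be used instead. The function name is hypothetical.

```python
from statistics import mean, pvariance


def wright_fst(subpop_freqs):
    """Wright's F_ST for one allele of a biallelic locus.

    subpop_freqs: the allele's frequency in each subpopulation, assumed to be
    equally weighted and known without sampling error (simplifying assumption).
    """
    p_bar = mean(subpop_freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # allele absent or fixed everywhere: no differentiation
    # variance among subpopulations, scaled by the total binomial variance
    return pvariance(subpop_freqs, p_bar) / (p_bar * (1 - p_bar))
```

Two subpopulations with frequencies 0.2 and 0.8 give FST = 0.09 / 0.25 = 0.36, whereas identical frequencies give 0.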

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) deal with the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologues align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between them, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed relative to its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family studies. The recombination frequency is related to the genetic distance between the loci (Ott 1999).
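The segregation probabilities of the kind shown in Figure 2 follow directly from θ. The sketch below assumes a mother with known phase: she passes on each of her two parental haplotypes with probability (1 − θ)/2 and each of the two recombinant haplotypes with probability θ/2, while a father passes his single X-chromosome haplotype to a daughter with probability 1. The function name and the haplotype labels are hypothetical.

```python
def maternal_haplotype_probs(hap1, hap2, theta):
    """Probabilities of the two-locus X haplotypes transmitted by a mother.

    hap1, hap2: the mother's two phased haplotypes, e.g. ("X1a", "X2a").
    theta: recombination fraction between the two loci (0 <= theta <= 0.5).
    """
    (a1, a2), (b1, b2) = hap1, hap2
    outcomes = [
        ((a1, a2), (1 - theta) / 2),  # parental (non-recombinant) haplotype
        ((b1, b2), (1 - theta) / 2),  # parental (non-recombinant) haplotype
        ((a1, b2), theta / 2),        # recombinant haplotype
        ((b1, a2), theta / 2),        # recombinant haplotype
    ]
    probs = {}
    for hap, p in outcomes:
        # merge identical haplotypes, e.g. when the mother is homozygous at a locus
        probs[hap] = probs.get(hap, 0.0) + p
    return probs
```

With θ = 0.1 each parental haplotype is transmitted with probability 0.45 and each recombinant with probability 0.05; if the mother is homozygous at one locus, the recombinant and parental outcomes coincide and their probabilities are added.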

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because loci are closely located and thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where f(ab) is the frequency of the haplotype a-b, and f(a) and f(b) are the frequencies of alleles a and b, respectively. Under linkage equilibrium Δ = 0, i.e. no association exists between a and b. If, however, there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
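To make the definition of Δ concrete, the sketch below estimates f(ab), f(a), f(b) and Δ = f(ab) − f(a)·f(b) from a list of observed two-locus haplotypes. It assumes that phase is known, as it is, for example, for X-chromosomal markers typed in males, who carry a single X-chromosome. The function name and input format are hypothetical.

```python
def ld_delta(haplotypes, allele_1, allele_2):
    """Return (f(ab), f(a), f(b), delta) for haplotype allele_1-allele_2.

    haplotypes: list of observed (locus-1 allele, locus-2 allele) tuples with known phase.
    """
    n = len(haplotypes)
    f_ab = sum(1 for h in haplotypes if h == (allele_1, allele_2)) / n
    f_a = sum(1 for h in haplotypes if h[0] == allele_1) / n
    f_b = sum(1 for h in haplotypes if h[1] == allele_2) / n
    return f_ab, f_a, f_b, f_ab - f_a * f_b
```

If a always occurs together with b (five a-b and five c-d haplotypes), Δ = 0.5 − 0.5 · 0.5 = 0.25; if all four allele combinations are equally common, Δ = 0.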

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Tests for Hardy-Weinberg equilibrium (HWE; Hardy 1908) and for LD deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. Fisher's exact test is the preferred method for the HWE and LD tests (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw firm conclusions from the outcome of either test; a non-significant result cannot be taken as proof of equilibrium (Buckleton et al. 2001).
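As an illustration of how an exact HWE test works, the sketch below handles the simplest case of a single biallelic locus (e.g. a SNP). Conditional on the observed allele counts, it enumerates every possible number of heterozygotes, computes its probability under HWE, and sums the probabilities of all configurations that are no more likely than the observed one. This is a simplified sketch: multiallelic STR loci and the LD test are usually handled with Markov chain or permutation methods rather than full enumeration, and the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_test(n_aa, n_ab, n_bb):
    """Exact HWE test p-value for one biallelic locus from genotype counts."""
    n = n_aa + n_ab + n_bb   # number of individuals
    n_a = 2 * n_aa + n_ab    # copies of allele a
    n_b = 2 * n_bb + n_ab    # copies of allele b
    # factor shared by every genotype configuration with these allele counts
    const = Fraction(factorial(n) * factorial(n_a) * factorial(n_b), factorial(2 * n))

    def prob(het):
        # exact probability of `het` heterozygotes given n and n_a under HWE
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return const * Fraction(2 ** het, factorial(hom_a) * factorial(het) * factorial(hom_b))

    p_obs = prob(n_ab)
    # feasible heterozygote counts share the parity of n_a and cannot exceed min(n_a, n_b)
    probs = [prob(h) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    return float(sum(p for p in probs if p <= p_obs))
```

A sample of 25 aa, 50 ab and 25 bb individuals is the most likely configuration for its allele counts and gives a p-value of 1, whereas 50 aa, 0 ab and 50 bb gives a vanishingly small p-value. Exact rational arithmetic is used so that very small probabilities are not lost to rounding.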

Another feature to consider is the forensic efficiency of the DNA markers in criminal casework and relationship testing. Such estimates describe the theoretical value of using specific markers in different forensic genetic situations, and they differ from case-specific values. The estimation of such parameters is most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
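A minimal sketch of Nei's unbiased estimator, computed from the allele counts observed at one locus. The function name is hypothetical.

```python
def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity from the allele counts at one locus."""
    n = sum(allele_counts)  # number of gene copies sampled
    sum_p2 = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1 - sum_p2)
```

The estimate equals 1 − Σ c_i(c_i − 1)/(n(n − 1)), the probability that two different gene copies drawn without replacement carry different alleles; this is one way to see why the factor n/(n − 1) removes the bias. For counts of 3, 5 and 2 (n = 10) it gives 62/90 ≈ 0.689.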

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. PM is thus the sum of the match probabilities for all individual genotypes. PM can also be computed from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus the complement of the PM discussed above:

$$PD = 1 - PM$$
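The sketch below computes PM from genotype frequencies and PD as its complement. The helper that derives genotype frequencies from allele frequencies assumes Hardy-Weinberg equilibrium, as in the text; observed genotype frequencies can be passed in directly instead. All function names are hypothetical.

```python
def hwe_genotype_freqs(p):
    """Expected genotype frequencies under HWE for allele frequencies p."""
    freqs = []
    for i in range(len(p)):
        for j in range(i, len(p)):
            # homozygote p_i^2, heterozygote 2 p_i p_j
            freqs.append(p[i] ** 2 if i == j else 2 * p[i] * p[j])
    return freqs


def match_probability(genotype_freqs):
    """PM = sum of squared genotype frequencies."""
    return sum(g ** 2 for g in genotype_freqs)


def power_of_discrimination(genotype_freqs):
    """PD = 1 - PM."""
    return 1 - match_probability(genotype_freqs)
```

For a biallelic locus with two equally common alleles, the genotype frequencies are 0.25, 0.5 and 0.25, giving PM = 0.375 and PD = 0.625.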

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child can be deduced, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances in which this cannot be deduced: when the parent in question is homozygous, and when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
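A direct Python transcription of the PIC formula, where the second sum runs over all unordered pairs of distinct alleles. The function name is hypothetical.

```python
from itertools import combinations


def pic(p):
    """Polymorphism information content from the allele frequencies p at one locus."""
    homozygosity = sum(x ** 2 for x in p)
    # both parents share the same heterozygous genotype ij (and the child is ij)
    same_het_pairs = sum(2 * a ** 2 * b ** 2 for a, b in combinations(p, 2))
    return 1 - homozygosity - same_het_pairs
```

A biallelic marker reaches at most PIC = 0.375 (with two equally common alleles), while four equally common alleles give 0.703125. This illustrates the limited information content of individual SNPs compared with multiallelic STRs.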

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i\left(1 - p_i + p_i^2\right)(1 - p_i)^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, and second, the expected population frequency of that mother/child genotype combination, where $p_i$ and $p_j$ are the frequencies of the paternal alleles. When the paternal allele i can be determined unambiguously, the corresponding combinations have a total frequency of $p_i(1 - p_i + p_i^2)$; when mother and child share the same heterozygous genotype ij, the frequency is $p_i p_j (p_i + p_j)$. Q is then obtained as the sum over all mother/child genotype combinations, as described above. An alternative measure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\left(1 - 2hH^2\right)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
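The two exclusion measures can be sketched in Python as follows. `q_exclusion` follows the Q formula term by term. For each unambiguous paternal allele i, it weights the mother/child combination frequency p_i(1 − p_i + p_i²) by the exclusion probability (1 − p_i)². For each mother/child pair sharing heterozygous genotype ij, it weights the frequency p_i p_j(p_i + p_j) by (1 − p_i − p_j)². `pe_brenner_morris` evaluates h²(1 − 2hH²) from observed heterozygote and homozygote proportions. Both function names are hypothetical, and the allele frequencies are assumed to sum to 1.

```python
from itertools import combinations


def q_exclusion(p):
    """Probability of excluding paternity (trio, mother tested) from allele frequencies p."""
    # paternal allele i is unambiguous
    q = sum(pi * (1 - pi + pi ** 2) * (1 - pi) ** 2 for pi in p)
    # mother and child share heterozygous genotype ij, so the paternal allele is i or j
    q += sum(pi * pj * (pi + pj) * (1 - pi - pj) ** 2 for pi, pj in combinations(p, 2))
    return q


def pe_brenner_morris(h, H):
    """Power of exclusion from heterozygote (h) and homozygote (H) proportions."""
    return h ** 2 * (1 - 2 * h * H ** 2)
```

For a biallelic locus with frequencies p and q, Q reduces to pq(1 − pq), which gives a quick sanity check: p = 0.3 yields 0.21 · 0.79 = 0.1659.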

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, except that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a man transmits his single X-chromosome to a daughter with certainty. Thus the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
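The sketch below evaluates both X-chromosomal exclusion chances in two ways. The first is directly, as the frequency-weighted sum over mother/daughter (or daughter-only) genotype combinations, with exclusion probabilities (1 − p_i) and (1 − p_i − p_j). The second is via the closed forms. The two agree whenever the frequencies sum to 1, which provides a check on the algebra. All function names are hypothetical.

```python
from itertools import combinations


def mec_trio_direct(p):
    """Sum over mother/daughter combinations: combination frequency x exclusion chance."""
    total = sum(pi * (1 - pi + pi ** 2) * (1 - pi) for pi in p)  # paternal allele unambiguous
    total += sum(pi * pj * (pi + pj) * (1 - pi - pj)              # mother = daughter = ij
                 for pi, pj in combinations(p, 2))
    return total


def mec_trio(p):
    s2 = sum(x ** 2 for x in p)
    s4 = sum(x ** 4 for x in p)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo_direct(p):
    """Sum over daughter genotypes: homozygous ii or heterozygous ij."""
    total = sum(pi ** 2 * (1 - pi) for pi in p)
    total += sum(2 * pi * pj * (1 - pi - pj) for pi, pj in combinations(p, 2))
    return total


def mec_duo(p):
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return 1 - 2 * s2 + s3
```

For allele frequencies 0.1, 0.2, 0.3 and 0.4, MEC_Trio = 0.6454 and MEC_Duo = 0.5. The lower value for duos reflects the missing information from the mother.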

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, because of the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with respect to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

To assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example Pr(E|H), the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR), which connects the prior odds to the resulting posterior odds, i.e. Bayes' theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints or information from eyewitnesses.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the defence hypothesis. E represents the DNA profiles and I other relevant background information. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is through this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which is the probability of observing the child's ($G_C$), mother's ($G_M$) and alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


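We can use the third law of probability (the multiplication rule) and simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$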

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1), assuming data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither parent has another allele to transmit). If exactly one of the mother and the AF is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

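As a simple example, let $G_M$ have the genotype [a,b], $G_C$ [b,c] and $G_{AF}$ [c,d]. The child must then have received b from the mother and c from its father, so

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2p_c}$$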

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
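The single-locus trio calculation can be sketched as follows. The numerator sums over the alleles the mother and the AF can transmit, each with probability 1/2. The denominator replaces the AF by a random man who transmits allele k with probability p_k. Because every transmission combination is summed, ambiguous A_M/A_P cases are handled automatically. This sketch assumes no mutations or silent alleles and that the mother is the true mother. The function name and input format are hypothetical.

```python
def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus trio PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Genotypes are 2-tuples of allele labels; freqs maps allele -> population frequency.
    """
    target = sorted(child)

    def prob_child(paternal_transmissions):
        # sum over every (maternal allele, paternal allele) pair that reproduces the child
        total = 0.0
        for m in mother:  # each maternal allele is transmitted with probability 1/2
            for allele, p_f in paternal_transmissions.items():
                if sorted((m, allele)) == target:
                    total += 0.5 * p_f
        return total

    af_transmissions = {}
    for a in alleged_father:  # the AF transmits each of his alleles with probability 1/2
        af_transmissions[a] = af_transmissions.get(a, 0.0) + 0.5
    numerator = prob_child(af_transmissions)  # Hp: the AF is the father
    denominator = prob_child(freqs)           # Hd: a random man transmits allele k with prob p_k
    return numerator / denominator
```

For the example above with p_c = 0.2, `paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs)` returns 1/(2 · 0.2) = 2.5. If the AF carries neither allele that the child could have received from its father, the numerator and hence the PI is 0.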

Decision

How should a PI value be interpreted? Bayes' theorem is needed to obtain posterior odds, from which a posterior probability can be computed. For paternity issues the prior odds have traditionally been set to 1, so that the posterior odds equal the PI:

$$PI = \frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)}$$

Since only the two hypotheses $H_p$ and $H_d$ are considered, $\Pr(H_d \mid E) = 1 - \Pr(H_p \mid E)$, hence

$$PI = \frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)}$$

resulting in the following posterior probability of paternity:

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, while too high a cut-off increases the risk of failing to include a true father.
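A minimal sketch of the conversion from PI to posterior probability, using Bayes' theorem in odds form. The prior odds default to the traditional value of 1 but can be changed. The function name is hypothetical.

```python
def posterior_probability(lr, prior_odds=1.0):
    """Posterior probability of Hp from a likelihood ratio (e.g. the PI) and prior odds."""
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)
```

With prior odds of 1, a PI of 9999 corresponds to a posterior probability of 0.9999. A 99.99% cut-off is therefore equivalent to requiring PI ≥ 9999, and any other choice of prior odds shifts that threshold accordingly.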

Mathematical model for automatic likelihood computation in relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

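$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i=1}^{n} \mathrm{Pen}(X_i \mid G_i) \prod_{\mathrm{founder}} \Pr(G_{\mathrm{founder}}) \prod_{\{o,m,f\}} \Pr(G_o \mid G_m, G_f)$$

where $G_i$ and $X_i$ are the genotype and the observed phenotype of individual i among the n individuals in the pedigree, the second product runs over all founders, and the last product runs over all offspring o with mother m and father f.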

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and the individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
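For very small pedigrees, the pedigree likelihood can be evaluated directly by enumerating every genotype combination for the untyped individuals. The sketch below does exactly that. It is deliberately not the Elston-Stewart peeling recursion, which exists precisely to avoid this exponential enumeration. For marker data, the penetrance term is taken to be 1 for a typed individual's observed genotype and 0 otherwise. The sketch further assumes HWE for founders, a single marker and no mutation. The function names and pedigree encoding are hypothetical.

```python
from itertools import combinations_with_replacement, product


def founder_prob(genotype, freqs):
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_prob(child, mother, father):
    """Mendelian Pr(G_o | G_m, G_f); each (maternal, paternal) allele pair has probability 1/4."""
    return sum(0.25 for m in mother for f in father if tuple(sorted((m, f))) == child)


def pedigree_likelihood(founders, offspring, observed, freqs):
    """Brute-force single-marker pedigree likelihood.

    founders: founder names; offspring: {child: (mother, father)};
    observed: {name: genotype} for typed individuals; freqs: {allele: frequency}.
    """
    all_genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    people = list(founders) + list(offspring)
    # typed individuals are fixed to their observed genotype (Pen = 1, all others 0);
    # untyped individuals are summed over every possible genotype
    choices = [[tuple(sorted(observed[x]))] if x in observed else all_genotypes
               for x in people]
    total = 0.0
    for assignment in product(*choices):
        g = dict(zip(people, assignment))
        term = 1.0
        for f in founders:
            term *= founder_prob(g[f], freqs)
        for child, (mother, father) in offspring.items():
            term *= transmission_prob(g[child], g[mother], g[father])
        total += term
    return total
```

As a usage example, the PI from the trio example can be recovered as a ratio of two pedigree likelihoods. Under Hp, the child's parents are M and AF. Under Hd, they are M and an untyped, unrelated founder X, with AF kept as an unrelated typed founder. With p_c = 0.2 the ratio is 1/(2 · 0.2) = 2.5.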

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. Increased access to large datasets has created a need for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree, describing the alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed marker-specific genotypes conditional on each inheritance vector; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. the transmission probabilities). Using a hidden Markov model (HMM) for the final step yields an efficient computational model (see Kruglyak et al. 1996 for a more detailed description).
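Step 3 can be illustrated with a naive forward pass of the hidden Markov model. In this sketch, the hidden states are inheritance vectors, with one bit per meiosis recording which of the parent's two alleles was transmitted. Between adjacent markers, each bit flips independently with probability θ (a recombination in that meiosis). The per-marker emission probabilities from step 2 are taken as given input. This is an O(L · 4^m) illustration for m meioses and L markers, not the optimised implementations described by Kruglyak et al. (1996) or Abecasis et al. (2002). The function name and input format are hypothetical.

```python
from itertools import product


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v),
        where v is a tuple of 0/1 bits, one per meiosis.
    thetas[k]: recombination fraction between marker k and marker k + 1
        (so len(thetas) == len(emissions) - 1).
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    # Mendelian prior: every inheritance vector is equally likely at the first marker
    alpha = {v: emissions[0][v] / len(vectors) for v in vectors}
    for k, theta in enumerate(thetas):
        new_alpha = {}
        for w in vectors:
            s = 0.0
            for v in vectors:
                # number of meioses with a recombination between markers k and k + 1
                flips = sum(a != b for a, b in zip(v, w))
                s += alpha[v] * theta ** flips * (1 - theta) ** (n_meioses - flips)
            new_alpha[w] = s * emissions[k + 1][w]
        alpha = new_alpha
    return sum(alpha.values())
```

With θ = 0.5 the markers are effectively unlinked and the likelihood factorises into a product of per-marker averages. With θ = 0 the inheritance vector is identical at all markers, which is where linkage changes the result.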

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and furthermore to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. Such cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypothesis model and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.


Figure 3. The different alternative hypotheses used for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or represent true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the model is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is important. In this paper we demonstrated that improvements are still necessary to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change in the error rate in comparison with the

standard case Total error (inclusion error

exclusion error) Consanguinity Mother and father simulated as first cousin 3 (10 -1) Additional information 20 markers DNA profiles -68 (-29 -89) 25 markers DNA profiles -83 (-56 -98) 2 children -88 (-73 -96) Mutation model Limit of 1 incon instead of mutation model for LR calc 16 (217 -95) Limit of 2 incon instead of mutation model for LR calc 320 (1079 -100) Inappropriate allele frequency Rwanda allele freq for data generation Swedish allele freq for LR calc 19 (190 -76) Somali allele freq for data generation Swedish allele freq for LR calc 2(106 -55) Iran allele freq for data generation Swedish allele freq for LR calc -13 (25 -34) Prior information Five hypotheses model for LR calc -24 (-8 -31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion errors for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).
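The quantities in Table 1 can be sketched as follows: inclusion and exclusion error rates obtained from simulated posterior probabilities, an unweighted average of inclusion errors over alternative scenarios, and the relative change compared with the standard case. The decision threshold of 0.9999 and all input values are hypothetical; the section does not state the decision rule used in Paper I.

```python
def error_rates(post_true_fathers, post_non_fathers, threshold=0.9999):
    """Inclusion and exclusion error rates for a posterior-probability threshold.

    Exclusion error: a true father is not included (posterior < threshold).
    Inclusion error: a non-father is included (posterior >= threshold).
    """
    exclusion = sum(p < threshold for p in post_true_fathers) / len(post_true_fathers)
    inclusion = sum(p >= threshold for p in post_non_fathers) / len(post_non_fathers)
    return inclusion, exclusion


def unweighted_inclusion(per_scenario):
    """Unweighted mean of the inclusion errors over the alternative scenarios."""
    return sum(per_scenario.values()) / len(per_scenario)


def percent_change(new_rate, standard_rate):
    """Relative change (%) in an error rate compared with the standard case."""
    return 100 * (new_rate - standard_rate) / standard_rate


inc, exc = error_rates([0.99999, 0.5, 0.99995, 0.9], [0.1, 0.99999, 0.2, 0.3])
print(inc, exc)  # 0.25 0.5
print(percent_change(0.0032, 0.01))
```

A negative percentage, as for most rows of Table 1, means that the modification reduced the error rate relative to the standard case.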


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers, it is particularly important to set up and study regional frequency databases because of an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically distinct regions were typed, together with 39 samples from a Swedish Saami population (i.e., Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (i.e., HVS-I, HVS-II and HVS-III) spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved the enumeration of forensic efficiency parameters, as well as a comparison of the genetic variation within the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
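The two summary statistics quoted above can be computed from a list of haplotypes as shown below. This sketch uses the standard formulas (Nei's unbiased haplotype diversity, n/(n-1)·(1 - Σp²), and the random match probability Σp²); whether the thesis used exactly these estimators is not stated in this section, and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (haplotype diversity, random match probability) for haploid data.

    Haplotype diversity uses Nei's unbiased estimator n/(n-1) * (1 - sum p_i^2);
    the random match probability is sum p_i^2 over the observed haplotypes.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    match_prob = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1 - match_prob)
    return diversity, match_prob


gd, pm = haplotype_stats(["H1", "H1", "H2", "H3"])  # toy sample, hypothetical labels
print(round(gd, 4), round(pm, 4))  # 0.8333 0.375
```

A diversity close to 1 with a small match probability, as reported for the Swedish sample, means that most haplotypes are rare in the database.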

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST is the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).
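To illustrate what an FST-type statistic measures for haplotype data, the sketch below computes Nei's G_ST = (H_T - H_S)/H_T from per-subpopulation haplotype lists. This is a simple stand-in chosen for illustration; the estimator behind the FST and ΦST values in Figure 4 is not stated in this section and may differ (for example, in sample-size corrections or in using sequence distances).

```python
from collections import Counter


def haplotype_freqs(haplotypes):
    """Relative frequency of each haplotype in one sample."""
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def nei_gst(subpopulations):
    """Nei's G_ST = (H_T - H_S) / H_T from haplotype lists, one list per subpopulation.

    H_S is the mean within-subpopulation diversity and H_T the diversity of the
    unweighted pooled frequencies; sample-size corrections are omitted.
    """
    sub_freqs = [haplotype_freqs(s) for s in subpopulations]

    def diversity(freqs):
        return 1 - sum(p * p for p in freqs.values())

    h_s = sum(diversity(f) for f in sub_freqs) / len(sub_freqs)
    all_haps = set().union(*sub_freqs)
    pooled = {h: sum(f.get(h, 0.0) for f in sub_freqs) / len(sub_freqs) for h in all_haps}
    h_t = diversity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(nei_gst([["A", "B"], ["A", "B"]]))  # 0.0: identical subpopulations
print(nei_gst([["A", "A"], ["B", "B"]]))  # 1.0: no shared haplotypes
```

Values near 0, as expected for homogeneous Swedish regions, indicate that pooling the regions into one frequency database loses little information.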

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP, the European DNA Profiling Group (EDNAP) mtDNA population database (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
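Why database size matters can be shown with a common way of bounding the frequency of a haplotype that has not been observed: the exact binomial (Clopper-Pearson) upper bound for a zero count, which solves (1 - p)^n = α. This is offered as a generic illustration, not as the method used in the thesis; apart from n = 296, the database sizes below are hypothetical.

```python
def upper_bound_unseen(n, alpha=0.05):
    """One-sided (1 - alpha) upper bound for a haplotype seen 0 times in n samples.

    Solves (1 - p) ** n = alpha, the exact binomial (Clopper-Pearson) bound for
    a zero count.
    """
    return 1 - alpha ** (1 / n)


for n in (296, 1000, 5000):  # 296 = Swedish sample size above; larger n are hypothetical
    print(n, round(upper_bound_unseen(n), 5))
```

The bound shrinks roughly as 3/n for α = 0.05, so a larger combined reference database gives a tighter, less conservative estimate of haplotype rarity.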


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.
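As a generic illustration of estimating a recombination fraction θ from counts of informative meioses (not the model presented in Paper III), the sketch below gives the maximum-likelihood estimate and a Bayesian posterior mean under an assumed uniform prior on [0, 0.5], integrated numerically. The counts are hypothetical.

```python
def recombination_mle(recombinants, meioses):
    """Maximum-likelihood recombination fraction, constrained to [0, 0.5]."""
    return min(recombinants / meioses, 0.5)


def recombination_posterior_mean(recombinants, meioses, grid_size=10_000):
    """Posterior mean of theta under a uniform prior on [0, 0.5].

    The binomial likelihood theta**r * (1 - theta)**(n - r) is integrated with a
    simple midpoint rule, which is adequate for illustration.
    """
    step = 0.5 / grid_size
    thetas = [(i + 0.5) * step for i in range(grid_size)]
    weights = [t ** recombinants * (1 - t) ** (meioses - recombinants) for t in thetas]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)


# Hypothetical counts: no recombinants observed among 100 informative meioses.
print(recombination_mle(0, 100))  # 0.0
print(round(recombination_posterior_mean(0, 100), 4))
```

With zero observed recombinants, the maximum-likelihood estimate is exactly 0, whereas the posterior mean stays slightly above 0 (about 1/(n + 2)), reflecting that a small but non-zero recombination rate cannot be ruled out with roughly 100 meioses.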

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded rather than taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
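The section does not say which LD test was used. As an illustration only, the sketch below runs a permutation test of the likelihood-ratio (G) statistic on two-locus male X haplotypes, where phase is known because males are hemizygous. Shuffling the alleles at one locus preserves allele counts but breaks any association. The allele values are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(pairs):
    """Likelihood-ratio (G) statistic for association between two loci."""
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    g = 0.0
    for (a, b), observed in Counter(pairs).items():
        expected = a_counts[a] * b_counts[b] / n
        g += observed * math.log(observed / expected)
    return 2 * g


def ld_permutation_test(pairs, n_perm=999, seed=1):
    """Permutation p-value for LD; shuffling locus B breaks any allelic association."""
    rng = random.Random(seed)
    observed = g_statistic(pairs)
    a_alleles = [a for a, _ in pairs]
    b_alleles = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b_alleles)  # keeps allele counts, randomises the pairing
        if g_statistic(list(zip(a_alleles, b_alleles))) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical male haplotypes for one pair of linked X-STRs.
associated = [(12, 30)] * 50 + [(13, 31)] * 50
independent = [(12, 30), (12, 31), (13, 30), (13, 31)] * 25
print(ld_permutation_test(associated))   # small p-value: strong association
print(ld_permutation_test(independent))  # 1.0: no association at all
```

A significant result, as reported for each of the four linkage groups, indicates that haplotype frequencies should be used instead of multiplying single-locus allele frequencies.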

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the locations in Mb (NCBI build 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to develop and study a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LRs involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) handles thousands of linked markers efficiently but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present here a model for the complete consideration of linkage and linkage disequilibrium, and we study the impact of taking or not taking linkage and LD into account by simulating typical pedigrees and X-chromosomal data for the markers studied in Paper III.
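The core of the Lander-Green approach is a hidden Markov model that runs along the chromosome, with the inheritance pattern as the hidden state and recombination fractions as the transition probabilities. The sketch below is a minimal single-meiosis version: a mother with known (assumed) phase transmits an X chromosome to a son, and the forward algorithm gives the probability of the son's alleles. The full algorithm tracks an inheritance vector with one bit per meiosis in the pedigree. This sketch omits that, as well as the haplotype/LD extension of Paper IV, mutation and genotyping error. All allele values are hypothetical.

```python
def meiosis_likelihood(hap0, hap1, child, thetas):
    """P(child's alleles | parent's phased haplotypes) for one meiosis.

    Hidden state at each locus: which parental haplotype (0 or 1) was passed on.
    thetas[k] is the recombination fraction between loci k and k + 1.
    No mutation or genotyping error is modelled.
    """
    def emission(state, k):
        source = hap0 if state == 0 else hap1
        return 1.0 if source[k] == child[k] else 0.0

    alpha = [0.5 * emission(s, 0) for s in (0, 1)]  # both haplotypes equally likely
    for k in range(1, len(child)):
        theta = thetas[k - 1]
        alpha = [
            sum(alpha[s] * (1 - theta if s == s_new else theta) for s in (0, 1))
            * emission(s_new, k)
            for s_new in (0, 1)
        ]
    return sum(alpha)


mother0, mother1 = [12, 30, 8], [13, 31, 9]  # hypothetical phased maternal alleles
son = [12, 30, 8]                            # son's hemizygous X alleles
print(meiosis_likelihood(mother0, mother1, son, [0.5, 0.5]))    # unlinked loci
print(meiosis_likelihood(mother0, mother1, son, [0.01, 0.01]))  # tightly linked loci
```

Treating the loci as unlinked (θ = 0.5) gives 0.5³ = 0.125, whereas tight linkage gives about 0.49. This illustrates how ignoring linkage can distort the likelihood, and the forward pass keeps the cost linear in the number of loci.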

Materials and methods

A computational model was presented that is based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
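One ingredient that separates models with and without LD is how the probability of a founder's multi-locus haplotype is obtained. The sketch below contrasts direct haplotype counting in males (LD respected) with the product of single-locus allele frequencies (linkage equilibrium assumed). It illustrates only this ingredient, not the full model of Paper IV, and the sample is hypothetical.

```python
def haplotype_probabilities(male_haplotypes, query):
    """Return (P under LD, P under linkage equilibrium) for a multi-locus haplotype.

    Under LD the haplotype frequency is counted directly (males are hemizygous,
    so their X haplotypes are observed without phase ambiguity). Under
    linkage equilibrium it is the product of the single-locus allele frequencies.
    """
    n = len(male_haplotypes)
    p_ld = sum(h == query for h in male_haplotypes) / n
    p_le = 1.0
    for i, allele in enumerate(query):
        p_le *= sum(h[i] == allele for h in male_haplotypes) / n
    return p_ld, p_le


sample = [(12, 30)] * 50 + [(13, 31)] * 50  # hypothetical, perfectly associated
print(haplotype_probabilities(sample, (12, 30)))  # (0.5, 0.25)
print(haplotype_probabilities(sample, (12, 31)))  # (0.0, 0.25): unseen haplotype
```

The second call shows why the rarity of previously unseen haplotypes needs special treatment: direct counting gives them probability zero, which cannot be used as it stands in a likelihood calculation.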

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e., questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three were cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship being either true or not true. From these, LR distributions were studied, and the LR calculated using our proposed model was compared with the LR calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs greater than 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with the LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Columns: (1) log10 LR; (2) difference LR(m_2)/LR(m_1); (3) difference LR(m_3)/LR(m_1). Each column gives the median [95% cred] (min/max).

Rel: (1) 20 [-1 -62] (-1100) | (2) 12 [04-73] (0001418) | (3) 04 [0036-93] (00001438)
NoRel: (1) -1 [-1-05] (-140) | (2) 10 [09-13] (000263) | (3) 02 [00036-91] (0026433)

Rel means simulations in which the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule for combining the eight X-STRs is not valid.


• In Paper IV, we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years. Much of this progress is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before the above-mentioned findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid development. Once the discoveries were made, an enormous amount of money and scientific effort was invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods. Moreover, major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g., CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases which, like CODIS, consist of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly solved with a sufficiently high probability and certainty; thus, the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example, separating full siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this isolation is traceable in the genetics and thus could have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team": Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher, you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life.

My relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed.

My family-in-law, Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and caring, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis: Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.



Abstract

DNA has become a powerful forensic tool for solving cases, such as linking a suspect to a crime scene, resolving biological relationship issues and identifying disaster victims. Traditionally, DNA investigations mainly involve two steps: the establishment of DNA profiles from biological samples, and the interpretation of the evidential weight given by these DNA profiles. This thesis deals with the latter, focusing on models for assessing the weight of evidence and on the study of parameters affecting these probability figures.

In order to calculate the correct, representative weight of DNA evidence, prior knowledge about the DNA markers in a relevant population sample is required. Important properties that should be studied include, for example, how frequently certain DNA variants (i.e. alleles) occur in the population, the differences in such frequencies between subpopulations, the expected inheritance patterns of the DNA markers within a family, and the forensic efficiency of the DNA markers in casework.

In this thesis, we aimed to study important population genetic parameters that influence the weight of evidence given by a DNA analysis, as well as models for properly considering such parameters when calculating the weight of evidence in relationship testing.

We have established a Swedish frequency database for mitochondrial DNA haplotypes and a haplotype frequency database for markers located on the X-chromosome. Furthermore, mtDNA haplotype frequencies were used to study the genetic variation within Sweden and between Swedish and other European populations. No genetic substructure was found within Sweden, but strong similarities with other western European populations were observed.

Genetic properties such as linkage and linkage disequilibrium can be important when using X-chromosomal markers in relationship testing, and this was true for the set of markers that we studied. To account for these properties, we proposed a model for taking linkage and linkage disequilibrium into account when calculating the weight of evidence provided by X-chromosomal analysis.

Finally, we investigated the risk of erroneous decisions when using DNA investigations for family reunification. We showed that the risk is increased by uncertainties regarding population allele frequencies, consanguinity and competing close relationships between the tested individuals. Additional information and the use of a refined model for the alternative hypotheses reduced the risk of making erroneous decisions.

In summary, as a result of the work in this thesis, mitochondrial DNA and X-chromosome markers can be used to resolve complex relationship investigations. Moreover, the reliability of likelihood estimates has been increased through the development of models and the study of relevant parameters affecting probability calculations.

Popular science summary (Populärvetenskaplig sammanfattning)

DNA has become an important tool in the justice system for resolving questions such as linking a suspected offender to a crime scene, investigating biological relationships, or identifying victims of mass disasters. A DNA investigation can be divided into two steps: first, the establishment of DNA profiles, and second, an evaluation of the significance of the obtained DNA profiles for a given question. More specifically, one may want to know, for example, the probability that someone other than the suspect left DNA at the crime scene, or the probability that the alleged man is the child's father as compared with his matching by chance. The calculation of such probabilities is based on, among other things, the occurrence of different DNA variants (alleles) in the population and on how the genetic markers are inherited from one generation to the next.

This thesis aimed to study relevant background data that affect the probability calculations, and to study mathematical models for correctly taking the studied parameters into account in case-specific probability calculations. The work focuses on the genetic variation in a Swedish population and on DNA investigations concerning biological relationships between individuals. Nevertheless, most of the results and discussions are also valid for the use of DNA profiling in crime scene investigations.

Probability calculations in relationship investigations are best performed by setting two hypotheses against each other. As an example, consider a paternity investigation with DNA profiles from a child, the child's mother and an alleged man. In this case, hypothesis 1, "The alleged man is the father of the child", can be set against hypothesis 2, "The alleged man is unrelated to the child". For each hypothesis, the probability of observing the obtained DNA results, given that the hypothesis is true, is then calculated: for example, the probability of the mother's, the child's and the man's DNA profiles when the alleged man is the child's father, and when he is not. The final value of the investigation is obtained by weighing the two probabilities against each other. A decision on whether or not the man is the child's father can then be based on comparing the result of the probability calculation with a threshold for inclusion or, alternatively, exclusion.

There is always a risk of drawing the wrong conclusion, for example by wrongly excluding a biological father or by wrongly including a non-father as the biological father. In our first paper, we investigated the risk of drawing the wrong conclusion and studied the significance and impact of various factors that may affect it. We focused on DNA investigations in family reunification cases, which can be complex since they involve uncertainties about population affiliation, differences in family constellations, etc. Through simulations, we showed that errors can be minimised by increasing the information content of the investigation, for example by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. In addition, we showed that the risk of error can be reduced further by using a refined method that takes alternative close relationships between the tested individuals into account.

Standard investigations use DNA markers located on the so-called autosomal chromosomes. In special cases, DNA variation in the mitochondria (mtDNA) or on the sex chromosomes (the X-chromosome and the Y-chromosome) can also be examined. MtDNA is inherited through the maternal line and is especially useful for investigating a presumed maternal relationship. In the second paper, we examined the mtDNA variation in a Swedish population in order to create a frequency database. By analysing blood samples from about 300 Swedes from seven geographically separate regions, we showed that the information content for use in a Swedish population is comparable to that of other European populations. The study also showed that there are no significant differences in mtDNA variation between the different Swedish regions.

Papers three and four focused on the DNA variation on the X-chromosome. Thanks to the special inheritance pattern of the X-chromosome, X-chromosome analysis can provide a solution in complex relationship investigations where analysis of standard DNA markers is insufficient. However, using the X-chromosome in relationship investigations requires special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage means that the probability of inheriting a particular variant at one DNA marker is affected by which variant was inherited at another, nearby DNA marker. In the third paper, we investigated the genetic polymorphism of eight DNA markers, all located on the X-chromosome. We showed that the markers are highly informative for relationship investigations, and that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.

Finally, in the fourth paper, we developed a mathematical model for correctly taking both linkage and linkage disequilibrium into account in probability calculations for relationship investigations based on X-chromosome data. We applied this model in a simulation study of a number of typical cases and showed the degree of error incurred when a simpler model, which ignores linkage and linkage disequilibrium, is used.

In summary, through the work in this thesis, mitochondrial DNA and X-chromosomal DNA markers can be used to resolve more complex relationship investigations. Through the development of models and the study of relevant parameters affecting relationship probability calculations, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007; 172(2-3):144-149.

II Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010; 124(2):91-98.

III Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008; 3(1):37-41.

IV Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier, Forensic Science International; Paper II © 2010 Springer, International Journal of Legal Medicine; Paper III © 2008 Elsevier, Forensic Science International: Genetics.

Contents

Abstract

Popular science summary (Populärvetenskaplig sammanfattning)

List of Papers

Contents

Abbreviations

Introduction
History of DNA and forensic genetics
Population genetics
Genetic polymorphisms
DNA inheritance
Population Genetics
The Swedish population and its genetic appearance
Forensic mathematics/statistics
Framework for interpretation and presentation of evidential weight
Paternity index calculation
Mathematical model for automatic likelihood computation for relationship testing

Aim of the thesis
Specific aims
Paper I
Paper II
Paper III
Paper IV

Investigations
Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
Materials and methods
Results and discussion
Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
Materials and methods
Results and discussion
Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
Materials and methods
Results and discussion
Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account
Materials and methods
Results and discussion

Concluding remarks

Future perspectives

Acknowledgements

References

Abbreviations

θ Theta, recombination frequency/fraction
AF Alleged father
DNA Deoxyribonucleic acid
FST Measure of population genetic subdivision
GD Gene diversity
HWE Hardy-Weinberg equilibrium
ISFG International Society for Forensic Genetics
LD Linkage disequilibrium
LR Likelihood ratio
MEC Mean exclusion chance
mtDNA Mitochondrial DNA
PCR Polymerase chain reaction
PD Power of discrimination
PE Power of exclusion
PI Paternity index
PIC Polymorphism information content
PM Match probability
Pr Probability
SNP Single nucleotide polymorphism
STR Short tandem repeat
TF True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in areas such as medical genetics and the food processing industry, and in forensic settings, both for solving crimes and for resolving disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being, based on a biological sample, by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser & Schneider 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al 2004b).

DNA was first used in court for a crime scene sample in 1986, in the UK (Gill et al 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques and methodologies for exploiting the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process, from the handling of biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation of the weight of the DNA evidence in court. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?" Once the DNA profiles have been established, they are used to address the case-specific question. Normally, one of three statements can be made for any hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some form of statistical evaluation has to be performed to estimate the strength of the evidence provided by the DNA profiles. Put simply, most such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.
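A minimal sketch of that coincidence probability for a single, fully resolved profile is given below. It assumes Hardy-Weinberg equilibrium within loci (p² for a homozygote, 2pq for a heterozygote) and independence between loci, so that single-locus frequencies can be multiplied (the product rule). The locus names and allele frequencies are hypothetical, and no correction for substructure or relatedness is applied.

```python
from math import prod


def genotype_frequency(allele1: str, allele2: str, freqs: dict[str, float]) -> float:
    """Expected genotype frequency under Hardy-Weinberg equilibrium."""
    p, q = freqs[allele1], freqs[allele2]
    return p * p if allele1 == allele2 else 2 * p * q


def profile_frequency(profile: dict[str, tuple[str, str]],
                      freqs: dict[str, dict[str, float]]) -> float:
    """Product rule: multiply single-locus genotype frequencies.

    Assumes the loci are independent (no linkage, no linkage disequilibrium).
    """
    return prod(genotype_frequency(a, b, freqs[locus])
                for locus, (a, b) in profile.items())


# Hypothetical allele frequencies for two made-up loci.
freqs = {
    "LOCUS_A": {"12": 0.10, "13": 0.25},
    "LOCUS_B": {"8": 0.20},
}
profile = {"LOCUS_A": ("12", "13"), "LOCUS_B": ("8", "8")}

# Probability that a randomly chosen, unrelated person has this exact profile.
print(profile_frequency(profile, freqs))
```

Here the heterozygous locus contributes 2 × 0.10 × 0.25 and the homozygous locus 0.20², so the profile frequency is 0.002. Adding loci multiplies in further factors, which is why multi-locus profiles quickly become very rare.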

These figures are usually based on how genetically unique the information in the DNA profile is within a relevant population. The main aim of this thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. The first is population genetics, including allele frequencies, population substructure, and dependence within and between markers. The second is models for calculating and presenting the weight of evidence that take this information properly into account.

Population genetics

Genetic polymorphisms

Three types of DNA marker, namely short tandem repeats (STRs), single nucleotide polymorphisms (SNPs) and DNA sequence data (Figure 1), account for the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the human genome (Ellegren 2004). Their relatively high mutation rate explains their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they are well suited to forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers in routine practice. More recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These increase the probability of obtaining complete profiles from degraded DNA.
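To make "a short sequence repeated a variable number of times" concrete, the hypothetical helper below counts the longest uninterrupted tandem run of a repeat unit in a sequence. For a simple STR, this count is what the allele designation reflects. Real alleles can contain partial or variant repeats (e.g. "9.3" alleles), which this sketch does not handle, and the flanking sequences are invented for illustration.

```python
def longest_repeat_run(sequence: str, unit: str) -> int:
    """Number of copies in the longest uninterrupted tandem run of `unit`."""
    k = len(unit)
    best = 0
    for frame in range(k):                     # a run may start at any offset
        run = 0
        for i in range(frame, len(sequence) - k + 1, k):
            if sequence[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0                        # run interrupted, start again
    return best


# Hypothetical alleles: invented flanking sequence around GATA repeats.
left, right = "TTCA", "CCTG"
allele_7 = left + "GATA" * 7 + right
allele_9 = left + "GATA" * 9 + right
print(longest_repeat_run(allele_7, "GATA"), longest_repeat_run(allele_9, "GATA"))  # 7 9
```

The two alleles differ only in length, which is what makes STR alleles easy to separate by size after PCR amplification.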

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that the limited number of alleles per locus makes the information content per locus low: one STR marker provides roughly as much information as four SNPs (Sobrino & Carracedo 2005; Brenner, www.dna-view.com). No commercial forensic SNP kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).
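The "one STR is worth about four SNPs" rule of thumb can be made plausible with a back-of-the-envelope calculation. The sketch below compares single-locus match probabilities (the chance that two unrelated people share a genotype, assuming Hardy-Weinberg equilibrium) for a hypothetical STR with eight equally frequent alleles and for biallelic SNPs with allele frequency 0.5, the most informative case for a SNP. Real allele frequencies are uneven, so the exact ratio depends on the frequencies assumed; the numbers are illustrative only.

```python
from itertools import combinations_with_replacement


def match_probability(freqs: list[float]) -> float:
    """Sum of squared genotype frequencies under Hardy-Weinberg equilibrium,
    i.e. the probability that two unrelated individuals share a genotype."""
    total = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        g = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        total += g * g
    return total


str_pm = match_probability([1 / 8] * 8)   # hypothetical 8-allele STR
snp_pm = match_probability([0.5, 0.5])    # best-case biallelic SNP

# Smallest number of independent SNPs whose combined match probability
# is at least as low as that of the single STR.
n_snps = 1
while snp_pm ** n_snps > str_pm:
    n_snps += 1
print(f"STR: {str_pm:.4f}, SNP: {snp_pm:.4f}, SNPs needed: {n_snps}")
```

Under these assumptions the STR has a match probability of about 0.029 and each SNP 0.375, so four SNPs are needed to match one STR. An STR with more alleles would require more SNPs.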

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. In forensic settings, sequence data are mainly used to analyse variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysing mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), an SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the markers described above, there are different "types" of DNA that differ in their inheritance pattern and in other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with complex pedigree testing (Egeland et al 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, but as haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, derived from her two X-chromosomes. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude a shared father, since full sisters can inherit different alleles. X-chromosome markers, however, can exclude a shared father, because two sisters with the same father must share his paternal allele. There are several other types of relationship for which analysis of X-chromosomal markers is superior to autosomal markers (Szibor et al 2003, Pinto et al 2010).
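The sister example above can be expressed as a simple consistency check. The sketch assumes, as the text does, that a father passes his single X-chromosome intact to every daughter and that no mutation has occurred; the locus names and alleles are hypothetical. The function name is illustrative, not an established tool.

```python
def loci_excluding_shared_father(sister1: dict[str, set[str]],
                                 sister2: dict[str, set[str]]) -> list[str]:
    """Return the X-chromosomal loci at which two females share no allele.

    Daughters of the same father both carry his single X-chromosome, so they
    must share at least one allele at every X locus. Any locus returned is
    therefore inconsistent with a common father (mutation is ignored). An
    empty result does not prove a shared father; it only means that no
    exclusion was found.
    """
    common_loci = sister1.keys() & sister2.keys()
    return sorted(locus for locus in common_loci
                  if not sister1[locus] & sister2[locus])


# Hypothetical X-STR genotypes for two women who claim the same father.
woman_a = {"X_LOCUS_1": {"10", "12"}, "X_LOCUS_2": {"15", "16"}}
woman_b = {"X_LOCUS_1": {"11", "12"}, "X_LOCUS_2": {"14", "17"}}
print(loci_excluding_shared_father(woman_a, woman_b))   # ['X_LOCUS_2']
```

The same pattern at an autosomal locus would not be informative, because full sisters can receive different alleles from both parents and so share no allele at all.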

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in various PCR multiplexes (Becker et al 2008, Hundertmark et al 2008, Gomez et al 2007, Diegoli et al 2010). Linkage and linkage disequilibrium typically have to be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). The definitions of these two genetic properties and their impact on calculated likelihoods are discussed further below.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that creates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. It consists of a circular genome of ~16,600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. MtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for the X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations: small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when calculating the strength of the DNA profile evidence (Balding & Nichols 1994).
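The link between FST and allele frequency variance can be illustrated with Wright's definition for one allele at a biallelic locus, FST = Var(p) / [p̄(1 - p̄)], where p is the allele's frequency in each subpopulation and p̄ is its mean. The sketch below is a simple illustration that weights subpopulations equally and ignores sample size; estimators used in practice usually correct for sampling (for example Weir and Cockerham's θ). The regional frequencies are hypothetical.

```python
from statistics import fmean, pvariance


def wright_fst(subpop_freqs: list[float]) -> float:
    """Wright's F_ST for one allele of a biallelic locus.

    The variance of the allele frequency among subpopulations is divided by
    its maximum possible value, p_bar * (1 - p_bar). Subpopulations are
    weighted equally and sample-size bias is ignored.
    """
    p_bar = fmean(subpop_freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # allele fixed or absent everywhere: nothing to partition
    return pvariance(subpop_freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


# Hypothetical allele frequencies in three regional samples.
print(round(wright_fst([0.30, 0.32, 0.31]), 4))   # nearly homogeneous regions
print(round(wright_fst([0.10, 0.50, 0.90]), 4))   # strongly differentiated regions
```

The first set of frequencies gives an FST close to zero, consistent with no meaningful substructure, whereas the second gives a large value that would have to be accounted for in casework.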

Linkage and Linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci, and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologues align and exchange segments in a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between them, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, estimated from family study data. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
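The following sketch shows how θ enters segregation probabilities of the kind shown in Figure 2. It assumes the mother's phase (which alleles sit together on each of her X-chromosomes) is known. Each of her two parental haplotypes is then transmitted with probability (1 - θ)/2 and each recombinant haplotype with probability θ/2, while a father passes his single X haplotype intact to a daughter. Converting a map distance into θ requires a map function. Haldane's, θ = (1 - e^(-2d))/2 with d in Morgans, is used here as one common choice; it assumes no crossover interference. The 10 cM distance and allele labels are hypothetical.

```python
from math import exp


def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction from map distance (Haldane map function)."""
    return 0.5 * (1.0 - exp(-2.0 * d_morgans))


def maternal_haplotype_probs(hap1: tuple[str, str], hap2: tuple[str, str],
                             theta: float) -> dict[tuple[str, str], float]:
    """Probabilities of the two-locus haplotype a mother transmits.

    hap1 and hap2 are her phased haplotypes (allele at locus 1, allele at
    locus 2). Probabilities are summed per haplotype, so loci at which she
    is homozygous are handled correctly.
    """
    gametes = [
        (hap1, (1 - theta) / 2),               # non-recombinant
        (hap2, (1 - theta) / 2),               # non-recombinant
        ((hap1[0], hap2[1]), theta / 2),       # recombinant
        ((hap2[0], hap1[1]), theta / 2),       # recombinant
    ]
    probs: dict[tuple[str, str], float] = {}
    for hap, p in gametes:
        probs[hap] = probs.get(hap, 0.0) + p
    return probs


theta = haldane_theta(0.10)   # hypothetical 10 cM between the two loci
probs = maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), theta)
for hap, p in probs.items():
    print(hap, round(p, 3))
```

With θ = 0.5 (unlinked loci) all four maternal haplotypes become equally likely, which is the situation implicitly assumed when linkage is ignored.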

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where f(ab) is the frequency of haplotype a-b, and f(a) and f(b) are the allele frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. there is no association between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
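As a minimal sketch (not code from the thesis), Δ for every allele pair at two loci can be computed from a list of phased two-locus haplotypes, for example haplotypes observed directly in males for X-chromosomal markers. The function name `ld_delta` and the input format are illustrative assumptions.

```python
from collections import Counter

def ld_delta(haplotypes):
    """Return {(a, b): f(ab) - f(a) * f(b)} for all allele pairs at two loci.

    `haplotypes` is a list of phased (allele_at_locus_1, allele_at_locus_2) pairs.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)             # observed haplotype counts
    locus1 = Counter(a for a, _ in haplotypes)   # allele counts at locus 1
    locus2 = Counter(b for _, b in haplotypes)   # allele counts at locus 2
    return {
        (a, b): hap_counts[(a, b)] / n - (locus1[a] / n) * (locus2[b] / n)
        for a in locus1
        for b in locus2
    }

# 8 hypothetical haplotypes in which allele a tends to occur together with allele b
sample = [("a", "b")] * 3 + [("a", "B")] + [("A", "b")] + [("A", "B")] * 3
print(ld_delta(sample)[("a", "b")])  # prints 0.125, i.e. 0.375 - 0.5 * 0.5
```

With multiallelic STRs there is one Δ per allele combination, which illustrates why estimating haplotype frequencies directly is the more practical route.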

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Tests for Hardy-Weinberg equilibrium (HWE) (Hardy 1908) and for LD address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw any highly significant conclusions about the outcome of either test (Buckleton et al 2001).
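To illustrate what an exact HWE test computes, the sketch below handles the simplest case, a single biallelic locus (an assumption made for brevity; forensic STRs are multiallelic, and for those the space of genotype tables is usually explored by Markov chain Monte Carlo rather than enumerated). It enumerates every possible heterozygote count given the observed allele counts, computes its probability under HWE, and sums the probabilities of all outcomes that are no more likely than the observed one. The function name is hypothetical.

```python
import math

def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact HWE test for one biallelic locus (genotype counts AA, AB, BB)."""
    n = n_aa + n_ab + n_bb            # individuals
    n_a = 2 * n_aa + n_ab             # copies of allele A
    n_b = 2 * n_bb + n_ab             # copies of allele B

    def log_prob(het: int) -> float:
        # log Pr(het heterozygotes | n, n_a) under HWE, conditional on the allele counts
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1) - math.lgamma(2 * n + 1))

    # feasible heterozygote counts share the parity of n_a and cannot exceed the rarer allele count
    probs = {h: math.exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # p-value: total probability of all outcomes that are no more likely than the observed one
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
```

For example, `hwe_exact_pvalue(25, 50, 25)` is essentially 1, because 50 heterozygotes is the most likely count for these allele counts, whereas `hwe_exact_pvalue(50, 0, 50)` is vanishingly small.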

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.


The unbiased estimator is given by (Nei 1987)

$$ GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right) $$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
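A minimal sketch of Nei's unbiased estimator from observed allele counts follows; the function name and the dictionary input are illustrative assumptions. Passing haplotype counts instead of allele counts gives the haplotype diversity used for haploid markers such as mtDNA.

```python
def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity from a {allele: count} mapping."""
    n = sum(allele_counts.values())  # number of gene copies sampled
    sum_p2 = sum((c / n) ** 2 for c in allele_counts.values())
    return n / (n - 1) * (1 - sum_p2)

# two equally common alleles among 100 gene copies
print(gene_diversity({"a": 50, "b": 50}))  # 100/99 * 0.5, slightly above 0.5
```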

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$ PM = \sum_i G_i^2 $$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus PM is the sum of the partial match probabilities over all genotypes. PM can also be computed from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus directly related to PM:

$$ PD = 1 - PM $$
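The sketch below computes PM from genotype frequencies and, as the text notes is possible under HWE, derives those genotype frequencies from allele frequencies first; PD then follows directly. The function names are hypothetical.

```python
def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    p = list(allele_freqs)
    homozygotes = [pi * pi for pi in p]
    heterozygotes = [2 * p[i] * p[j] for i in range(len(p)) for j in range(i + 1, len(p))]
    return homozygotes + heterozygotes

def match_probability(genotype_freqs):
    """PM: sum of squared genotype frequencies G_i at one locus."""
    return sum(g * g for g in genotype_freqs)

def power_of_discrimination(genotype_freqs):
    return 1 - match_probability(genotype_freqs)

g = hwe_genotype_freqs([0.5, 0.5])    # [0.25, 0.25, 0.5]
print(match_probability(g), power_of_discrimination(g))  # 0.375 0.625
```

For several loci assumed to be in HWE and LE, per-locus PM values are commonly multiplied to give a combined match probability; that independence assumption is exactly what the HWE and LD tests above are meant to check.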

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980, Guo & Elston 1999). There are two situations in which this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$ PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2 $$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles at the locus.
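The PIC formula translates directly into code; the sketch below is illustrative and the function name is an assumption.

```python
def pic(allele_freqs):
    """Polymorphism information content from allele frequencies p_1..p_n."""
    p = list(allele_freqs)
    sum_p2 = sum(x * x for x in p)
    # pairs of distinct alleles i < j
    pair_term = sum(2 * p[i] ** 2 * p[j] ** 2
                    for i in range(len(p) - 1) for j in range(i + 1, len(p)))
    return 1 - sum_p2 - pair_term

print(pic([0.5, 0.5]))   # 0.375
print(pic([0.25] * 4))   # 0.703125
```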

The probability of excluding paternity (Q) is calculated from (Ohno et al 1982)

$$ Q = \sum_{i=1}^{n} p_i (1 - p_i + p_i^2)(1 - p_i)^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2 $$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$; and second, the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained by summing over all mother/child genotype combinations as described above. An alternative figure, the power of exclusion (PE), exists and is defined as (Brenner & Morris 1990)

$$ PE = h^2 \cdot (1 - 2 \cdot h \cdot H^2) $$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample
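Both exclusion measures can be sketched as follows. In `exclusion_probability_q`, the factor $p_i(1-p_i+p_i^2)$ is the frequency of mother/child pairs in which the paternal allele i is unambiguous, and $p_ip_j(p_i+p_j)$ is the frequency of pairs in which mother and child share the heterozygous genotype ij. For PE, the sketch assumes H = 1 - h, since every individual in the sample is either heterozygous or homozygous. Function names are hypothetical.

```python
def exclusion_probability_q(allele_freqs):
    """Mean paternity exclusion probability Q for trios (autosomal locus)."""
    p = list(allele_freqs)
    # paternal allele i unambiguous: a random man is excluded with prob. (1 - p_i)^2
    single = sum(pi * (1 - pi + pi ** 2) * (1 - pi) ** 2 for pi in p)
    # mother and child both heterozygous ij: excluded with prob. (1 - p_i - p_j)^2
    pairs = sum(p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
                for i in range(len(p) - 1) for j in range(i + 1, len(p)))
    return single + pairs

def power_of_exclusion(h):
    """PE = h^2 (1 - 2 h H^2) with H = 1 - h the homozygote proportion."""
    H = 1 - h
    return h ** 2 * (1 - 2 * h * H ** 2)

print(exclusion_probability_q([0.5, 0.5]))  # 0.1875
print(power_of_exclusion(0.5))              # 0.1875
```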

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios involving a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, because a man carries only one X-chromosomal allele. Thus the mean exclusion chance when mother and child are tested is

$$ MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2 $$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al 1998), is

$$ MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3 $$
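A short sketch of the two X-chromosomal exclusion measures; the function names are assumptions, and the input may be allele or haplotype frequencies, as noted above.

```python
def mec_trio(freqs):
    """Mean exclusion chance for mother/daughter/alleged-father trios (X-chromosome)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2

def mec_duo(freqs):
    """Mean exclusion chance for father/daughter duos (X-chromosome)."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3

print(mec_trio([0.5, 0.5]), mec_duo([0.5, 0.5]))  # 0.375 0.25
```

As expected, testing the mother raises the exclusion chance compared with the duo.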

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, owing to the ice that covered Northern Europe. Since then, immigration and population movements of varying scale, origin and direction have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both of these studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Owing to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, namely a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, which denotes the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile, and H could be "the DNA came from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$ \frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)} $$

posterior odds = likelihood ratio · prior odds

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the hypothesis for the defence; E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I)/\Pr(E \mid H_0, I)$ is the LR, and it is within this term that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:
$H_p$: The alleged father is the father of the child.
$H_d$: A random man not related to the alleged father is the father of the child.
The paternity index (PI) is typically defined as

$$ PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} $$

which is the probability of observing the child's ($G_C$), mother's ($G_M$) and alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


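We can use the third law of probability (the multiplication rule) and simplify

$$ PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)} $$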

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can make a further simplification:

$$ \Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF}) $$

resulting in

$$ PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} $$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but that someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is either 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither parent can transmit any other allele). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability will be 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

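As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$ \Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} $$

and

$$ \Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c $$

thus

$$ PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2 p_c} $$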

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
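The single-locus trio calculation can be sketched by enumerating every maternal and paternal transmission, which covers both the unambiguous and the ambiguous situations described above. The sketch assumes no mutations, no silent alleles and an untyped random man as the alternative father; the function names are hypothetical.

```python
from itertools import product

def child_genotype_prob(child, mother, father=None, allele_freqs=None):
    """Pr(G_C | G_M, father) for one locus under Mendelian inheritance.

    If `father` is None, the paternal allele comes from a random unrelated man,
    i.e. it is drawn from the population with probabilities `allele_freqs`.
    """
    maternal = [(a, 0.5) for a in mother]  # each maternal allele transmitted w.p. 1/2
    if father is not None:
        paternal = [(a, 0.5) for a in father]
    else:
        paternal = list(allele_freqs.items())
    target = sorted(child)
    return sum(pm * pf for (m, pm), (f, pf) in product(maternal, paternal)
               if sorted((m, f)) == target)

def paternity_index(child, mother, alleged_father, allele_freqs):
    numerator = child_genotype_prob(child, mother, father=alleged_father)      # H_p
    denominator = child_genotype_prob(child, mother, allele_freqs=allele_freqs)  # H_d
    return numerator / denominator

freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}   # hypothetical allele frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # about 1 / (2 * 0.1) = 5
```

Summing over all transmissions automatically handles the case where mother and child share the same heterozygous genotype, so no separate rule is needed for ambiguous $A_M$ and $A_P$.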

Decision

How should the PI value be interpreted? Bayes's theorem is needed in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following expression for the posterior odds of paternity:

$$ \frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI $$

hence

$$ \frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI $$

resulting in

$$ \Pr(H_p \mid E) = \frac{PI}{PI + 1} $$

Hummel proposed verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, and vice versa.
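The conversion from PI to a posterior probability of paternity can be sketched as below. The `prior_hp` parameter is an illustrative generalisation; with its default of 0.5 (prior odds of 1) the function reduces to PI/(PI + 1).

```python
def posterior_probability(pi, prior_hp=0.5):
    """Pr(H_p | E) from the paternity index and a prior probability of paternity."""
    prior_odds = prior_hp / (1 - prior_hp)
    posterior_odds = pi * prior_odds  # Bayes's theorem in odds form
    return posterior_odds / (1 + posterior_odds)

print(posterior_probability(5))        # 5/6, about 0.833
print(posterior_probability(10_000))   # about 0.9999
```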

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007), population substructure (Ayres 2000) and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

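$$ L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{founder} \Pr(G_{founder}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m) $$

where the sums run over the genotypes $G_1, \ldots, G_n$ of all n individuals in the pedigree, $X_i$ is the observed phenotype of individual i, and {o, f, m} ranges over each offspring together with its father and mother.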

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for an individual at the bottom is computed first, it can be attached as a factor in the summation for his or her parents, and thus this individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
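For a single nuclear family at one non-trait locus (Pen omitted), the likelihood formula can be evaluated by brute force, as in the sketch below: sum over parental genotypes, weight by HWE founder probabilities, and multiply in each child's transmission probability. This is a simplified illustration under stated assumptions (HWE founders, no mutation), not a general recursive peeling implementation; all names are hypothetical.

```python
from itertools import combinations_with_replacement

def genotype_prior(genotype, freqs):
    """Founder genotype probability under Hardy-Weinberg equilibrium."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def transmission(child, mother, father):
    """Mendelian Pr(G_o | G_m, G_f) for an unordered child genotype."""
    target = sorted(child)
    return sum(0.25 for m in mother for f in father if sorted((m, f)) == target)

def nuclear_family_likelihood(children, freqs, mother=None, father=None):
    """L(Ped) for parents plus children; untyped parents are summed over."""
    all_genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    mothers = [tuple(sorted(mother))] if mother else all_genotypes
    fathers = [tuple(sorted(father))] if father else all_genotypes
    total = 0.0
    for gm in mothers:
        for gf in fathers:
            term = genotype_prior(gm, freqs) * genotype_prior(gf, freqs)
            for child in children:  # children are independent given the parents
                term *= transmission(child, gm, gf)
            total += term
    return total

freqs = {"a": 0.3, "b": 0.7}
# full-sibling likelihood for two aa children with untyped parents, vs unrelated
sibs = nuclear_family_likelihood([("a", "a"), ("a", "a")], freqs)
unrelated = genotype_prior(("a", "a"), freqs) ** 2
print(sibs / unrelated)  # sibship LR at this locus
```

Summing the children out one at a time, as done inside the loop, is the nuclear-family special case of peeling; the full algorithm applies the same idea recursively through larger pedigrees.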

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with a computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al 1996 for a more detailed description).
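Step 3 can be sketched as the forward algorithm of an HMM whose hidden states are inheritance vectors. The sketch assumes that the per-locus probabilities of step 2 are already available as a table (`emissions`), encodes each inheritance vector as an integer with one bit per meiosis (a hypothetical encoding), uses a uniform prior over vectors, and lets each meiosis recombine independently between adjacent loci with fraction θ. It is a naive O(L·4^m) version for illustration; practical implementations use faster transition updates.

```python
def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions[l][v]: Pr(observed genotypes at locus l | inheritance vector v)
    thetas[l]: recombination fraction between locus l and locus l + 1
    """
    n_states = 2 ** n_meioses
    alpha = [emissions[0][v] / n_states for v in range(n_states)]  # uniform prior
    for locus in range(1, len(emissions)):
        theta = thetas[locus - 1]
        new_alpha = []
        for w in range(n_states):
            total = 0.0
            for v in range(n_states):
                d = bin(v ^ w).count("1")  # number of meioses with a recombination
                total += alpha[v] * theta ** d * (1 - theta) ** (n_meioses - d)
            new_alpha.append(total * emissions[locus][w])
        alpha = new_alpha
    return sum(alpha)

# one meiosis; both loci are only compatible with inheritance vector 0
print(lander_green_likelihood([[1, 0], [1, 0]], [0.1], 1))  # 0.5 * 0.9 = 0.45
```

Because the cost grows linearly with the number of loci L, this formulation scales to many linked markers, whereas the number of inheritance vectors (2^m) limits the pedigree size.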

Practical implementations of the Lander-Green algorithm have been shown to work well in terms of taking linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al 2002, Skare et al 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and furthermore to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (i.e. falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and with the five hypotheses model for comparison. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
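With more than two hypotheses, posterior probabilities are obtained by normalising prior times likelihood over all hypotheses. The sketch below is a generic illustration of that step, not the implementation used in Paper I; the likelihood values in the example are hypothetical and would in practice come from a pedigree likelihood calculation for each hypothesis in Figure 3.

```python
def posterior_over_hypotheses(likelihoods, priors=None):
    """Pr(H_k | E) for mutually exclusive, exhaustive hypotheses H_1..H_K."""
    if priors is None:
        priors = [1 / len(likelihoods)] * len(likelihoods)  # equal priors
    joint = [lik * pr for lik, pr in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# hypothetical likelihoods: AF is the father, followed by four alternative relationships
print(posterior_over_hypotheses([1e-12, 2e-15, 5e-14, 1e-16, 3e-20]))
```

A relative of the TF typically explains the data far better than an unrelated man, so adding such hypotheses can lower the posterior probability of paternity compared with the two-hypothesis calculation.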


Figure 3 The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007), although some labs still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the interpretation is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change in the error rate (%) in comparison with the standard case: total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers it is particularly relevant to set up and study regional frequency databases, owing to an increased risk of local frequency variations (Richards et al 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation involved estimation of forensic efficiency parameters as well as comparison of the genetic variation found within the Swedish regions and between Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al 2003). The X-chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosome STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, perform the LD test and estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), giving 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%) and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
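The estimation model presented in Paper III is not described here, so the sketch below only shows the textbook counting estimate for phase-known informative meioses (an assumption), θ̂ = R/N, together with the exact one-sided upper confidence bound that applies when no recombinant is observed, obtained by solving (1 - θ)^N = α. The function names are hypothetical.

```python
def theta_mle(recombinants, informative_meioses):
    """Counting (maximum-likelihood) estimate of the recombination fraction."""
    return recombinants / informative_meioses

def theta_upper_bound_no_recombinants(informative_meioses, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound for theta when 0 recombinants are seen."""
    return 1 - alpha ** (1 / informative_meioses)

print(theta_mle(2, 100))                        # 0.02
print(theta_upper_bound_no_recombinants(100))   # about 0.0295
```

With roughly 84 to 116 informative meioses, even zero observed recombinants leaves an upper bound of a few percent, which illustrates why recombination rates between tightly linked loci are hard to pin down from family data of this size.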


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III led us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing for X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971, Abecasis et al 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present here a model for the complete consideration of linkage and linkage disequilibrium, and we study the impact of taking or not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and in the three additional cases DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). Simulations were performed for each of the six cases, with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. In the three cases where DNA profiles were available for the founders, the simulations showed a high median LR (~10^6). In the cases where founder genotype data were not available, the median LR was considerably lower (~10^2 to 10^3). Furthermore, when the questioned relationship was simulated not to be true, LRs of varying magnitude in favour of that relationship (LR > 1) were obtained. This occurred only in the cases where founder profiles were not available.
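Summaries such as those in Table 2 (median, central 95% interval and range of the simulated values) can be computed with the Python standard library alone. The sketch below assumes the simulated LRs are already available as log10 values. The function name and the interpolation method used for the interval are choices made here, not taken from Paper IV.

```python
import statistics

def summarise_log10_lr(log10_lrs):
    """Median, central 95% interval and (min, max) of simulated log10 LRs."""
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two.
    cuts = statistics.quantiles(log10_lrs, n=40, method="inclusive")
    return {
        "median": statistics.median(log10_lrs),
        "interval95": (cuts[0], cuts[-1]),
        "range": (min(log10_lrs), max(log10_lrs)),
    }

summary = summarise_log10_lr([float(x) for x in range(101)])
```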

We then compared the LRs obtained using our proposed model with LRs from two simpler models. On average the difference was small. It was somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.
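The sensitivity to unseen haplotypes can be illustrated with a small Python sketch. The frequency estimator adds one to both the observed count and the database size, so a haplotype absent from a database of N haplotypes gets 1/(N + 1). This is one common convention, assumed here for illustration rather than taken from Paper IV. The LR function assumes two brothers who share a whole haplotype of frequency f, an untyped mother and no recombination within the block, which gives LR = 0.5 + 0.5/f. Both function names are hypothetical.

```python
from collections import Counter

def haplotype_frequency(haplotype, database):
    """Count-based estimate (x + 1) / (N + 1); unseen haplotypes get 1 / (N + 1)."""
    counts = Counter(tuple(h) for h in database)
    return (counts[tuple(haplotype)] + 1) / (len(database) + 1)

def lr_shared_block(f):
    """Brothers share a haplotype of frequency f; mother untyped, no recombination.

    H1: 0.5 * f**2 (different maternal haplotypes) + 0.5 * f (the same one).
    H2: f**2 (two unrelated haplotypes).
    """
    return (0.5 * f ** 2 + 0.5 * f) / f ** 2

database = [("13", "28"), ("13", "28"), ("14", "30")]
f_seen = haplotype_frequency(("13", "28"), database)
f_unseen = haplotype_frequency(("15", "31"), database)
```

For rare haplotypes this LR grows roughly as 1/(2f). Changing f from 0.01 to 0.001 raises it from 50.5 to 500.5. The estimator chosen for unseen haplotypes, and the database size behind it, can therefore shift the LR by an order of magnitude.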

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers. Values are given as median [95% cred] (min, max).

| | log10 LR | Difference, LR(m_2)/LR(m_1) | Difference, LR(m_3)/LR(m_1) |
| Rel | 2.0 [-1, 6.2] (-1, 10.0) | 1.2 [0.4, 7.3] (0001418) | 0.4 [0.036, 9.3] (00001438) |
| NoRel | -1 [-1, 0.5] (-1, 4.0) | 1.0 [0.9, 1.3] (000263) | 0.2 [0.0036, 9.1] (0026433) |

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 is the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis. Their aim was to study relevant population genetic properties, and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by several parameters:
  - the number of markers tested;
  - the use of relevant population allele frequencies;
  - the use of a probabilistic model for treating single genetic inconsistencies;
  - the consideration of alternative close relationships between the alleged father and the true father.
In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high. We also found that the different subregions within Sweden are homogeneous, which supports a combined Swedish population frequency database. Furthermore, the mtDNA variation found in Sweden resembles that of other European populations. This makes it possible to enlarge the relevant reference population and thus increases the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. In the Swedish population, the markers within each of the four linkage groups were in linkage disequilibrium. Together with the linkage within and between the linkage groups, this highlighted the need to consider such parameters when calculating the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account. We applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for properly considering both linkage and linkage disequilibrium is efficient, and that disregarding linkage and LD can have a considerable impact on the computed likelihood ratio.
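The Paper III conclusion that the product rule is not valid can be illustrated for a pair of linked markers. The Python sketch below uses toy data, not the Swedish data, and a hypothetical function name. It compares an observed two-locus haplotype frequency with the product of its allele frequencies. Their difference is the classical linkage disequilibrium coefficient D, and a non-zero D means the product rule misestimates the haplotype frequency.

```python
def ld_coefficient(haplotypes, allele_a, allele_b):
    """Observed p_AB, product-rule p_A * p_B, and D = p_AB - p_A * p_B."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h == (allele_a, allele_b) for h in haplotypes) / n
    return p_ab, p_a * p_b, p_ab - p_a * p_b

# Toy samples of two-locus X haplotypes: (locus 1 allele, locus 2 allele).
in_ld = [("a", "b"), ("a", "b"), ("A", "B"), ("A", "B")]
in_le = [("a", "b"), ("a", "B"), ("A", "b"), ("A", "B")]
```

In the first sample the haplotype (a, b) is twice as common as the product rule predicts (0.5 vs 0.25). In the second sample, which is in linkage equilibrium, D is zero.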


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project, and the assistance provided by increasingly powerful computers.

Go back to the time just before these discoveries and take a client's point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were then rather small. The demand from clients to resolve these issues was thus a major driver of the rapid development. Once the discoveries had been made, enormous amounts of money and research time were invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can now be solved with existing methods. Major investments have also already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which holds over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. In criminal cases, perhaps the most important areas are:

- methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009);
- resolving mixtures (Homer et al. 2008);
- predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009).

To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

In relationship testing, the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In the future, I expect a greater focus on statistical methods for handling data from an increased number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Paternity testing, for example, makes up the vast majority of relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research on them.

However, some relationship questions still cannot be resolved satisfactorily with existing methods. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, using array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the crucial step is the statistical method used to estimate the evidential value of the DNA profiles accurately, taking properties such as linkage and linkage disequilibrium into account.

For the linked X-STRs studied in this thesis, two main issues remain to be investigated in continued work. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are needed to estimate the block's frequency with low uncertainty.
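A simple binomial argument shows why haplotype blocks demand larger databases. It is assumed here for illustration and ignores population structure and any particular estimator.

- A frequency p counted in a database of N haplotypes has standard error sqrt(p(1 - p)/N).
- Reaching a relative standard error r therefore requires N ≥ (1 - p)/(p·r²).
- Adding markers makes each haplotype rarer, so the required N grows roughly as 1/p.

The function name below is hypothetical.

```python
import math

def required_database_size(p, rel_se):
    """Smallest N with binomial standard error sqrt(p(1-p)/N) <= rel_se * p."""
    return math.ceil((1 - p) / (p * rel_se ** 2))

for p in (0.01, 0.001, 0.0001):
    print(p, required_database_size(p, rel_se=0.2))
```

With a 20% relative standard error, a haplotype at frequency 0.001 already calls for a database of roughly 25 000 haplotypes.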

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this isolation is traceable genetically and could therefore affect the calculated weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate professor Gunilla Holmlund and Associate professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about things other than work. I would like to thank:

- Magnus and Erik, for always being such good friends.
- Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.
- My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life.
- My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed.
- My family-in-law Tillmar. It is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and caring, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for The International Symposium on Human Identification 1989, Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportion in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis: Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.


mation and the use of a refined model for the alternative hypotheses reduced the risk of making erroneous decisions.

In summary, as a result of the work in this thesis, we can use mitochondrial DNA and X-chromosomal markers to resolve complex relationship investigations. Moreover, the reliability of likelihood estimates has been increased by the development of models and the study of relevant parameters affecting probability calculations.

Popular science summary

DNA has become an important tool in the judicial system. It is used to link a suspected perpetrator to a crime scene, to investigate questions of biological relationship and to identify victims of mass disasters. A DNA investigation can be divided into two steps: first, producing the DNA profiles, and second, evaluating the significance of the obtained profiles with respect to a given question. For example, one may want to know the probability that someone other than the suspect left DNA at the crime scene. In a paternity case, one may want to know the probability that the alleged man is the child's father, compared with the probability that he matches by chance. The calculation of such probabilities is based, among other things, on the frequencies of different DNA variants (alleles) in the population and on how genetic markers are inherited from one generation to the next.

This thesis has aimed to study relevant background data that affect these probability calculations. It has also aimed to study mathematical models that correctly take the studied parameters into account in case-specific probability calculations. The work focuses on the genetic variation in a Swedish population and on DNA investigations of biological relationships between individuals. Nevertheless, most of the results and discussions are also valid for the use of DNA profiling in crime scene investigations.

Probability calculations in relationship investigations are best performed by setting two hypotheses against each other. Take as an example a paternity investigation with DNA profiles from a child, the child's mother and an alleged man. Hypothesis 1, "The alleged man is the father of the child", can then be set against hypothesis 2, "The alleged man is unrelated to the child". For each hypothesis, one calculates the probability of observing the obtained DNA results, assuming that the hypothesis is true. In this example, that means the probability of the mother's, the child's and the man's DNA profiles when the alleged man is the father, and when he is not. The final value of the investigation is obtained by weighing the two probabilities against each other. A decision on whether the man is the child's father can then be based on comparing the result of the calculation with a threshold for inclusion or, alternatively, exclusion.
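For a standard trio, the two-hypothesis calculation described above is the classical paternity index (PI). The following is a minimal Python sketch, assuming:

- unlinked autosomal markers, so per-locus values can be multiplied;
- no mutation;
- a known obligate paternal allele in the child;
- a prior probability of paternity of 0.5 when converting to a posterior.

The function names and inputs are illustrative.

```python
import math

def paternity_index(af_genotype, obligate_allele, p):
    """Single-locus PI: P(alleged father transmits the obligate allele) / p."""
    transmission = af_genotype.count(obligate_allele) / 2
    return transmission / p

def probability_of_paternity(pis, prior=0.5):
    """Combine independent loci by multiplication, then apply Bayes' theorem."""
    combined = math.prod(pis)
    return combined * prior / (combined * prior + (1 - prior))

pi_het = paternity_index(("a", "b"), "a", p=0.1)  # heterozygous alleged father
pi_hom = paternity_index(("a", "a"), "a", p=0.1)  # homozygous alleged father
w = probability_of_paternity([pi_het, pi_hom])
```

A decision rule then compares the combined PI, or the resulting probability of paternity, with a threshold for inclusion or exclusion.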

There is always a risk of drawing the wrong conclusion, for example wrongly excluding the biological father, or wrongly including a non-father as the biological father. In our first paper we investigated this risk and studied the factors that can affect it. We focused on DNA investigations in family reunification cases. These can be complex, since they involve uncertainty about population affiliation, differences in family constellations and so on. Through simulations we showed that errors can be minimised by increasing the information content of the investigation. This can be done by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. We also showed that the risk of error can be reduced further with a refined method that takes into account alternative close relationships between the tested individuals.
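The error risks studied in the first paper can be estimated from simulated LRs once a decision rule is fixed. In the Python sketch below, the rule "include if LR ≥ threshold" and the threshold of 10 000 are illustrative assumptions, not the rules used in the thesis. The lists `lr_true` and `lr_false` stand for LRs simulated with the tested relationship true and false, respectively.

```python
def error_rates(lr_true, lr_false, threshold):
    """Empirical (false exclusion, false inclusion) rates for 'include if LR >= threshold'."""
    false_exclusion = sum(lr < threshold for lr in lr_true) / len(lr_true)
    false_inclusion = sum(lr >= threshold for lr in lr_false) / len(lr_false)
    return false_exclusion, false_inclusion

lr_true = [1e5, 50.0, 2e4, 1e6]   # toy values, relationship true
lr_false = [0.01, 2e4, 0.5, 1.0]  # toy values, relationship false
rates = error_rates(lr_true, lr_false, threshold=1e4)
```

Moving the threshold only trades one type of error for the other. Adding information, such as more markers, separates the two LR distributions and can lower both rates at once.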

Standard investigations use DNA markers located on the so-called autosomal chromosomes. In special cases, one can also examine DNA variation in the mitochondria (mtDNA) or on the sex chromosomes (the X and Y chromosomes). MtDNA is inherited maternally and is especially useful for investigating presumed maternal relationships. In the second paper we examined mtDNA variation in a Swedish population in order to create a frequency database. We analysed blood samples from about 300 Swedes from seven geographically distinct regions. This showed that the information content for use in a Swedish population is comparable with that of other European populations. The study also showed that there are no significant differences in mtDNA variation between the Swedish regions.

Papers three and four focused on the DNA variation found on the X-chromosome. Because of the X-chromosome's special inheritance pattern, X-chromosome analysis can resolve complex kinship investigations where analysis of standard DNA markers is not sufficient. However, using the X-chromosome in kinship investigations requires special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage means that the probability of inheriting a particular variant at one DNA marker depends on which variant has been inherited at another, nearby DNA marker. In the third paper we investigated the genetic polymorphism of eight DNA markers, all located on the X-chromosome. We showed that the information content of the markers, and hence their usefulness in kinship investigations, is high. We also showed that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.

Finally, in the fourth paper we developed a mathematical model that correctly takes both linkage and linkage disequilibrium into account in probability calculations for kinship investigations based on X-chromosome data. We applied this model in a simulation study of a number of typical cases. The simulations showed the size of the error when a simpler model is used that takes neither linkage nor linkage disequilibrium into account.

In summary, the work in this thesis means that mitochondrial DNA and X-chromosomal DNA markers can be used to resolve more complex kinship investigations. We developed models and studied the relevant parameters that affect kinship probability calculations, and this has increased the reliability of the calculated probabilities.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007;172(2-3):144-149.

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010;124(2):91-98.

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008;3(1):37-41.

IV. Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010 (submitted).

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier, Forensic Science International; Paper II © 2010 Springer, International Journal of Legal Medicine; Paper III © 2008 Elsevier, Forensic Science International: Genetics.

Contents

Abstract
Popular science summary (Populärvetenskaplig sammanfattning)
List of Papers
Contents
Abbreviations
Introduction
  History of DNA and forensic genetics
  Population genetics
    Genetic polymorphisms
    DNA inheritance
    Population Genetics
    The Swedish population and its genetic appearance
  Forensic mathematics/statistics
    Framework for interpretation and presentation of evidential weight
    Paternity index calculation
    Mathematical model for automatic likelihood computation for relationship testing
Aim of the thesis
  Specific aims
    Paper I
    Paper II
    Paper III
    Paper IV
Investigations
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
    Materials and methods
    Results and discussion
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
    Materials and methods
    Results and discussion
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
    Materials and methods
    Results and discussion
  Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account
    Materials and methods
    Results and discussion
Concluding remarks
Future perspectives
Acknowledgements
References

Abbreviations

θ: Theta, recombination frequency/fraction
AF: Alleged father
DNA: Deoxyribonucleic acid
FST: Measure of population genetic subdivision
GD: Gene diversity
HWE: Hardy-Weinberg equilibrium
ISFG: International Society for Forensic Genetics
LD: Linkage disequilibrium
LR: Likelihood ratio
MEC: Mean exclusion chance
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PD: Power of discrimination
PE: Power of exclusion
PI: Paternity index
PIC: Polymorphism informative content
PM: Match probability
Pr: Probability
SNP: Single nucleotide polymorphism
STR: Short tandem repeat
TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined how useful their finding would become. Analysing DNA yields information about genetic diseases, the evolution of biological life and population history. Nowadays DNA is used in everyday practice in areas such as medical genetics and the food processing industry. It is also used in forensic situations, both to solve crimes and to settle disputes about biological relationships.

Traditionally, the aim of forensic genetics is to make a statement about the identity of a human being from a biological sample by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al. 2009; Pulker et al. 2007) and wildlife forensics (Alacs et al. 2010; Budowle et al. 2005). In human identification, the task may be to connect a suspect to the crime scene or to investigate a biological relationship (Jobling et al. 2004b).

DNA evidence from a crime scene sample was first used in court in 1986 in the UK (Gill et al. 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al. 1985). Since then, the techniques and methods for using the information provided by DNA have improved enormously, making DNA an obvious tool for routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial illustrates the importance of the complete process: handling evidential biological samples at the crime scene, storing them, establishing DNA profiles, and presenting the weight of the DNA evidence in court. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer. Examples are: Is the donor of a crime scene sample the same individual as the suspect? Is the alleged father (AF) the biological father of the child? Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, one of three statements can be made for any hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, a statistical evaluation is needed to estimate the strength of the evidence provided by the DNA profiles. Put simply, most such cases involve the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.

These figures are usually based on how genetically unique the information in the DNA profile is within a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing. However, many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. The first is population genetics, including allele frequencies, population substructure, and dependence within and between markers. The second is models for calculating and presenting the weight of evidence that take this population genetic information properly into account.

Population genetics

Genetic polymorphisms

Three types of DNA marker account for the vast majority of polymorphisms used in forensic genetic applications: Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1). All three have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. STRs are widespread throughout the genome and make up approximately 3% of the human genome (Ellegren 2004). Their relatively high mutation rate explains their high degree of polymorphism. STRs are also robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006). In other words, they are well suited to forensic applications. The commonly used STRs each have more than 10 alleles, so a multi-locus STR profile is generally unique. In the 1990s the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al. 1999a). A few companies then adopted and commercialised these and some additional markers, which became the standard marker set for routine practice. More recently, STRs with shorter amplicons (miniSTRs) have been developed (Wiegand & Kleiber 2001). These increase the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which is a single-base polymorphism. SNPs are often biallelic, although interest in triallelic SNPs for forensic applications is increasing (Westen et al. 2009). One advantage of SNPs is that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another is their low mutation rate, which helps in relationship testing. The disadvantage is that, because the number of alleles per locus is limited, each locus carries little information. One STR marker provides about as much information as four SNPs (Sobrino et al. 2005; Brenner, www.dna-view.com). No commercial forensic SNP kit is available yet, although efforts have been made to develop SNP multiplexes for criminal cases and relationship testing (Borsting et al. 2009; Philips et al. 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. In forensic work, sequence data are mainly used to analyse variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysing mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the marker types described above, there are different "types" of DNA. They differ in their inheritance patterns and in other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, markers on the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

At an autosomal locus, each individual has two alleles: one inherited from the mother and one from the father. In forensic relationship testing, the traditional use of autosomal markers only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, it is now technically possible to study hundreds of thousands of autosomal markers simultaneously, which reduces the limitations of complex pedigree testing (Egeland et al. 2008; Skare et al. 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers when females transmit them to gametes and like haploid markers when males do. Females inherit one X-chromosome from their mother and their father's only X-chromosome. Males inherit their single X-chromosome from their mother, who passes on one of her two. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, two sisters who are tested to establish whether they have the same father, where DNA profiles are available only for the sisters. Autosomal markers cannot exclude a common father here, because full sisters can carry entirely different alleles at an autosomal locus. X-chromosome markers, however, can exclude a common father, because two sisters with the same father must share his paternal X allele. There are several other types of relationship for which X-chromosomal markers are superior to autosomal markers (Szibor et al. 2003; Pinto et al. 2010).
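The sister example above can be expressed as a simple consistency check. This is a minimal sketch, assuming each sister's genotype is stored as a set of allele designations per X-STR marker. The function name and the allele values are hypothetical; the marker names are only used as labels. Mutations and silent alleles are ignored.

```python
def x_markers_without_shared_allele(sister1, sister2):
    """Return the X markers at which two females share no allele.

    If both sisters have the same father, each carries a copy of his single
    X-chromosome, so at every X marker they must share at least one allele
    (ignoring mutation and silent alleles). Any marker returned here is
    therefore inconsistent with a common father.
    """
    common_markers = sister1.keys() & sister2.keys()
    return sorted(
        marker for marker in common_markers
        if not set(sister1[marker]) & set(sister2[marker])
    )


# Hypothetical genotypes; allele designations are illustrative only
sister_a = {"DXS10135": {18, 21}, "DXS8378": {10, 11}}
sister_b = {"DXS10135": {19, 20}, "DXS8378": {11, 12}}
print(x_markers_without_shared_allele(sister_a, sister_b))  # ['DXS10135']
```

An empty result does not prove a common father. It only means that no exclusion was found, and a likelihood calculation would still be needed.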

Forensic relationship testing with the X-chromosome usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium typically have to be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007; Szibor 2007) (Figure 2). The definitions of these two genetic properties, and their impact on calculated likelihoods, are discussed further below.

The Y-chromosome normally exists as one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. This, together with the fact that the Y-chromosome's effective population size is one-quarter of that of autosomal loci, makes Y-chromosomal variation fairly population-specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can identify an individual's haplogroup (Karafet et al. 2008), which can be used, for instance, to infer the geographical origin of the paternal lineage (Jobling & Tyler-Smith 2003). For other forensic issues, Y-STR analysis (resulting in a haplotype) is more useful (Jobling et al. 1997). It is crucial to bear in mind, however, that all males of the same paternal lineage share the same Y-chromosome haplotype.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and about 16,600 nucleotides long. Each cell has 100 to 1,000 copies of its mtDNA, which makes mtDNA especially useful in forensic analyses where the amount of DNA is often very low. The mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions about a potential maternal relationship. From a population point of view, mtDNA is similar in many ways to other haploid genomes (e.g. the Y-chromosome). Because it is haploid, mtDNA profiles are also relatively population-specific, and this must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and how it changes over time and space. It covers the processes of mutation, selection, migration and genetic drift. Quantifying different DNA alleles and their occurrence within and between populations yields information about parameters such as population structure, growth, size and age (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is with FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations. Small FST values correspond to small differences within/among populations, and large values to large differences. Some variants of FST also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is essential to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
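The link between FST and the variance in allele frequencies can be made concrete for a single biallelic locus using Wright's parametric definition, $F_{ST} = \mathrm{Var}(p) / [\bar{p}(1-\bar{p})]$. This is a minimal sketch that assumes the subpopulation allele frequencies are known without sampling error. In practice, sample-based estimators such as that of Weir & Cockerham are used instead. The function name and the optional weights are illustrative.

```python
def wright_fst(freqs, weights=None):
    """Wright's F_ST for one biallelic locus from subpopulation frequencies.

    freqs: frequency of one allele in each subpopulation.
    weights: optional relative subpopulation sizes (equal if omitted).
    This is the parametric definition; it ignores sampling error.
    """
    k = len(freqs)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    w = [x / total for x in weights]
    p_bar = sum(wi * p for wi, p in zip(w, freqs))
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # monomorphic overall: no variance to apportion
    var_p = sum(wi * (p - p_bar) ** 2 for wi, p in zip(w, freqs))
    return var_p / (p_bar * (1.0 - p_bar))


print(wright_fst([0.5, 0.5, 0.5]))  # no differentiation, essentially 0
print(wright_fst([0.4, 0.6]))       # approximately 0.04
print(wright_fst([0.0, 1.0]))       # 1.0, alternative alleles fixed
```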

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) describe the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologues align and exchange segments, a process known as crossing over or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between them, the chromosome passed on in the gamete carries a different combination from either parental chromosome. The allele combination of the two markers (i.e. the haplotype) has thus changed compared with the parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family study data. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
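The effect of the recombination fraction on transmission can be shown with a short sketch. It assumes a parent whose two haplotypes have known phase, such as a mother passing on one of her X-chromosomes. A father passes his single X-chromosome haplotype to a daughter unchanged, so the sketch does not apply to him. The function name is illustrative.

```python
def gamete_probabilities(hap1, hap2, theta):
    """Probability of each two-locus haplotype transmitted by one parent.

    hap1, hap2: the parent's haplotypes as (allele at locus 1, allele at
    locus 2), with known phase. theta: recombination fraction, 0 <= theta <= 0.5.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    outcomes = [
        (hap1, (1 - theta) / 2),          # non-recombinant
        (hap2, (1 - theta) / 2),          # non-recombinant
        ((hap1[0], hap2[1]), theta / 2),  # recombinant
        ((hap2[0], hap1[1]), theta / 2),  # recombinant
    ]
    probs = {}
    for hap, p in outcomes:
        # identical haplotypes (homozygous loci) are merged
        probs[hap] = probs.get(hap, 0.0) + p
    return probs


# Doubly heterozygous parent with theta = 0.1: parental haplotypes
# 0.45 each, recombinant haplotypes 0.05 each
print(gamete_probabilities(("a", "c"), ("b", "d"), 0.1))
```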

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci. It can be defined as the non-random association of alleles in haplotypes. LD can arise because loci are closely located and are therefore inherited together more often than expected by chance. It can also result from population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

Consider two loci and the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2. This frequency can be written as

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. there is no association between a and b. If the alleles at locus 1 and locus 2 are dependent, then Δ ≠ 0 and the loci are considered to be in LD.

For markers in LD, haplotype frequencies are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination. This is especially true for multiallelic loci.
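Δ can be estimated from phase-known haplotypes. Males typed at two X-chromosomal markers are one source, because a male's single X-chromosome gives his haplotype directly. The sketch below is a minimal illustration under that assumption; the function name and the data are hypothetical. For multiallelic STRs it returns one Δ per allele pair, which shows why direct haplotype frequencies are usually more convenient.

```python
from collections import Counter


def ld_deltas(haplotypes):
    """Return Δ = f(ab) - f(a)·f(b) for every pair of observed alleles.

    haplotypes: list of (allele at locus 1, allele at locus 2) tuples with
    known phase, e.g. from males typed at two X-chromosomal markers.
    """
    n = len(haplotypes)
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    freq_1 = {a: c / n for a, c in Counter(h[0] for h in haplotypes).items()}
    freq_2 = {b: c / n for b, c in Counter(h[1] for h in haplotypes).items()}
    return {
        (a, b): hap_freq.get((a, b), 0.0) - freq_1[a] * freq_2[b]
        for a in freq_1
        for b in freq_2
    }


# Hypothetical sample of 10 male haplotypes at two linked markers
sample = [(1, 1)] * 4 + [(2, 2)] * 4 + [(1, 2), (2, 1)]
print(ld_deltas(sample)[(1, 1)])  # approximately 0.15
```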

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, they should be studied in the relevant population to establish allele (or haplotype) frequencies and to investigate potential substructure. Certain tests must also be conducted on the independent segregation of alleles. Hardy-Weinberg equilibrium (HWE) tests (Hardy 1908) address the independence of alleles within a locus, and LD tests address independence between loci. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. Fisher's exact test is the preferred method for both the HWE and the LD tests (Fisher 1951). Note, however, that the exact test has very limited power, which makes it difficult to draw firm conclusions from the outcome of either test (Buckleton et al. 2001).
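For a biallelic marker such as a SNP, an exact test for HWE can be sketched as follows. Every possible number of heterozygotes is enumerated, given the sample size and the allele counts. The probabilities of all outcomes that are no more likely than the observed one are then summed. This is a minimal illustration for two alleles only, and the function name is hypothetical. For multiallelic STRs, full enumeration quickly becomes infeasible, and Monte Carlo or Markov chain approximations are typically used instead.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE test p-value for a biallelic locus (genotype counts)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def prob(het):
        # Pr(het heterozygotes | n individuals, n_a copies of A), exact
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        num = factorial(n) * 2 ** het * factorial(n_a) * factorial(n_b)
        den = factorial(hom_a) * factorial(het) * factorial(hom_b) * factorial(2 * n)
        return Fraction(num, den)

    observed = prob(n_ab)
    # heterozygote counts must have the same parity as n_a
    support = range(n_a % 2, min(n_a, n_b) + 1, 2)
    return float(sum(p for p in (prob(h) for h in support) if p <= observed))


print(hwe_exact_p(25, 50, 25))         # 1.0 (the most likely configuration)
print(hwe_exact_p(50, 0, 50) < 1e-6)   # True (no heterozygotes at all)
```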

Another feature to consider is how efficient the DNA markers are in forensic casework, in both criminal cases and relationship testing. Such estimates describe the theoretical value of the markers in different forensic genetic situations, and they differ from case-specific values. They are most often based on the number of distinct alleles found in the population and on their frequencies.

A selection of useful parameters is described and formulated mathematically below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987):

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where $n$ is the number of gene copies sampled and $p_i$ is the frequency of the $i$-th allele in the population.
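The estimator translates directly into code. This minimal sketch assumes the sample is given as a list of observed gene copies: two per individual for an autosomal marker, and one per individual for a haploid marker such as an mtDNA haplotype or an X-STR typed in males. The function name is illustrative.

```python
from collections import Counter


def gene_diversity(gene_copies):
    """Nei's unbiased gene diversity from a list of sampled gene copies."""
    n = len(gene_copies)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(gene_copies)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


print(gene_diversity(["a", "a", "b", "b"]))  # approximately 0.667 (2/3)
```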

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951):

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype $i$ at a given locus in the population. PM is thus the sum of the match probabilities for each individual genotype. If the population is in Hardy-Weinberg equilibrium, PM can also be calculated from allele frequencies (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is therefore the complement of PM:

$$PD = 1 - PM$$
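The sketch below computes PM in two ways, from observed genotype counts and from allele frequencies under HWE, and derives PD from it. It is an illustration only: genotypes are assumed to be unordered allele pairs, the function names are hypothetical, and no correction for sampling error or substructure is applied.

```python
from collections import Counter
from itertools import combinations_with_replacement


def pm_from_genotypes(genotypes):
    """PM as the sum of squared observed genotype frequencies."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)  # [a,b] == [b,a]
    return sum((c / n) ** 2 for c in counts.values())


def pm_from_alleles_hwe(freqs):
    """PM from allele frequencies, assuming Hardy-Weinberg proportions."""
    pm = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        g = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        pm += g ** 2
    return pm


def power_of_discrimination(pm):
    return 1.0 - pm


genotypes = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
print(pm_from_genotypes(genotypes))               # 0.375
print(pm_from_alleles_hwe({"a": 0.5, "b": 0.5}))  # 0.375
```

If the loci are assumed to be independent, a multi-locus PM is obtained as the product of the per-locus values.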

Polymorphism Informative Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child can be deduced, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two situations in which this cannot be deduced: when one parent is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies.
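The PIC formula can be transcribed directly. This minimal sketch assumes a list of allele frequencies that sum to one; the function name is illustrative.

```python
def pic(freqs):
    """Polymorphism informative content for one locus."""
    p = list(freqs)
    homozygosity = sum(x ** 2 for x in p)
    # both parents, and the child, heterozygous for the same pair i/j
    same_heterozygote = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p) - 1)
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygosity - same_heterozygote


print(pic([0.5, 0.5]))  # 0.375
print(pic([0.25] * 4))  # 0.703125
```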

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982):

$$Q = \sum_{i=1}^{n} p_i (1-p_i)^2 (1 - p_i + p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q combines two factors. The first is the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$. The second is the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the possible paternal alleles. Q is then obtained by summing over all mother/child genotype combinations, as described above. An alternative measure, the power of exclusion (PE), is defined as (Brenner & Morris 1990):

$$PE = h^2 \cdot (1 - 2 \cdot h \cdot H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample
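Both exclusion measures can be computed in a few lines. The sketch below implements Q from allele frequencies and PE from the observed proportion of heterozygotes h, taking H = 1 - h. The function names are illustrative. For a biallelic locus with equal allele frequencies under HWE, both measures give 0.1875, which serves as a simple sanity check.

```python
def exclusion_q(freqs):
    """Probability of excluding paternity, Q, from allele frequencies."""
    p = list(freqs)
    n = len(p)
    # Paternal allele i unambiguous: this occurs with frequency
    # p_i(1 - p_i + p_i^2), and a random man lacks i with prob. (1 - p_i)^2
    q = sum(pi * (1 - pi + pi ** 2) * (1 - pi) ** 2 for pi in p)
    # Mother and child both heterozygous i/j: paternal allele is i or j
    q += sum(
        p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return q


def power_of_exclusion(h):
    """PE = h^2 (1 - 2 h H^2), with H = 1 - h the proportion of homozygotes."""
    H = 1.0 - h
    return h ** 2 * (1 - 2 * h * H ** 2)


print(exclusion_q([0.5, 0.5]))  # 0.1875
print(power_of_exclusion(0.5))  # 0.1875
```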

The formulas given so far apply to autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, except that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, because a man carries only one X-chromosomal allele. The mean exclusion chance when mother and child are tested is thus:

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele $i$; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos consisting of a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is:

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
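The two MEC expressions can be coded directly. As noted above, haplotype frequencies can be used in place of allele frequencies. The function names are illustrative, and the frequencies are assumed to sum to one.

```python
def mec_trio(freqs):
    """Mean exclusion chance, X-chromosomal trio (mother, daughter, man)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """Mean exclusion chance, X-chromosomal duo (man and daughter)."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3


print(mec_trio([0.5, 0.5]))  # 0.375
print(mec_duo([0.5, 0.5]))   # 0.25
```

As expected, the trio value is never lower than the duo value for the same frequencies, because the mother's genotype adds information.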

The Swedish population and its genetic appearance

Immigration into Scandinavia began only around 12,000 years ago, because until then ice covered Northern Europe. Since then, immigration and population movements of varying extent, origin and direction have taken place within the present borders of Sweden. Many of the immigrant groups came from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). Together with recorded demographic events over the last 1,000 years (Svanberg 2005), this may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies found high genetic diversity and high information content for relationship testing and criminal cases. They also found strong similarities with other European populations. More recently, a Swedish sample was compared with other European populations using data from over 300,000 SNPs. This showed a strong correlation between geographic location and genetic variability in the populations tested (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at setting up a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). The latter studies confirm earlier findings of high similarity with other Western European populations (Roewer et al. 2005). However, some small Y-chromosome differences do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Because of continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

To assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When the probability or weight of DNA findings is presented, a logical framework is crucial for making the presentation clear and understandable to those who must make decisions based on the DNA results. How such a framework should be designed has been debated, and there is still no clear consensus within the forensic community.

The discussion mainly concerns two (or perhaps three) frameworks: a frequentist approach and a Bayesian approach, or alternatively a logical approach that can be extended to a full Bayesian approach. These frameworks have different properties, advantages and disadvantages, and their use is described in several detailed publications (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around calculating a probability concerning a single hypothesis. An example is $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. Here E is the DNA profile, and H could be "the DNA comes from an individual not related to the suspect". If this probability is low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. It has, however, been criticised, mainly for its lack of logical rigour, which makes the set-up and interpretation of the hypothesis extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) that links the prior odds to the posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints or eyewitness testimony.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the defence hypothesis. E represents the DNA profiles and I represents other relevant background information. The ratio

$$LR = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)}$$

is the likelihood ratio, and it is this ratio that quantifies the strength of the DNA evidence. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.

For relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations in paternity cases (Gjertson et al. 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. The recommendations cover the most basic issues but do not explain how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ be two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

that is, the probability of observing the child's ($G_C$), mother's ($G_M$) and alleged father's ($G_{AF}$) DNA profiles when the AF is the father, divided by the probability of observing the same profiles when the AF is not the father.


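Using the third law of probability (the product rule), we can write

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$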

The probability of observing the DNA profiles of the mother and the AF is the same under either hypothesis. This allows a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities:

1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator);
2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that someone other than the AF is the father (denominator).

We start with the calculation of 1), assuming data from a single locus. This probability follows from Mendelian inheritance. If the child's maternal ($A_M$) and paternal ($A_P$) alleles can be determined (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1, because each of them can transmit only one allele. If exactly one of them is heterozygous, the probability is 0.5, because a heterozygous parent transmits a given allele with probability 50/50. If both are heterozygous, the probability is 0.25 (0.5 × 0.5).

If AM and AP are unambiguous, the denominator, $\Pr(G_C \mid G_M, G_{AF}, H_d)$, is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. Here $p_{A_P}$ is the population frequency of allele AP and represents the probability of the child receiving that allele from a random man in the population. If AM and AP are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of AM and AP.

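As a simple example, let GM have the genotype [a,b], GC have [b,c] and GAF have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2}\,p_c} = \frac{1}{2\,p_c}$$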

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
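The single-locus trio calculation can be sketched in Python. This is a minimal illustration, assuming no mutations or silent alleles, that the mother is the true mother, and population allele frequencies for the random man under Hd; the function names and the example frequencies are hypothetical. Ambiguous maternal/paternal allele assignments are handled by summing over both orderings of the child's alleles, as described above.

```python
from collections import Counter

def transmission_prob(parent, allele):
    """Probability that a parent with genotype `parent` (a 2-tuple) transmits `allele`."""
    return Counter(parent)[allele] / 2

def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus trio PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Assumes no mutation and that `mother` is the true mother; `freqs` maps each
    allele to its population frequency (used for the random man under Hd).
    """
    a, b = child
    # Possible (maternal, paternal) origins of the child's alleles; the set
    # collapses to a single assignment when the child is homozygous.
    assignments = {(a, b), (b, a)}
    numerator = sum(transmission_prob(mother, m) * transmission_prob(alleged_father, p)
                    for m, p in assignments)
    denominator = sum(transmission_prob(mother, m) * freqs[p]
                      for m, p in assignments)
    return numerator / denominator

# Worked example from the text: GM = [a,b], GC = [b,c], GAF = [c,d], so PI = 1/(2 p_c)
freqs = {"a": 0.3, "b": 0.2, "c": 0.1, "d": 0.4}  # hypothetical frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))
```

For several unlinked autosomal loci, the per-locus PIs would be multiplied (the product rule); an inconsistency at any locus gives a PI of 0 in this mutation-free sketch.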

Decision
How does one interpret the PI value? Bayes's theorem is needed to turn the PI into posterior odds (posterior odds = PI × prior odds), from which a posterior probability can be computed. For paternity issues the prior odds have traditionally been set to 1, which, with E denoting the DNA evidence, gives the following posterior odds and posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$
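The conversion from PI to posterior probability is a one-line function. This minimal sketch uses the general form with arbitrary prior odds Pr(Hp)/Pr(Hd); the traditional prior odds of 1 recover PI/(PI + 1). The function name and the example PI values are hypothetical.

```python
def posterior_probability(pi, prior_odds=1.0):
    """Pr(Hp | E) from a paternity index and prior odds Pr(Hp) / Pr(Hd)."""
    posterior_odds = pi * prior_odds  # Bayes's theorem in odds form
    return posterior_odds / (posterior_odds + 1)

for pi in (5.0, 99.0, 9999.0):
    print(pi, posterior_probability(pi))
```

With prior odds of 1, a PI of 99 corresponds to a posterior probability of 0.99 and a PI of 9,999 to 0.9999.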

Hummel proposed verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). A cut-off that is too low increases the risk of falsely including a non-father as the true father, whereas one that is too high increases the risk of failing to include the true father.

Mathematical model for automatic likelihood computation for relationship testing
While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated once mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007), population substructure (Ayres 2000) and deficiency cases (Brenner 2006) have to be considered. In such situations a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

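$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes G_1 to G_n of the n individuals in the pedigree, X_i is the observed phenotype of individual i, and the last product runs over all offspring-father-mother triplets {o,f,m}.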

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
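For a very small pedigree, the likelihood formula above can be evaluated by direct summation over the genotypes of untyped individuals, which shows how the founder, transmission and penetrance terms combine. The Python sketch below is an illustration only, not the Elston-Stewart peeling recursion itself: it assumes a single marker locus, Hardy-Weinberg founder genotype probabilities and no mutation. Because marker genotypes are observed directly, the penetrance term is 1 for the observed genotype and 0 otherwise, so typed individuals drop out of the sums. Under Hd the true father is an untyped founder whose genotype is summed out. The function names and frequencies are hypothetical; the resulting likelihood ratio should reproduce the trio PI of 1/(2p_c) from the worked example.

```python
from itertools import combinations_with_replacement

def genotype_prob(g, freqs):
    """Hardy-Weinberg probability Pr(G_founder) of an unordered genotype g = (x, y)."""
    x, y = g
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]

def transmission(child, father, mother):
    """Mendelian Pr(G_o | G_f, G_m) for unordered genotypes, without mutation."""
    prob = 0.0
    for f in father:      # each parental allele is passed on with probability 1/2
        for m in mother:
            if sorted((f, m)) == sorted(child):
                prob += 0.25
    return prob

def trio_likelihood(child, mother, alleged_father, freqs, af_is_father):
    """Likelihood of the three observed genotypes under Hp (True) or Hd (False)."""
    founders = genotype_prob(mother, freqs) * genotype_prob(alleged_father, freqs)
    if af_is_father:
        return founders * transmission(child, alleged_father, mother)
    # Hd: the true father is an untyped founder; sum over all his genotypes
    total = 0.0
    for g in combinations_with_replacement(sorted(freqs), 2):
        total += genotype_prob(g, freqs) * transmission(child, g, mother)
    return founders * total

freqs = {"a": 0.3, "b": 0.2, "c": 0.1, "d": 0.4}  # hypothetical frequencies
child, mother, af = ("b", "c"), ("a", "b"), ("c", "d")
lr = (trio_likelihood(child, mother, af, freqs, True)
      / trio_likelihood(child, mother, af, freqs, False))
print(lr)
```

Peeling reorganises exactly these nested sums so that each untyped individual is summed out once, which is what keeps the effort manageable in large pedigrees.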

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987); it permits simultaneous consideration of thousands of loci, and its computational effort increases only linearly with the number of markers (although it instead grows exponentially with pedigree size). The Lander-Green algorithm has three main steps: 1) enumeration of all possible inheritance vectors in a pedigree, describing the transmission of alleles from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed genotypes at each marker conditional on the inheritance vector; and finally 3) calculation of the joint probability of the inheritance vectors of all markers along the same chromosome (i.e. the transition probabilities between adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
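Steps 2) and 3) can be sketched as the forward pass of a hidden Markov model whose hidden states are the inheritance vectors. The Python sketch below is a simplified illustration under stated assumptions, not the implementation of Lander & Green or of any software package: an inheritance vector over n meioses is encoded as an n-bit integer with a uniform prior over the 2^n vectors, each bit flips independently between adjacent markers with the recombination fraction θ (no interference), and the per-marker probabilities Pr(observed genotypes | inheritance vector) from step 2) are supplied as input rather than derived from a pedigree. The function names and example numbers are hypothetical.

```python
def transition_matrix(n_meioses, theta):
    """T[v][w] = Pr(inheritance vector w at the next marker | vector v at this marker)."""
    n_states = 2 ** n_meioses
    T = [[0.0] * n_states for _ in range(n_states)]
    for v in range(n_states):
        for w in range(n_states):
            d = bin(v ^ w).count("1")  # meioses with a recombination between the markers
            T[v][w] = theta ** d * (1 - theta) ** (n_meioses - d)
    return T

def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]:       recombination fraction between markers k and k + 1
    """
    n_states = 2 ** n_meioses
    alpha = [emissions[0][v] / n_states for v in range(n_states)]  # uniform prior
    for k in range(1, len(emissions)):
        T = transition_matrix(n_meioses, thetas[k - 1])
        alpha = [sum(alpha[v] * T[v][w] for v in range(n_states)) * emissions[k][w]
                 for w in range(n_states)]
    return sum(alpha)

# One meiosis, two markers whose data point to opposite grandparental origins
emissions = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical marker probabilities
for theta in (0.0, 0.1, 0.5):
    print(theta, lander_green_likelihood(emissions, [theta], n_meioses=1))
```

In this toy case the likelihood equals θ/2: it is zero for completely linked markers, because the data then require a recombination. The work per marker is fixed by the number of inheritance vectors, which is why the effort grows linearly with the number of markers but exponentially with the number of meioses.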

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium between markers when assigning population frequencies (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims
Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing, and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and furthermore to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (i.e. falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainty about appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods
A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated; in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with both the standard two-hypotheses model and the five-hypotheses model for comparison. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
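The idea of estimating error rates by simulation can be illustrated in a much simplified form. The Python sketch below is a toy under heavy assumptions, not the simulation used in Paper I: it generates trios at unlinked loci from hypothetical allele frequencies under Hardy-Weinberg equilibrium, uses only the standard two hypotheses (true father versus an unrelated man), ignores mutations, substructure and relatives of the TF, and applies a single hypothetical decision rule (include if the posterior probability with prior odds 1 reaches the cut-off, otherwise exclude). All names and parameter values are hypothetical.

```python
import random
from collections import Counter

def transmission_prob(parent, allele):
    return Counter(parent)[allele] / 2

def locus_pi(child, mother, af, freqs):
    """Single-locus trio PI without mutation (mother assumed to be the true mother)."""
    a, b = child
    pairs = {(a, b), (b, a)}  # possible (maternal, paternal) assignments
    num = sum(transmission_prob(mother, m) * transmission_prob(af, p) for m, p in pairs)
    den = sum(transmission_prob(mother, m) * freqs[p] for m, p in pairs)
    return num / den

def random_genotype(freqs, rng):
    alleles = list(freqs)
    return tuple(rng.choices(alleles, weights=[freqs[x] for x in alleles], k=2))

def simulate_error_rates(freqs_per_locus, n_cases, cutoff, seed=1):
    """Return (exclusion error rate, inclusion error rate) under the toy decision rule."""
    rng = random.Random(seed)
    excl_errors = incl_errors = 0
    for af_is_father in (True, False):
        for _ in range(n_cases):
            pi = 1.0
            for freqs in freqs_per_locus:  # unlinked loci: product rule
                mother, father = random_genotype(freqs, rng), random_genotype(freqs, rng)
                child = (rng.choice(mother), rng.choice(father))
                af = father if af_is_father else random_genotype(freqs, rng)
                pi *= locus_pi(child, mother, af, freqs)
            included = pi / (pi + 1) >= cutoff  # posterior probability, prior odds 1
            if af_is_father and not included:
                excl_errors += 1
            elif not af_is_father and included:
                incl_errors += 1
    return excl_errors / n_cases, incl_errors / n_cases

locus = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical allele frequencies
print(simulate_error_rates([locus] * 15, n_cases=2000, cutoff=0.9999))
```

Comparing the simulated relationship with the decision in this way is what turns a decision rule into estimated inclusion and exclusion error rates; changing the number of loci, the frequencies used for calculation versus generation, or the cut-off shows how each parameter shifts those rates.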


Figure 3 The alternative hypotheses used for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion
Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used equal prior probabilities for the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.
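With more than two hypotheses, the posterior probability of each hypothesis is its likelihood times its prior probability, normalised over all hypotheses. The Python sketch below illustrates this with equal prior probabilities; the hypothesis labels follow the H0/H1a-d naming used above, but the likelihood values are hypothetical and serve only to show how alternative hypotheses involving close relatives can absorb posterior probability that a two-hypotheses model would assign to paternity.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Pr(H_i | E) = Pr(E | H_i) Pr(H_i) / sum_j Pr(E | H_j) Pr(H_j)."""
    if priors is None:  # equal prior probabilities for all hypotheses
        priors = {h: 1 / len(likelihoods) for h in likelihoods}
    weighted = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}

# Hypothetical likelihoods Pr(E | H), scaled so that Pr(E | H1a) = 1
two = posterior_probabilities({"H0": 2.0e4, "H1a": 1.0})
five = posterior_probabilities({"H0": 2.0e4, "H1a": 1.0, "H1b": 900.0,
                                "H1c": 300.0, "H1d": 50.0})
print(round(two["H0"], 5), round(five["H0"], 5))
```

With these made-up numbers the posterior probability of H0 is about 0.99995 under the two-hypotheses model but only about 0.94 under the five-hypotheses model, so a 99.99% inclusion cut-off would no longer be reached.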


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates
Change in the error rate (%) in comparison with the standard case: total error (inclusion error; exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10; -1)
Additional information
  20-marker DNA profiles: -68 (-29; -89)
  25-marker DNA profiles: -83 (-56; -98)
  2 children: -88 (-73; -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217; -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079; -100)
Inappropriate allele frequency
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190; -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106; -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25; -34)
Prior information
  Five-hypotheses model for LR calculation: -24 (-8; -31)

The standard case was defined by 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods
Blood samples from 296 Swedish individuals from seven geographically distinct regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment, which includes HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved computation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion
Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
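Haplotype diversity and match probability can both be computed from haplotype counts. This Python sketch uses common textbook estimators, assumed here rather than taken from Paper II: gene (haplotype) diversity GD = n/(n - 1) · (1 - Σ p_i²) and match probability PM = Σ p_i², where p_i is the sample frequency of haplotype i among n individuals. The haplotype labels and counts are hypothetical.

```python
from collections import Counter

def diversity_and_match_probability(haplotypes):
    """Return (GD, PM) for a list of observed haplotypes, one per individual."""
    n = len(haplotypes)
    pm = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    gd = n / (n - 1) * (1 - pm)  # unbiased (Nei-type) diversity estimator
    return gd, pm

sample = ["H1"] * 3 + ["H2"] * 2 + ["H3", "H4", "H5", "H6", "H7"]  # hypothetical
print(diversity_and_match_probability(sample))
```

When every sampled haplotype is unique, GD equals 1 and PM equals 1/n, which is why databases with many singleton haplotypes combine a diversity close to 1 with a match probability limited mainly by sample size.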

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods
A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), yielding 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion
Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that when LD was disregarded rather than taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
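Because males carry a single X chromosome, haplotypes for a linkage group can be read directly from male samples. This makes it straightforward to compare an observed haplotype frequency with the product of its allele frequencies, which is the value implied by linkage equilibrium. The Python sketch below shows that comparison for one linkage group; it is illustrative only, the allele labels and the small sample are hypothetical, and it is not the LD test used in Paper III.

```python
from collections import Counter

def haplotype_vs_product(male_haplotypes, target):
    """Observed frequency of `target` and the product of its marginal allele frequencies.

    male_haplotypes: list of (allele_locus1, allele_locus2) tuples, one per male.
    """
    n = len(male_haplotypes)
    observed = Counter(male_haplotypes)[target] / n
    product = 1.0
    for locus, allele in enumerate(target):
        product *= sum(1 for h in male_haplotypes if h[locus] == allele) / n
    return observed, product

# Hypothetical two-locus data for one linkage group with strong allelic association
males = ([("21", "13")] * 40 + [("22", "14")] * 40
         + [("21", "14")] * 10 + [("22", "13")] * 10)
print(haplotype_vs_product(males, ("21", "13")))
```

Here the haplotype is observed in 40% of the males, whereas the product rule would predict 25%; using the product in an LR calculation would therefore misstate the rarity of this haplotype.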

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account
The findings in Paper III led us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing for X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR for questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers, but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we here present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking or not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods
A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all the loci of a haplotype.

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters), and the three additional cases involved DNA profiles available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion
The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1 of varying size were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
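To see why the haplotype frequency estimate matters, consider a deliberately simplified version of the two-brothers maternity question. Assume no recombination within a linkage group, treat the linkage groups as independent (an approximation, given the tendency to linkage between groups three and four), and assume that the untyped mother's two X haplotypes are independent draws from the population. Under Rel, the brothers inherit the same maternal haplotype with probability 1/2; otherwise the second brother's haplotype is a random population haplotype. Under NoRel the two haplotypes are independent. For one linkage group this gives LR = 1/2 + 1/(2p) if the brothers share a haplotype of frequency p, and LR = 1/2 otherwise. The Python sketch below is a hypothetical illustration of that formula, not the model of Paper IV; all frequencies are hypothetical.

```python
def brothers_group_lr(shared, hap_freq):
    """LR (same mother vs different mothers) for one linkage group, no recombination."""
    return 0.5 + 0.5 / hap_freq if shared else 0.5

def brothers_lr(groups):
    """Product over independent linkage groups, each given as (shared, haplotype_frequency)."""
    lr = 1.0
    for shared, freq in groups:
        lr *= brothers_group_lr(shared, freq)
    return lr

# Same observations; the first group's shared haplotype frequency is estimated two ways
hap_freq = 0.12      # hypothetical haplotype frequency (LD taken into account)
product_freq = 0.03  # hypothetical product of allele frequencies (LD ignored)
others = [(True, 0.05), (False, None), (True, 0.20)]
print(brothers_lr([(True, hap_freq)] + others),
      brothers_lr([(True, product_freq)] + others))
```

With these made-up values, replacing the haplotype frequency by the product of allele frequencies raises the LR from about 74 to about 270. The 1/(2p) term also shows why the frequency assigned to a previously unseen haplotype is critical: a naive estimate of 0 would make the LR undefined.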

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers
Columns (for each simulated situation): log10 LR, median [95% cred] (min/max); difference LR(m_2)/LR(m_1), median [95% cred] (min/max); difference LR(m_3)/LR(m_1), median [95% cred] (min/max)

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel means simulations in which the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimate of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out quite as we imagine. Nevertheless, I will attempt to make some educated guesses about the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before these discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods for handling data from an increased number of DNA markers. However, I think that these developments will in general be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, relationship questions remain that cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve such cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As for continued work on the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This gives access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to
All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:
My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Also: Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.
The staff in the "RG-team": Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.
Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.
My fellows in the "Jippo-gruppen", for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.
Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.
As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist it is valuable to be able to think about other things besides work. I would like to thank:
Magnus and Erik, for always being such good friends.
Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.
My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life.
My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed.
My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.
Finally, my beloved wife Ida and daughter Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.
Thanks to all others whom I have not mentioned but who have been helpful in different ways.
This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for The International Symposium on Human Identification 1989, Promega Corporation, 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York


Popular science summary

DNA has become an important tool in the legal system for resolving questions such as linking a suspected offender to a crime scene, investigating biological relationships, or identifying victims of mass disasters. A DNA investigation can be divided into two steps: the generation of DNA profiles, and an evaluation of the significance of the obtained DNA profiles with respect to a given question. More specifically, one may want to know the probability that someone other than the suspect left DNA at the crime scene, or the probability that the alleged man is the child's father compared with the probability that he fits by chance. The calculation of such probabilities is based on, among other things, how common different DNA variants (alleles) are in the population and how the genetic markers are inherited from one generation to the next.

This thesis aimed to study relevant background data that affect the probability calculations, and to study mathematical models that correctly take the studied parameters into account in case-specific probability calculations. The work focuses on the genetic variation in a Swedish population and on DNA investigations concerning biological relationships between individuals. Nevertheless, most of the results and discussions are also valid for the use of DNA profiling in crime scene investigations.

Probability calculations in relationship investigations are best performed by setting two hypotheses against each other. Consider, for example, a paternity investigation with DNA profiles from a child, the child's mother and an alleged man. Here, hypothesis 1, "The alleged man is the father of the child", can be set against hypothesis 2, "The alleged man is unrelated to the child". For each hypothesis, the probability of observing the obtained DNA results, given that the hypothesis is true, is then calculated: that is, the probability of the mother's, the child's and the man's DNA profiles when the alleged man is the child's father, and when he is not. The final value of the investigation is obtained by weighing the two probabilities against each other. A decision on whether or not the man is the child's father can then be based on comparing the result of the probability calculation with a threshold for inclusion or, alternatively, exclusion.
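The two-hypothesis comparison described above can be illustrated for a single STR locus in a mother-child-alleged father trio. This is a minimal sketch, not the calculation model used in the thesis: it assumes Mendelian transmission without mutation, and that under hypothesis 2 the child's paternal allele is drawn from the population with its allele frequency. The function names, allele labels and frequencies are hypothetical.

```python
def transmission_prob(genotype, allele):
    """Probability that a parent with `genotype` (a 2-tuple) transmits `allele`."""
    return genotype.count(allele) / 2


def child_likelihood(child, mother, paternal_allele_prob):
    """P(child genotype | mother's genotype, distribution of the paternal allele).

    The child's two alleles are split into (maternal, paternal) in both possible
    ways; using a set avoids counting a homozygous child twice.
    """
    a, b = child
    splits = {(a, b), (b, a)}
    return sum(transmission_prob(mother, m) * paternal_allele_prob(p)
               for m, p in splits)


def paternity_index(child, mother, alleged_father, freqs):
    """Likelihood ratio: P(data | AF is the father) / P(data | unrelated man is)."""
    h1 = child_likelihood(child, mother,
                          lambda allele: transmission_prob(alleged_father, allele))
    h2 = child_likelihood(child, mother, lambda allele: freqs[allele])
    return h1 / h2


freqs = {"12": 0.20, "14": 0.10, "16": 0.10, "17": 0.30}  # hypothetical frequencies
pi = paternity_index(child=("12", "16"), mother=("12", "14"),
                     alleged_father=("16", "17"), freqs=freqs)
print(round(pi, 6))
```

In this example the paternal allele must be 16, so the index reduces to 1/(2 x 0.10) = 5; an alleged father lacking allele 16 gives an index of 0 (an exclusion). Under the usual assumption of independent, unlinked loci, per-locus indices would be multiplied to give the combined likelihood ratio.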

There is always a risk of drawing the wrong conclusion, for example wrongly excluding a biological father, or wrongly including a non-father as the biological father. In our first paper (Paper I), we investigated the risk of drawing the wrong conclusion and studied the significance and impact of various factors that can affect it. We focused on DNA investigations in family reunification cases, which can be complex because they involve uncertainties about population affiliation, differences in family constellations, etc. Through simulations, we showed that errors can be minimised by increasing the information content of the investigation, e.g. by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. Furthermore, we showed that the risk of error can be reduced further by using a refined method that takes alternative close relationships between the tested individuals into account.
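The idea of estimating error risks by simulation can be sketched in a much simplified form. This toy model is an assumption-laden illustration, not the simulation design of Paper I: it uses hypothetical, independent loci with equal allele frequencies, ignores mutation, population substructure and relatedness, and applies an arbitrary inclusion threshold of LR >= 10,000. It estimates how often a true father falls below the threshold and how often an unrelated man reaches it.

```python
import random


def draw_genotype(alleles, weights, rng):
    """Unordered genotype drawn under Hardy-Weinberg equilibrium."""
    return tuple(rng.choices(alleles, weights=weights, k=2))


def locus_pi(child, mother, man, freqs):
    """Single-locus trio paternity index, ignoring mutation."""
    def t(genotype, allele):
        return genotype.count(allele) / 2

    a, b = child
    splits = {(a, b), (b, a)}
    h1 = sum(t(mother, m) * t(man, p) for m, p in splits)
    h2 = sum(t(mother, m) * freqs[p] for m, p in splits)
    return h1 / h2


def simulate_error_rates(n_cases=2000, n_loci=10, n_alleles=8,
                         threshold=1e4, seed=1):
    """Return (fraction of true fathers with LR < threshold,
               fraction of unrelated men with LR >= threshold)."""
    rng = random.Random(seed)
    alleles = list(range(n_alleles))
    weights = [1 / n_alleles] * n_alleles  # hypothetical equal allele frequencies
    freqs = dict(zip(alleles, weights))
    missed, false_incl = 0, 0
    for _ in range(n_cases):
        lr_father = lr_unrelated = 1.0
        for _ in range(n_loci):  # loci assumed unlinked and independent
            mother = draw_genotype(alleles, weights, rng)
            father = draw_genotype(alleles, weights, rng)
            unrelated = draw_genotype(alleles, weights, rng)
            child = (rng.choice(mother), rng.choice(father))
            lr_father *= locus_pi(child, mother, father, freqs)
            lr_unrelated *= locus_pi(child, mother, unrelated, freqs)
        missed += lr_father < threshold
        false_incl += lr_unrelated >= threshold
    return missed / n_cases, false_incl / n_cases


missed, false_incl = simulate_error_rates()
print(f"true fathers below threshold: {missed:.3f}")
print(f"unrelated men at or above threshold: {false_incl:.4f}")
```

With eight equally frequent alleles, a true father's per-locus index in this model can never drop below 2, so with 20 loci no true father can fall below the threshold; this illustrates, in a very idealised way, how adding markers reduces the risk of an inconclusive or wrong result.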

Standard investigations use DNA markers located on the so-called autosomal chromosomes. In special cases, one can also examine DNA variation in the mitochondria (mtDNA) or on the sex chromosomes (the X-chromosome and the Y-chromosome). mtDNA is inherited maternally and is especially useful for investigating a presumed maternal relationship. In Paper II, we studied mtDNA variation in a Swedish population in order to create a frequency database. By analysing blood samples from approximately 300 Swedes from seven geographically distinct regions, we showed that the information content for use in a Swedish population is comparable to that of other European populations. The study also showed that there are no significant differences in mtDNA variation between the Swedish regions.
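The information content of an mtDNA frequency database is commonly summarised by gene (haplotype) diversity, GD, and by the random match probability. A minimal sketch, assuming Nei's unbiased estimator GD = n/(n-1) x (1 - Σ p_i²), where p_i is the sample frequency of haplotype i among n sequences, and taking the match probability as Σ p_i². The haplotype labels are placeholders, not data from Paper II.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (gene diversity, random match probability) for a list of haplotypes.

    Random match probability = sum of squared haplotype frequencies;
    gene diversity applies Nei's small-sample correction n / (n - 1).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    match_prob = sum((c / n) ** 2 for c in counts.values())
    gene_div = n / (n - 1) * (1 - match_prob)
    return gene_div, match_prob


sample = ["A", "A", "A", "B", "B", "C", "D", "E"]  # hypothetical haplotypes
gd, pm = haplotype_stats(sample)
print(f"GD = {gd:.3f}, random match probability = {pm:.3f}")
```

Computing these statistics separately for each region is one simple way to compare the informativeness of regional samples; formal tests of differentiation between regions require additional methods not shown here.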

Papers III and IV focused on the DNA variation on the X-chromosome. Because of its special inheritance pattern, X-chromosome analysis can provide a solution in complex relationship investigations where analysis of standard DNA markers is insufficient. However, using the X-chromosome in relationship testing requires special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage means that the probability of inheriting a particular variant at one DNA marker is influenced by which variant has been inherited at another, nearby marker. In Paper III, we investigated the genetic polymorphism of eight DNA markers, all located on the X-chromosome. We showed that the information content of the markers for use in relationship testing is high, and that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.
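Because males carry a single X-chromosome, X-STR haplotypes can be read directly from male samples, so observed haplotype frequencies can be compared with the product of single-locus allele frequencies expected without linkage disequilibrium. The sketch below computes the disequilibrium coefficient D_ab = p_ab - p_a p_b for every allele pair at two loci; it is a descriptive measure only, not the statistical analysis of Paper III, and the haplotypes are hypothetical.

```python
from collections import Counter
from itertools import product


def ld_coefficients(haplotypes):
    """D = P(a, b) - P(a) * P(b) for every allele pair at two loci.

    `haplotypes` is a list of (allele_at_locus1, allele_at_locus2) tuples,
    e.g. from hemizygous males, so the phase is known.
    """
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    freq1 = Counter(h[0] for h in haplotypes)
    freq2 = Counter(h[1] for h in haplotypes)
    return {
        (a, b): pair_counts[(a, b)] / n - (freq1[a] / n) * (freq2[b] / n)
        for a, b in product(freq1, freq2)
    }


males = [(10, 14), (10, 14), (10, 14), (11, 15),
         (11, 15), (10, 15), (11, 14), (11, 15)]  # hypothetical haplotypes
for pair, d in sorted(ld_coefficients(males).items()):
    print(pair, round(d, 4))
```

A positive D means the allele pair occurs together more often than expected under independence, so multiplying allele frequencies would underestimate that haplotype's frequency. With many alleles per STR and modest sample sizes, such estimates are noisy.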

Finally, in Paper IV, we developed a mathematical model that correctly takes both linkage and linkage disequilibrium into account in probability calculations for relationship testing based on X-chromosomal data. We applied this model in a simulation study of a number of typical cases and showed the magnitude of the errors that arise when a simpler model, which ignores linkage and linkage disequilibrium, is used instead.
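Linkage enters such calculations through the recombination fraction θ between markers. A minimal sketch, assuming a mother whose phase is known: she transmits each of her two X-haplotypes intact with probability (1 - θ)/2 and each recombinant combination with probability θ/2. As an additional assumption not taken from the thesis, map distances in centimorgans are converted to θ with Haldane's map function, θ = ½(1 - e^(-2d)) with d in Morgans; the distance and alleles are hypothetical.

```python
import math
from collections import defaultdict


def haldane_theta(distance_cm):
    """Recombination fraction from a map distance in cM (Haldane's map function)."""
    d = distance_cm / 100.0  # centimorgans -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def maternal_transmission(hap1, hap2, theta):
    """Distribution of the two-locus haplotype a phase-known mother transmits.

    hap1, hap2: her two X-haplotypes as (allele_locus1, allele_locus2).
    Parental haplotypes get (1 - theta) / 2 each, recombinants theta / 2 each;
    a defaultdict merges them when a locus is homozygous.
    """
    probs = defaultdict(float)
    probs[hap1] += (1 - theta) / 2
    probs[hap2] += (1 - theta) / 2
    probs[(hap1[0], hap2[1])] += theta / 2
    probs[(hap2[0], hap1[1])] += theta / 2
    return dict(probs)


theta = haldane_theta(5.0)  # hypothetical 5 cM between the two markers
for hap, p in sorted(maternal_transmission((10, 14), (11, 15), theta).items()):
    print(hap, round(p, 4))
```

At θ = 0.5 all four combinations become equally likely, which is what a model ignoring linkage implicitly assumes. A father, in contrast, passes his single X-haplotype intact to a daughter, so θ only matters for female transmissions.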

In summary, the work in this thesis makes it possible to use mitochondrial DNA and X-chromosomal DNA markers to solve more complex relationship investigations. Through the development of models and the study of relevant parameters that affect relationship probability calculations, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007;172(2-3):144-149

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010;124(2):91-98

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008;3(1):37-41

IV. Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier, Forensic Science International; Paper II © 2010 Springer, International Journal of Legal Medicine; Paper III © 2008 Elsevier, Forensic Science International: Genetics.

Contents

Abstract

Popular science summary

List of Papers

Contents

Abbreviations

Introduction 17
  History of DNA and forensic genetics 17
  Population genetics 18
    Genetic polymorphisms 18
    DNA inheritance 20
    Population Genetics 22
    The Swedish population and its genetic appearance 25
  Forensic mathematics/statistics 26
    Framework for interpretation and presentation of evidential weight 26
    Paternity index calculation 27
    Mathematical model for automatic likelihood computation for relationship testing 29

Aim of the thesis 31
  Specific aims 31
    Paper I 31
    Paper II 31
    Paper III 31
    Paper IV 31

Investigations 33
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions 33
    Materials and methods 33
    Results and discussion 34
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations 36
    Materials and methods 36
    Results and discussion 36
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers 38
    Materials and methods 38
    Results and discussion 38
  Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account 40
    Materials and methods 40
    Results and discussion 40

Concluding remarks 43

Future perspectives 45

Acknowledgements 47

References 49

Abbreviations

θ: Theta, recombination frequency/fraction
AF: Alleged father
DNA: Deoxyribonucleic acid
FST: Measure of population genetic subdivision
GD: Gene diversity
HWE: Hardy-Weinberg equilibrium
ISFG: International Society for Forensic Genetics
LD: Linkage disequilibrium
LR: Likelihood ratio
MEC: Mean exclusion chance
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PD: Power of discrimination
PE: Power of exclusion
PI: Paternity index
PIC: Polymorphism information content
PM: Match probability
Pr: Probability
SNP: Single nucleotide polymorphism
STR: Short tandem repeat
TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in areas such as medical genetics, the food processing industry and forensics, both for solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being based on DNA analysis of a biological sample. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser & Schneider 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling & Gill 2004b).

The first time DNA from a crime scene sample was used in court was in 1986 in the UK (Gill & Werrett 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques and methods for exploiting the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial illustrates the importance of the complete process: from the handling of evidential biological samples at the crime scene, via storage and the establishment of DNA profiles, to the presentation in court of the weight of the evidence provided by the DNA results. In no other trial has the DNA evidence been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example: Is the donor of a crime scene sample the same individual as the suspect? Is the alleged father (AF) the biological father of the child? Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, one of three statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some form of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, most such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.
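The probability that an unrelated individual shares a given profile by coincidence is often approximated with the product rule. A minimal sketch, assuming Hardy-Weinberg equilibrium within loci and independence between loci: each locus contributes p² for a homozygote and 2pq for a heterozygote, and the per-locus values are multiplied. No correction for population substructure or relatedness is applied (cf. Balding & Nichols 1994), and the locus names and frequencies are hypothetical.

```python
def genotype_frequency(genotype, freqs):
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq otherwise."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]


def profile_match_probability(profile, allele_freqs):
    """Product rule over loci, assuming independence between loci."""
    prob = 1.0
    for locus, genotype in profile.items():
        prob *= genotype_frequency(genotype, allele_freqs[locus])
    return prob


allele_freqs = {  # hypothetical allele frequencies per locus
    "LOCUS_A": {"12": 0.10, "13": 0.25},
    "LOCUS_B": {"8": 0.05},
    "LOCUS_C": {"15": 0.20, "17": 0.30},
}
profile = {"LOCUS_A": ("12", "13"), "LOCUS_B": ("8", "8"), "LOCUS_C": ("15", "17")}
pm = profile_match_probability(profile, allele_freqs)
print(f"match probability = {pm:.2e}, i.e. about 1 in {1 / pm:,.0f}")
```

The three loci contribute 0.05, 0.0025 and 0.12, giving 1.5 x 10^-5. With the dozen or more loci used in routine practice, the product quickly becomes very small, which is why allele frequency data from a relevant population matter so much.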

These figures are usually based on how genetically unique the information in the DNA profile is in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker: first, population genetics, including allele frequencies, population substructure, and dependence within and between markers; and second, models for calculating and presenting the weight of evidence that take this information properly into account.

Population genetics

Genetic polymorphisms

Three types of DNA marker, namely Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification, and highly polymorphic in human populations (Butler 2006); in other words, they are well suited to forensic applications. The commonly used STRs each have more than 10 alleles, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, the CODIS loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers in routine practice. More recently, STRs with shorter amplicon sizes (miniSTRs) have been developed (Wiegand & Kleiber 2001). These increase the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, a single-base polymorphism. SNPs are often biallelic, although there is growing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that the limited number of alleles per locus makes the information content per locus low: one STR marker provides about as much information as approximately four SNPs (Sobrino & Carracedo 2005; Brenner, www.dna-view.com). No commercial forensic SNP multiplex kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).
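The rough equivalence of one STR to about four SNPs can be made plausible with an idealised calculation. This sketch assumes Hardy-Weinberg equilibrium, a biallelic SNP with both alleles at frequency 0.5, and an STR with ten equally frequent alleles (hypothetical best cases for each marker type). It computes each locus's match probability, i.e. the chance that two unrelated people share a genotype (the sum of squared genotype frequencies), and asks how many SNPs give the same combined match probability as one STR.

```python
import math
from itertools import combinations_with_replacement


def locus_match_probability(allele_freqs):
    """Probability that two unrelated people share a genotype at one locus (HWE)."""
    total = 0.0
    for a, b in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        g = p * p if a == b else 2 * p * q  # genotype frequency
        total += g * g
    return total


snp_pm = locus_match_probability([0.5, 0.5])
str_pm = locus_match_probability([0.1] * 10)
snps_per_str = math.log(str_pm) / math.log(snp_pm)
print(f"SNP PM = {snp_pm:.3f}, STR PM = {str_pm:.4f}, SNPs per STR ~ {snps_per_str:.1f}")
```

Here the SNP match probability is 0.375 and the STR value 0.019, so about four SNPs match one STR. Real allele frequency distributions are less even, so the ratio for actual marker sets will differ.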

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main forensic use of sequence data is the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysing mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the marker types described above, there are different "types" of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, the sex chromosomes (the X-chromosome and the Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations of testing complex pedigrees (Egeland & Sheehan 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave as autosomal markers in their transmission to gametes in females, and as haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, who passes on one of her two. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are only available for the sisters. In such instances, autosomal DNA markers cannot exclude paternity, since full sisters can inherit different alleles. X-chromosome markers can, however, exclude paternity, since two sisters with the same father must share his paternal allele. There are several other types of relationship for which X-chromosomal markers are superior to autosomal markers (Szibor et al 2003, Pinto et al 2010).
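The sister example can be expressed as a simple consistency check. If two women have the same father, both carry his single X-chromosome, so at every X-locus they must share at least one allele; a locus where they share none excludes a common father. The sketch ignores mutation and only checks this necessary condition, so an empty result is consistent with, but does not prove, a shared father. The marker names and genotypes are hypothetical placeholders.

```python
def exclusion_loci(sister1, sister2):
    """Return the X-loci at which two women share no allele.

    Each argument maps locus name -> (allele, allele). Sisters with the same
    father carry his identical X-chromosome, so they must share at least one
    allele at every locus; any locus returned excludes a common father
    (mutation is ignored).
    """
    return [locus for locus in sister1
            if not set(sister1[locus]) & set(sister2[locus])]


s1 = {"X_STR_1": (12, 13), "X_STR_2": (17, 18), "X_STR_3": (9, 10)}
s2 = {"X_STR_1": (13, 14), "X_STR_2": (16, 19), "X_STR_3": (10, 10)}
print(exclusion_loci(s1, s2))
```

Here the sisters share an allele at X_STR_1 and X_STR_3 but none at X_STR_2, so a common father is excluded under these assumptions. At an autosomal locus, by contrast, sharing no allele is fully compatible with being full sisters.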

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when using a combination of closely located X-chromosomal markers in relationship testing (Krawczak 2007; Szibor 2007) (Figure 2). These two genetic properties are discussed further below, in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombination region (~5%), mutation is the only force that leads to new variation on the Y-chromosome. Due to this, and the fact that the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual’s haplogroup status (Karafet et al. 2008), which can, for instance, be used for interpreting the paternal genetic geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is identical for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. It consists of a circular genome of ~16 600 nucleotides. Each cell has 100 to 1 000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles of the X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to the estimation of allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST has a direct relationship to the variance in allele frequencies within/among populations. Small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when calculating the strength of the DNA profile evidence (Balding & Nichols 1994).
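To illustrate how FST relates to the variance in allele frequencies, the sketch below computes Wright's textbook form, FST = Var(p) / (p̄(1 − p̄)), for one biallelic locus over equally weighted subpopulations. This simplified form and the function name `wright_fst` are assumptions for illustration; it is not the estimator used in the studies cited, and sample-size corrected estimators (or ΦST and RST) would be used in practice.

```python
from statistics import fmean, pvariance


def wright_fst(subpop_freqs):
    """Wright's F_ST for one biallelic locus: the variance of the allele
    frequency among equally weighted subpopulations, scaled by p_bar(1 - p_bar)."""
    p_bar = fmean(subpop_freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # allele absent or fixed everywhere: no variation to partition
    return pvariance(subpop_freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


# Identical subpopulations give F_ST = 0; diverging frequencies give larger values.
print(wright_fst([0.5, 0.5, 0.5]))  # 0.0
print(wright_fst([0.2, 0.8]))       # about 0.36
```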

Linkage and Linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments, a phenomenon known as crossing over or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete has a different appearance compared with its parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed relative to its parental constitution. The distance between two loci can be measured and discussed in terms of the recombination frequency θ, which is estimated from family studies. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
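Segregation probabilities of the kind shown in Figure 2 can be sketched as follows, assuming a mother whose phase (which alleles lie together on each of her two X-chromosomes) is known and a recombination fraction θ between the two loci. Each non-recombinant haplotype is then transmitted with probability (1 − θ)/2 and each recombinant with θ/2, while a father transmits his single X haplotype intact to a daughter. The function name and allele labels are illustrative only.

```python
def maternal_haplotype_probs(hap1, hap2, theta):
    """Probabilities of the two-locus X haplotypes a mother transmits.

    hap1, hap2: the mother's phased haplotypes, each (locus-1 allele, locus-2 allele)
    theta: recombination fraction between the two loci
    """
    probs = {}
    for hap, p in [
        (hap1, (1 - theta) / 2),            # non-recombinant
        (hap2, (1 - theta) / 2),            # non-recombinant
        ((hap1[0], hap2[1]), theta / 2),    # recombinant
        ((hap2[0], hap1[1]), theta / 2),    # recombinant
    ]:
        probs[hap] = probs.get(hap, 0.0) + p  # merge identical haplotypes
    return probs


# Mother X1a-X2a / X1b-X2b, with a hypothetical theta = 0.1
print(maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), 0.1))
```

If the mother is homozygous at either locus, some of the four haplotypes coincide and their probabilities are simply added.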

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci, and can be defined as the non-random association of alleles in haplotypes. LD can originate from the fact that the loci are closely located and thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be expressed as

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of the haplotype a-b, and $f(a)$ and $f(b)$ are the allele frequencies of alleles a and b, respectively. Under linkage equilibrium $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population, rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
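As a small illustration of the relation f(ab) = f(a)·f(b) + Δ, the sketch below estimates Δ from a list of phase-known two-locus haplotypes, such as X-chromosomal haplotypes typed in males, who carry a single X. The function name and the sample are hypothetical.

```python
def ld_delta(haplotypes, a, b):
    """Delta = f(ab) - f(a) f(b) for allele a at locus 1 and allele b at locus 2.

    haplotypes: list of (locus-1 allele, locus-2 allele) pairs with known phase,
    e.g. X-chromosomal haplotypes typed in males.
    """
    n = len(haplotypes)
    f_ab = sum(h == (a, b) for h in haplotypes) / n
    f_a = sum(h[0] == a for h in haplotypes) / n
    f_b = sum(h[1] == b for h in haplotypes) / n
    return f_ab - f_a * f_b


# 8 hypothetical male haplotypes in which a and b tend to occur together
sample = [("a", "b")] * 3 + [("a", "B"), ("A", "b")] + [("A", "B")] * 3
print(ld_delta(sample, "a", "b"))  # 3/8 - (4/8)(4/8) = 0.125
```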

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles: Hardy-Weinberg equilibrium (HWE; Hardy 1908) and LD tests address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher’s exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw any highly significant conclusions from the outcome of either test (Buckleton et al. 2001).
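The exact HWE test can be sketched for the simplest case, a biallelic locus such as a SNP. Conditional on the allele counts, the probability of each possible number of heterozygotes has a closed form, and the p-value sums the probabilities of all heterozygote counts no more likely than the observed one. This is an illustrative sketch only; multiallelic STR loci require more elaborate (e.g. Markov chain or permutation) implementations, and the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test for one biallelic locus.

    Conditional on the allele counts, the probability of `het` heterozygotes is
    N! 2^het n_A! n_B! / (n_AA! het! n_BB! (2N)!).
    """
    n = n_aa + n_ab + n_bb      # individuals
    n_a = 2 * n_aa + n_ab       # copies of allele A
    n_b = 2 * n_bb + n_ab       # copies of allele B

    def prob(het):
        hom_a, hom_b = (n_a - het) // 2, (n_b - het) // 2
        return Fraction(factorial(n) * 2 ** het * factorial(n_a) * factorial(n_b),
                        factorial(hom_a) * factorial(het) * factorial(hom_b)
                        * factorial(2 * n))

    p_obs = prob(n_ab)
    # Feasible heterozygote counts share the parity of n_a and cannot exceed min(n_a, n_b)
    feasible = range(n_a % 2, min(n_a, n_b) + 1, 2)
    return float(sum(p for p in map(prob, feasible) if p <= p_obs))


print(hwe_exact_p(25, 50, 25))  # Hardy-Weinberg proportions: p-value 1.0
print(hwe_exact_p(50, 0, 50))   # no heterozygotes at all: a tiny p-value
```

Exact fractions are used so that ties between equally likely heterozygote counts are compared without rounding error.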

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations, and differ from case-specific values. The estimation of such parameters is most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different. The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
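A direct transcription of the unbiased GD estimator is shown below as a minimal sketch; the function name is hypothetical. The same formula gives haplotype diversity when the p_i are haplotype frequencies, as for mtDNA.

```python
def gene_diversity(freqs, n):
    """Nei's unbiased gene diversity for one locus.

    freqs: allele (or haplotype) frequencies p_i estimated from the sample
    n: number of gene copies sampled (2 x individuals for autosomal loci)
    """
    homozygosity = sum(p * p for p in freqs)
    return n / (n - 1) * (1.0 - homozygosity)


# Two equally common alleles observed among 10 gene copies
print(gene_diversity([0.5, 0.5], n=10))  # 10/9 * 0.5, about 0.556
```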

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus PM is the sum of the partial match probabilities for all genotypes. PM can also be derived from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus directly related to PM:

$$PD = 1 - PM$$
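The sketch below computes PM from observed genotype frequencies, PM from allele frequencies under an assumed Hardy-Weinberg equilibrium (homozygotes G = p_i², heterozygotes G = 2p_ip_j, the standard expansion), and PD = 1 − PM. Function names and the sample are hypothetical.

```python
from collections import Counter


def match_probability(genotypes):
    """PM = sum_i G_i^2, with G_i the observed frequency of genotype i."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)  # (12, 13) == (13, 12)
    return sum((c / n) ** 2 for c in counts.values())


def match_probability_hwe(freqs):
    """PM from allele frequencies, assuming Hardy-Weinberg proportions."""
    k = len(freqs)
    pm = sum(p ** 4 for p in freqs)                      # homozygotes: (p_i^2)^2
    pm += sum((2 * freqs[i] * freqs[j]) ** 2             # heterozygotes: (2 p_i p_j)^2
              for i in range(k) for j in range(i + 1, k))
    return pm


def power_of_discrimination(pm):
    return 1.0 - pm


sample = [(12, 13), (13, 12), (12, 12), (14, 15)]
pm = match_probability(sample)           # 0.5^2 + 0.25^2 + 0.25^2
print(pm, power_of_discrimination(pm))   # 0.375 0.625
```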

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances when this cannot be deduced, namely when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is here the number of alleles.
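A minimal transcription of the PIC formula is given below; the function name is hypothetical. The first subtracted term is the probability that the parent is homozygous, and the second is the probability that both parents and the child share the same heterozygous genotype.

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980):
    PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    k = len(freqs)
    homozygous_parent = sum(p * p for p in freqs)
    same_heterozygotes = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                             for i in range(k) for j in range(i + 1, k))
    return 1.0 - homozygous_parent - same_heterozygotes


print(pic([0.5, 0.5]))                # 1 - 0.5 - 0.125 = 0.375
print(pic([0.25, 0.25, 0.25, 0.25]))  # 0.703125
```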

The probability of excluding paternity (Q) is calculated as (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i (1 - p_i)^2 (1 - p_i + p_i^2) + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is built from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1 - p_i)^2$ or $(1 - p_i - p_j)^2$; and second, the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above. An alternative figure, the power of exclusion (PE), exists and is defined as (Brenner & Morris 1990)

$$PE = h^2 \cdot (1 - 2 \cdot h \cdot H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
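Both exclusion measures can be sketched directly from their formulas. In Q, the first sum weights each unambiguous paternal allele i by p_i(1 − p_i + p_i²), the probability that the obligate paternal allele is unambiguously i, and the second sum covers mother and child sharing the same heterozygous genotype. The function names are hypothetical, and H = 1 − h is assumed, since every individual is either heterozygous or homozygous.

```python
def exclusion_probability_q(freqs):
    """Mean probability of excluding a random non-father in a
    mother-child-alleged father trio (Ohno et al. 1982), autosomal locus."""
    k = len(freqs)
    # Paternal allele unambiguously i: exclusion (1 - p_i)^2
    q = sum(p * (1 - p) ** 2 * (1 - p + p * p) for p in freqs)
    # Mother and child both i/j: paternal allele i or j, exclusion (1 - p_i - p_j)^2
    for i in range(k - 1):
        for j in range(i + 1, k):
            pi, pj = freqs[i], freqs[j]
            q += pi * pj * (pi + pj) * (1 - pi - pj) ** 2
    return q


def power_of_exclusion(h):
    """Brenner & Morris (1990): PE = h^2 (1 - 2 h H^2), with H = 1 - h."""
    big_h = 1.0 - h
    return h ** 2 * (1.0 - 2.0 * h * big_h ** 2)


print(exclusion_probability_q([0.5, 0.5]))  # 0.1875
print(power_of_exclusion(0.5))              # 0.1875
```

For a biallelic locus with equal allele frequencies in Hardy-Weinberg proportions (h = 0.5), the two measures coincide.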

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1 - p_i)$ or $(1 - p_i - p_j)$. Thus the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
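The two X-chromosomal measures reduce to power sums of the allele (or haplotype) frequencies, as in the sketch below. The function names are hypothetical; a random man carries a single X allele, so he is excluded with probability 1 − p_i (or 1 − p_i − p_j) rather than the squared terms used in Q.

```python
def mec_trio(freqs):
    """Mean exclusion chance, X-chromosomal trio with a daughter:
    1 - sum p_i^2 + sum p_i^4 - (sum p_i^2)^2."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1.0 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """Mean exclusion chance, X-chromosomal father-daughter duo:
    1 - 2 sum p_i^2 + sum p_i^3."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1.0 - 2.0 * s2 + s3


freqs = [0.5, 0.5]
print(mec_trio(freqs), mec_duo(freqs))  # 0.375 0.25
```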

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of varying extent, origin and direction have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated regarding forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both of these studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). The latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated, as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, namely a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties as well as pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, which means the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile, and H could be “the DNA comes from an individual not related to the suspect”. If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes’s theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$

H1 (or Hp) is commonly known as the prosecutor’s hypothesis and H0 (or Hd) as the hypothesis of the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this quantity that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues, but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:

Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which is the probability of observing the child’s ($G_C$), mother’s ($G_M$) and alleged father’s ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


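We can use the third law of probability (the multiplication rule) and simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$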

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can simplify further:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child’s genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child’s genotype given the genotypes of the mother and the AF, and given that the AF is not the father but that someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (AM) and paternal (AP) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them can transmit any other allele). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If AM and AP are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{AP}$ or $0.5\,p_{AP}$, depending on the homozygous/heterozygous status of the mother. Here $p_{AP}$ is the population frequency of allele AP and represents the probability of the child receiving that allele from a random man in the population. If AM and AP are ambiguous, the numerator and the denominator are each calculated as the sum over all possible values of AM and AP.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2 \cdot p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
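The single-locus rules above can be sketched as one function that sums over every way of splitting the child's genotype into a maternal allele AM and a paternal allele AP, which covers both the unambiguous and the ambiguous cases. This is a hypothetical sketch for autosomal trios only; it ignores mutations and silent alleles, and the allele frequencies in the example are made up.

```python
def transmission_prob(parent, allele):
    """Mendelian probability that a parent with genotype (x, y) transmits `allele`."""
    return parent.count(allele) / 2


def paternity_index(mother, child, alleged_father, freqs):
    """Single-locus trio PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Genotypes are 2-tuples of allele labels; freqs maps each allele to its
    population frequency. Mutations and silent alleles are ignored, so a child
    sharing no allele with the mother raises ZeroDivisionError.
    """
    c1, c2 = child
    # All ways of splitting the child's genotype into (AM, AP); the set avoids
    # counting a homozygous child twice.
    splits = {(c1, c2), (c2, c1)}
    numerator = sum(transmission_prob(mother, am) * transmission_prob(alleged_father, ap)
                    for am, ap in splits)
    # Under Hd the paternal allele comes from a random man: probability p_AP
    denominator = sum(transmission_prob(mother, am) * freqs[ap] for am, ap in splits)
    return numerator / denominator


# The worked example: G_M = [a, b], G_C = [b, c], G_AF = [c, d], with p_c = 0.1
freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
print(paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freqs))  # 1 / (2 * p_c)
```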

Decision

How does one interpret the PI value? Bayes’s theorem is relevant in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence, since Hp and Hd are treated as the only alternatives, so that $\Pr(H_d \mid E) = 1 - \Pr(H_p \mid E)$,

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$
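The conversion from PI to a posterior probability of paternity can be sketched as below. With the traditional prior probability of 0.5 (prior odds 1) the function reduces to PI / (PI + 1); the `prior_hp` parameter is an illustrative addition showing how a different prior would enter through Bayes's theorem in odds form.

```python
def posterior_probability(pi, prior_hp=0.5):
    """Posterior probability of paternity, Pr(Hp | E), via Bayes's theorem in odds form:
    posterior odds = PI x prior odds. prior_hp = 0.5 (prior odds 1) gives PI / (PI + 1)."""
    prior_odds = prior_hp / (1.0 - prior_hp)
    posterior_odds = pi * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


print(posterior_probability(9999))                # 0.9999 with prior odds 1
print(posterior_probability(9999, prior_hp=0.1))  # lower when the prior is lower
```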

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). A cut-off that is too low will increase the risk of falsely including a non-father as the true father, while a cut-off that is too high will increase the risk of falsely excluding the true father.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, the use of a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \mathrm{Pen}(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{o} \Pr(G_o \mid G_m, G_f)$$

where the sums run over the genotypes $G_1, \ldots, G_n$ of all n individuals in the pedigree, $X_i$ is the observed data for individual i, and $G_m$ and $G_f$ are the genotypes of the mother and father of offspring o.

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child’s genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
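The peeling idea can be illustrated on the smallest non-trivial case, one nuclear family at one marker locus. The sketch sums over the genotypes of untyped parents, treats the parents as founders with Hardy-Weinberg genotype probabilities, and attaches each child's transmission probability as a factor in the parents' summation. It is a simplified, hypothetical sketch; a full Elston-Stewart implementation recurses through multi-generation pedigrees. The penetrance factor is taken as 1 because only marker genotypes are considered.

```python
from itertools import combinations_with_replacement


def founder_prob(g, freqs):
    """Hardy-Weinberg probability of a founder genotype g = (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def child_prob(child, father, mother):
    """Mendelian transmission probability Pr(G_child | G_father, G_mother)."""
    target = tuple(sorted(child))
    return sum(0.25 for pa in father for ma in mother
               if tuple(sorted((pa, ma))) == target)


def nuclear_family_likelihood(children, freqs, father=None, mother=None):
    """Likelihood of one nuclear family at one marker locus.

    Untyped parents (None) are summed over all genotypes. The children's
    factors are computed first and attached to the parents' summation,
    i.e. the bottom generation is peeled onto the parents.
    """
    all_genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    fathers = [tuple(sorted(father))] if father else all_genotypes
    mothers = [tuple(sorted(mother))] if mother else all_genotypes
    likelihood = 0.0
    for gf in fathers:
        for gm in mothers:
            peeled_children = 1.0
            for c in children:
                peeled_children *= child_prob(c, gf, gm)
            likelihood += founder_prob(gf, freqs) * founder_prob(gm, freqs) * peeled_children
    return likelihood


freqs = {"a": 0.3, "b": 0.7}
# Two typed children, both parents untyped
print(nuclear_family_likelihood([("a", "b"), ("a", "a")], freqs))
```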

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with a computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
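The three steps can be sketched as a forward algorithm over inheritance vectors. This is a naive, hypothetical sketch: the per-marker emission probabilities Pr(observed genotypes | inheritance vector) from step 2 are assumed to be precomputed and passed in, each meiosis is assumed to recombine independently with probability θ between adjacent markers, and the full sum over all vector pairs is used, whereas practical implementations compute the transitions far more efficiently. The cost is still linear in the number of markers.

```python
from itertools import product


def transition(v, w, theta):
    """Pr(inheritance vector w at the next marker | vector v at this marker).
    Each meiosis recombines independently with probability theta, so the
    probability depends on the number d of meioses whose bit flips."""
    d = sum(a != b for a, b in zip(v, w))
    return theta ** d * (1.0 - theta) ** (len(v) - d)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm of a Lander-Green style hidden Markov model.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]:       recombination fraction between markers k and k + 1
    n_meioses:       number of meioses, i.e. bits in each inheritance vector
    """
    vectors = list(product((0, 1), repeat=n_meioses))  # step 1: all vectors
    # Every inheritance vector is equally likely a priori at the first marker
    alpha = {v: emissions[0][v] / len(vectors) for v in vectors}
    for k in range(1, len(emissions)):                 # step 3: along the chromosome
        alpha = {w: emissions[k][w] * sum(alpha[v] * transition(v, w, thetas[k - 1])
                                          for v in vectors)
                 for w in vectors}
    return sum(alpha.values())


# Toy data: one meiosis; marker 1 is only compatible with bit 0, marker 2 with bit 1
emissions = [{(0,): 1.0, (1,): 0.0}, {(0,): 0.0, (1,): 1.0}]
print(lander_green_likelihood(emissions, [0.1], n_meioses=1))  # = theta / 2
```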

Practical implementations of the Lander-Green algorithm have been shown to work well in terms of taking linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample concerning allele and haplotype frequencies and forensic efficiency parameters. Furthermore, to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, through either an exclusion error or an inclusion error (falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals’ DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses model and, for comparison, the five hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk of the AF being a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if it is not completely correct, than not to use one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that the use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate compared with the standard case, in %, given as total error (inclusion error, exclusion error):

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  DNA profiles from 20 markers: -68 (-29, -89)
  DNA profiles from 25 markers: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of a mutation model for the LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of a mutation model for the LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwandan allele frequencies for data generation, Swedish allele frequencies for the LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for the LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish allele frequencies for the LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for the LR calculation: -24 (-8, -31)

A standard case was considered, with DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is particularly relevant to set up and study regional frequency databases, due to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable region (including HVS-I, HVS-II and HVS-III) spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved enumeration of forensic efficiency parameters, as well as comparison of the genetic variation found among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). mtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events. However, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007). This provides a more accurate estimate of the rarity of an mtDNA sequence.
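One common way to express the rarity of an mtDNA haplotype while acknowledging the size of the database is to give an upper confidence bound on its frequency. The sketch below is an assumption for illustration. It is not presented as the method prescribed by EMPOP or used in Paper II. It solves the binomial tail equation by bisection. For a haplotype never observed (x = 0), this reduces to 1 - α^(1/n). Because the bound shrinks as n grows, it shows why a larger reference database gives a tighter estimate. The function names are hypothetical.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def frequency_upper_bound(x, n, alpha=0.05, iterations=100):
    """One-sided (1 - alpha) upper confidence bound for a haplotype seen x times in n.

    Solves binomial_cdf(x, n, p) = alpha for p by bisection (Clopper-Pearson style).
    """
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if binomial_cdf(x, n, mid) > alpha:
            lo = mid  # seeing <= x copies is still too likely; bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2


# A haplotype never observed in a hypothetical database of 300 sequences:
print(frequency_upper_bound(0, 300))  # close to 1 - 0.05 ** (1 / 300), about 0.0099
```

Increasing n from 300 to 3000 in the example lowers the bound roughly tenfold. This is the practical benefit of pooling compatible populations into a larger reference database.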


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
X-chromosomal markers are useful in deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females. Because the markers are located on the same chromosome, their combined use requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study population frequencies, substructure and efficiency parameters. It is equally important to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods
A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion
Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci within each group should be treated as haplotypes rather than as independent single markers. By means of simulations we demonstrated that the average difference in the calculated LR between disregarding and accounting for LD was small. In some individual cases, however, it was considerable.
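Because males carry a single X chromosome, their two-locus haplotypes within a linkage group are observed directly. This makes a simple test of allelic association straightforward. The sketch below is a generic permutation test using a Pearson chi-square statistic and is offered as an assumption. The specific LD test used in Paper III is not described in this summary, and the allele labels and function names are hypothetical.

```python
import random
from collections import Counter


def chi_square_statistic(pairs):
    """Pearson chi-square for association between the two loci of male haplotypes."""
    n = len(pairs)
    counts_a = Counter(a for a, _ in pairs)
    counts_b = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a, ca in counts_a.items():
        for b, cb in counts_b.items():
            expected = ca * cb / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_permutations=1000, seed=1):
    """P-value for allelic association: shuffling locus-2 alleles breaks any LD
    while keeping both sets of allele frequencies unchanged."""
    rng = random.Random(seed)
    observed_stat = chi_square_statistic(pairs)
    locus1 = [a for a, _ in pairs]
    locus2 = [b for _, b in pairs]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(locus2)
        if chi_square_statistic(list(zip(locus1, locus2))) >= observed_stat - 1e-9:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical male haplotypes: strong association versus none.
strong = [("a", "x")] * 50 + [("b", "y")] * 50
none = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")] * 25
print(ld_permutation_test(strong), ld_permutation_test(none))
```

A small p-value indicates that the two loci should be handled as a haplotype rather than multiplied as independent markers.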

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%). They also indicated a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
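Family studies of this size leave some uncertainty about small recombination fractions. The simplest counting estimator, θ = R/N for R recombinants among N phase-known informative meioses, together with a Wilson score interval, makes this visible. This is an illustrative assumption and not the estimation model presented in Paper III. The upper limit is capped at 0.5 because a recombination fraction cannot exceed one half. The function name is hypothetical.

```python
from math import sqrt


def recombination_estimate(recombinants, informative_meioses, z=1.96):
    """Point estimate and approximate 95% Wilson score interval for a recombination fraction.

    Assumes phase-known, independent informative meioses (a simplification).
    """
    n = informative_meioses
    theta = recombinants / n
    denom = 1 + z * z / n
    centre = (theta + z * z / (2 * n)) / denom
    half = z * sqrt(theta * (1 - theta) / n + z * z / (4 * n * n)) / denom
    return theta, max(0.0, centre - half), min(0.5, centre + half)


# Hypothetical family study: no recombinants among 100 informative meioses.
print(recombination_estimate(0, 100))
```

With no recombinants among 100 meioses, the point estimate is 0, but the upper limit is roughly 0.037. Tight linkage is therefore well supported, while the precise size of a small θ remains uncertain.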


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line are the locations in Mb (NCBI 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account
The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart algorithm (Elston & Stewart 1971; Abecasis et al. 2002) could be used to compute the LR for questions about a relationship. However, this algorithm is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully considers linkage and linkage disequilibrium. Using simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III, we then examine the impact of taking or not taking linkage and LD into account.

Materials and methods
A computational model was presented. It is based on the Lander-Green algorithm but extends the inheritance vectors to cover all of a haplotype's loci.
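The Lander-Green algorithm treats the inheritance vector (one bit per meiosis, recording which grandparental copy was transmitted) as the hidden state of a Markov chain along the chromosome. The sketch below is a generic forward-algorithm version under stated assumptions and is not the Paper IV implementation. The emission probabilities P(data | vector) are supplied by the caller. Each step may be a single locus or, loosely mirroring the extension described above, a whole linkage group whose emissions are computed from haplotype frequencies so that LD within the group is respected. For X-chromosomal markers outside the pseudoautosomal regions, a father passes on his single X without recombination, so only maternal meioses need bits. An LR would then be the ratio of two such likelihoods, one per pedigree hypothesis. The function names are hypothetical.

```python
from itertools import product


def transition_probability(v, w, theta):
    """P(inheritance vector w at the next step | v at the current step).

    Each meiosis independently switches grandparental origin with probability theta.
    """
    switches = sum(a != b for a, b in zip(v, w))
    return theta**switches * (1 - theta) ** (len(v) - switches)


def lander_green_likelihood(n_meioses, emissions, thetas):
    """Forward algorithm over inheritance vectors (hidden Markov model).

    emissions: one dict per step (a locus or, hypothetically, a whole linkage group)
               mapping an inheritance vector (tuple of 0/1) to P(data | vector).
               Vectors missing from a dict are treated as incompatible (probability 0).
    thetas:    recombination fractions between consecutive steps.
    """
    if len(thetas) != len(emissions) - 1:
        raise ValueError("need one recombination fraction between each pair of steps")
    states = list(product((0, 1), repeat=n_meioses))
    prior = 1.0 / len(states)  # all inheritance vectors equally likely a priori
    forward = {v: prior * emissions[0].get(v, 0.0) for v in states}
    for step in range(1, len(emissions)):
        theta = thetas[step - 1]
        forward = {
            w: emissions[step].get(w, 0.0)
            * sum(forward[v] * transition_probability(v, w, theta) for v in states)
            for w in states
        }
    return sum(forward.values())


# One informative maternal meiosis: grandpaternal origin at step 1, grandmaternal at step 2.
print(lander_green_likelihood(1, [{(0,): 1.0}, {(1,): 1.0}], [0.1]))  # 0.5 * theta
```

The state space grows as 2^m in the number of meioses m. This is manageable for the small pedigrees typical of X-chromosomal casework but grows quickly for large ones.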

To test the model, we studied six cases representing pedigrees for which X-chromosomal analysis would be valuable. In three of these, DNA profiles were available for both the children and the founders (paternity for a trio, paternity for half-sisters and paternity for full sisters). In the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied. The LR calculated with our proposed model was also compared with LRs calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
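The simulation design can be illustrated on the simplest founder-available case: a paternity trio with a daughter, typed for a single hypothetical X-STR. This is a deliberately reduced sketch. An analysis along the lines of Paper IV would simulate haplotypes across all eight markers and combine them with a model for linkage and LD rather than analyse one locus. The allele frequencies and function names are made up. The code draws a mother, a true father and a daughter. It then computes the LR with either the true father (relationship true) or a random man (relationship not true) as the alleged father.

```python
import random
import statistics


def x_paternity_lr(mother, daughter, alleged_father, freqs):
    """Single-locus LR for 'alleged_father is the father' vs 'an unrelated man is',
    for an X-linked marker with the mother typed.

    mother, daughter: 2-tuples of alleles; alleged_father: his single X allele.
    """
    numerator = denominator = 0.0
    # Each distinct split of the daughter's genotype into (maternal, paternal) allele.
    for maternal, paternal in {(daughter[0], daughter[1]), (daughter[1], daughter[0])}:
        p_maternal = mother.count(maternal) / 2  # chance the mother transmits it
        numerator += p_maternal * (paternal == alleged_father)
        denominator += p_maternal * freqs[paternal]
    if denominator == 0:
        raise ValueError("daughter is incompatible with the mother")
    return numerator / denominator


def simulate_lrs(freqs, relationship_true, n_sim=10_000, seed=1):
    """Simulate trios and return the LR for each simulated case."""
    rng = random.Random(seed)
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]

    def draw():
        return rng.choices(alleles, weights)[0]

    lrs = []
    for _ in range(n_sim):
        mother = (draw(), draw())
        true_father = draw()
        daughter = (rng.choice(mother), true_father)  # father's single X always passed on
        alleged = true_father if relationship_true else draw()
        lrs.append(x_paternity_lr(mother, daughter, alleged, freqs))
    return lrs


freqs = {"12": 0.5, "13": 0.3, "14": 0.2}  # hypothetical allele frequencies
true_lrs = simulate_lrs(freqs, relationship_true=True)
false_lrs = simulate_lrs(freqs, relationship_true=False)
print(statistics.median(true_lrs), statistics.median(false_lrs))
```

Under the false hypothesis, some LRs are 0 (exclusions). At a single locus, however, a random man often carries the obligate paternal allele by chance. This is one reason why several markers are combined in practice.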

Results and discussion
The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR was high (~10^6) for the three cases where DNA profiles were available for the founders. It was considerably lower (~10^2-10^3) for the cases where genotype data for the founders were not available. Furthermore, LRs greater than 1 (i.e. supporting the relationship) were obtained to varying degrees when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers. All values are given as median [95% cred.] (min/max).

         log10 LR                Difference, LR(m_2)/LR(m_1)    Difference, LR(m_3)/LR(m_1)
Rel      20 [-1 -62] (-1100)     12 [04-73] (0001418)           04 [0036-93] (00001438)
NoRel    -1 [-1-05] (-140)       10 [09-13] (000263)            02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis. The aim of the thesis was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by several parameters: the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for treating single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, because the mtDNA variation in Sweden resembles that of other European populations, the relevant reference population can be enlarged. This increases the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium in the Swedish population. This, together with the linkage observed within and between the linkage groups, highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule for combining the eight X-STRs is not valid.


• In Paper IV we presented a model for calculating likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
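The point about the product rule can be made concrete with a toy comparison. Since males carry a single X, their haplotypes can be counted directly. Under LD, the counted haplotype frequency can differ clearly from the product of the single-locus allele frequencies. The data and the function name below are invented for illustration.

```python
from collections import Counter


def product_rule_vs_haplotype(male_haplotypes, target):
    """Compare the product of single-locus allele frequencies with the
    directly counted haplotype frequency (males carry one X, so their
    haplotypes are observed without phase ambiguity)."""
    n = len(male_haplotypes)
    haplotype_freq = Counter(male_haplotypes)[target] / n
    product_freq = 1.0
    for locus, allele in enumerate(target):
        product_freq *= sum(1 for h in male_haplotypes if h[locus] == allele) / n
    return product_freq, haplotype_freq


# Hypothetical two-locus data (allele labels only) showing strong association.
data = [("12", "30")] * 40 + [("13", "31")] * 40 + [("12", "31")] * 10 + [("13", "30")] * 10
print(product_rule_vs_haplotype(data, ("12", "30")))  # (0.25, 0.4)
```

Here the product rule understates the frequency of the common haplotype by a factor of 1.6. In a relationship case, this would inflate the weight of evidence for a shared haplotype.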


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt some informed guesses about the future of forensic genetics. Looking back at this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance of rapidly increasing computing power.

Consider the time just before these discoveries from a client's point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was therefore a major driver of the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can already be solved with existing methods. Major investments have also been made in forensic laboratories, both in instrumentation and in the large national DNA profile databases. These databases are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. For DNA in criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008), and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

Turning to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from larger numbers of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Some relationship questions, however, still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve such cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what matters are the statistical methods needed to accurately estimate the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As for continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. The first is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and pedigrees. The second, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions were rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetic data and could therefore affect the assessment of the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, downstream data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all the other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law, the Tillmars, for always giving me such a welcoming place to go to. Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis. Thanks also to everyone I have not mentioned who has been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium: a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden: a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden: a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies: the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP: a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more: length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


and studied the significance and impact of various factors that may affect this. We focused on DNA investigations in family reunification cases, which can be complex because they involve uncertainty about population affiliation, differences in family constellations, etc. Through simulations we showed that errors can be minimised by increasing the information content of the investigation, for example by using more DNA markers, DNA profiles from more individuals, and allele frequency data from the same population. Furthermore, we showed that the risk of error can be reduced further by using a refined method that takes alternative, closely related relationships between the tested individuals into account.

In standard investigations, DNA markers located on the so-called autosomal chromosomes are used. In special cases, DNA variation in the mitochondrion (mtDNA) or on the sex chromosomes (the X-chromosome and the Y-chromosome) can also be examined. MtDNA is inherited through the maternal line and is especially useful when a maternal relationship is in question. In Paper II we examined mtDNA variation in a Swedish population in order to create a frequency database. By analysing blood samples from about 300 Swedes from seven geographically distinct regions, we showed that the information content for use in a Swedish population is comparable to that of other European populations. The study also showed that there are no significant differences in mtDNA variation between the Swedish regions.

Papers III and IV focused on DNA variation on the X-chromosome. Thanks to the special inheritance pattern of the X-chromosome, X-chromosome analysis can resolve complex kinship investigations where analysis of standard DNA markers is insufficient. Using the X-chromosome in kinship testing, however, requires special consideration of two genetic properties called linkage and linkage disequilibrium. Linkage can be explained as the probability of inheriting a certain variant at one DNA marker being affected by which variant was inherited at another, nearby DNA marker. In Paper III we examined the genetic polymorphism of eight DNA markers, all located on the X-chromosome. We showed that the markers are highly informative for kinship testing and that there is linkage disequilibrium that matters when estimating the frequencies of different combinations of DNA variants.

Finally, in Paper IV we developed a mathematical model that correctly accounts for both linkage and linkage disequilibrium in probability calculations for kinship testing based on X-chromosome data. We applied this model in a simulation study of a number of typical cases and showed the degree of error incurred when a simpler model, which ignores linkage and linkage disequilibrium, is used.

In summary, the work in this thesis makes it possible to use mitochondrial DNA and X-chromosomal DNA markers to resolve more complex kinship investigations. Through the development of models and the study of relevant parameters that affect kinship probability calculations, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007;172(2-3):144-149.

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010;124(2):91-98.

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008;3(1):37-41.

IV. Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier, Forensic Science International; Paper II © 2010 Springer, International Journal of Legal Medicine; Paper III © 2008 Elsevier, Forensic Science International: Genetics.

Contents

Abstract
Popular science summary (Populärvetenskaplig sammanfattning)
List of Papers
Contents
Abbreviations
Introduction 17
  History of DNA and forensic genetics 17
  Population genetics 18
    Genetic polymorphisms 18
    DNA inheritance 20
    Population Genetics 22
    The Swedish population and its genetic appearance 25
  Forensic mathematics/statistics 26
    Framework for interpretation and presentation of evidential weight 26
    Paternity index calculation 27
    Mathematical model for automatic likelihood computation for relationship testing 29
Aim of the thesis 31
  Specific aims 31
    Paper I 31
    Paper II 31
    Paper III 31
    Paper IV 31
Investigations 33
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions 33
    Materials and methods 33
    Results and discussion 34
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations 36
    Materials and methods 36
    Results and discussion 36
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers 38
    Materials and methods 38
    Results and discussion 38
  Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account 40
    Materials and methods 40
    Results and discussion 40
Concluding remarks 43
Future perspectives 45
Acknowledgements 47
References 49

Abbreviations

θ  Theta, recombination frequency/fraction
AF  Alleged father
DNA  Deoxyribonucleic acid
FST  Measure of population genetic subdivision
GD  Gene diversity
HWE  Hardy-Weinberg equilibrium
ISFG  International Society for Forensic Genetics
LD  Linkage disequilibrium
LR  Likelihood ratio
MEC  Mean exclusion chance
MtDNA  Mitochondrial DNA
PCR  Polymerase chain reaction
PD  Power of discrimination
PE  Power of exclusion
PI  Paternity index
PIC  Polymorphism information content
PM  Match probability
Pr  Probability
SNP  Single nucleotide polymorphism
STR  Short tandem repeat
TF  True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays DNA is used in everyday practice in areas such as medical genetics, the food processing industry and forensics, both for solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being based on a biological sample by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). In human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al 2004b).

DNA was first used in court for a crime scene sample in 1986 in the UK (Gill et al 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques and methods for exploiting the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process: from handling evidential biological samples at the crime scene, via storage and the establishment of DNA profiles, to the presentation in court of the weight of the evidence provided by the DNA results. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example “Is the donor of a crime scene sample the same individual as the suspect?” or “Is the alleged father (AF) the biological father of the child?”. Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.

The establishment of these figures is usually based on the genetic uniqueness of the information in the DNA profile in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. First, population genetics, including allele frequencies, population substructure, and dependence within and between markers, among others. Second, models for calculating and presenting the weight of evidence that take the former information properly into account.

Population genetics

Genetic polymorphisms

Three different types of DNA marker, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they are well suited to forensic applications. The commonly used STRs have more than 10 alleles each, which generally makes a multi-locus STR DNA profile unique. In the 1990s the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers in routine practice. More recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is the low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low: the amount of information from one STR marker is the same as from approximately four SNPs (Sobrino & Carracedo 2005, Brenner (www.dna-view.com)). No commercial forensic SNP multiplex kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main forensic use of sequence data is the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysis of mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), an SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the marker types described above, there are different “types” of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, on the sex chromosomes (the X-chromosome and the Y-chromosome) and in the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with testing complex pedigrees (Egeland et al 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, as one of her two X-chromosomes. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are only available for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since two sisters can inherit different alleles despite being full siblings. X-chromosome markers can, however, exclude a common father, since two sisters would share the same paternal allele if they had the same father. There are several other types of relationship where analysis of X-chromosomal markers is superior to autosomal markers (Szibor et al 2003, Pinto et al 2010).
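The sister example above reduces to a simple per-locus check, sketched below under stated assumptions: mutation is ignored, each genotype is a pair of X-STR alleles, and the function name compatible_with_common_father is hypothetical. Because a father passes his single X allele to every daughter, two paternal sisters must share at least one allele at every X locus; if they share none, a common father is excluded. A shared allele does not prove a common father, since it may be maternal.

```python
def compatible_with_common_father(sister1, sister2):
    """Check whether two females' X-STR genotypes allow a common father.

    sister1, sister2: dicts mapping a locus name to a genotype (allele, allele).
    Mutation is ignored. False means a common father is excluded;
    True only means 'not excluded' (a shared allele may also be maternal).
    """
    for locus, genotype1 in sister1.items():
        genotype2 = sister2[locus]
        # A father passes his single X allele to every daughter,
        # so paternal sisters must share at least one allele at each locus.
        if not set(genotype1) & set(genotype2):
            return False
    return True


# Hypothetical genotypes; the locus name is only a label.
print(compatible_with_common_father({"DXS10135": (18, 21)}, {"DXS10135": (21, 24)}))  # True
print(compatible_with_common_father({"DXS10135": (18, 21)}, {"DXS10135": (19, 24)}))  # False
```

The same check applied to autosomal genotypes would be meaningless, because full sisters can legitimately share no alleles at an autosomal locus.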

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al 2008, Hundertmark et al 2008, Gomez et al 2007, Diegoli et al 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). These two genetic properties are discussed further below in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome (barring mutation). Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. It consists of a circular genome of ~16 600 nucleotides. Each cell has 100 to 1 000 copies of its mtDNA, which makes it especially useful in forensic analyses, where the amount of DNA can often be very low. The mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for the X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST values correspond to small differences among (sub)populations, and vice versa. Variants of FST exist that additionally take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when evaluating the strength of the DNA profile evidence (Balding & Nichols 1994).
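To illustrate the link between FST and the variance in allele frequencies, here is a minimal sketch of Wright's definition for one allele of a biallelic locus, FST = var(p) / (p̄(1 - p̄)), with subpopulations weighted equally. The function name wright_fst is hypothetical, and the sketch applies no sample-size correction, unlike the estimators typically used on real data (e.g. that of Weir & Cockerham).

```python
from statistics import fmean, pvariance


def wright_fst(subpop_freqs):
    """Wright-style F_ST for one allele of a biallelic locus.

    subpop_freqs: the allele's frequency in each subpopulation (equally weighted).
    Returns var(p) / (p_bar * (1 - p_bar)); no sample-size correction is applied.
    """
    p_bar = fmean(subpop_freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return pvariance(subpop_freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


print(wright_fst([0.5, 0.5]))  # identical subpopulations -> 0.0
print(wright_fst([0.0, 1.0]))  # completely differentiated -> 1.0
```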

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed relative to its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family studies. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
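The sketch below shows how θ enters the segregation probabilities of Figure 2. A mother whose phased haplotypes at two linked loci are (a1, b1) and (a2, b2) transmits each parental haplotype with probability (1 - θ)/2 and each recombinant haplotype with probability θ/2. A father transmits his single X-chromosome intact. To convert a genetic distance into θ, the sketch assumes Haldane's map function, θ = (1 - e^(-2d))/2 with d in Morgans. This is one common choice that ignores crossover interference; the thesis text only states that θ and genetic distance are correlated. The function names are hypothetical.

```python
import math


def haldane_theta(d_morgans):
    """Haldane's map function (no crossover interference): theta = (1 - exp(-2d)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def maternal_haplotype_probs(hap1, hap2, theta):
    """Probability of each two-locus haplotype transmitted by a mother whose
    phased haplotypes are hap1 = (a1, b1) and hap2 = (a2, b2)."""
    outcomes = [
        (hap1, (1 - theta) / 2),          # non-recombinant
        (hap2, (1 - theta) / 2),          # non-recombinant
        ((hap1[0], hap2[1]), theta / 2),  # recombinant
        ((hap2[0], hap1[1]), theta / 2),  # recombinant
    ]
    probs = {}
    for hap, p in outcomes:
        probs[hap] = probs.get(hap, 0.0) + p  # merge identical haplotypes (homozygous loci)
    return probs


theta = haldane_theta(0.10)  # 10 cM gives theta of roughly 0.09
print(maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), theta))
```

As d grows, θ approaches 0.5 and the four haplotypes become equally likely, which is the situation for unlinked loci.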

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, tests must be conducted concerning the independent segregation of alleles: Hardy-Weinberg equilibrium (HWE) tests (Hardy 1908) and LD tests deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. When performing HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw strong conclusions from the outcome of either test (Buckleton et al 2001).
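As an illustration of the exact-test idea, the sketch below implements the exact conditional HWE test for a biallelic locus (e.g. a SNP). Conditional on the observed allele counts, it enumerates every possible heterozygote count, computes each count's probability under HWE, and sums the probabilities of all outcomes no more likely than the observed one. The function name is hypothetical. Multiallelic STRs need an extended version of the test, typically approximated by a Markov chain or by permutation, which this sketch does not cover.

```python
import math


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE test p-value for a biallelic locus from genotype counts."""
    n = n_aa + n_ab + n_bb       # number of individuals
    n_a = 2 * n_aa + n_ab        # copies of allele a
    n_b = 2 * n - n_a            # copies of allele b

    def log_prob(het):
        # P(het | n, n_a) = n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                + het * math.log(2) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) - math.lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of n_a and cannot exceed min(n_a, n_b).
    probs = {h: math.exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    observed = probs[n_ab]
    # Sum outcomes at most as likely as the observed one (small tolerance for rounding).
    return min(1.0, sum(p for p in probs.values() if p <= observed * (1 + 1e-9)))


print(hwe_exact_p(25, 50, 25))  # close to HWE proportions: large p-value
print(hwe_exact_p(50, 0, 50))   # no heterozygotes at all: tiny p-value
```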

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of the specific markers in different forensic genetic situations and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus PM is the sum of the partial match probabilities for all genotypes. PM can also be calculated from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals and is thus the complement of PM:

$$PD = 1 - PM$$

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980, Guo & Elston 1999). There are two instances when this cannot be deduced, namely when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies.

The probability of excluding paternity (Q) is calculated as (Ohno et al 1982)

$$Q = \sum_{i=1}^{n} p_i (1 - p_i)^2 (1 - p_i + p_i^2) + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1 - p_i)^2$ or $(1 - p_i - p_j)^2$; and second, the expected population frequency of the mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained by summing over all mother/child genotype combinations as described above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2 (1 - 2 h H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, except that the exclusion probability for a given mother/child genotype combination is either $(1 - p_i)$ or $(1 - p_i - p_j)$, since a random man carries only one X-chromosomal allele. Thus the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago because of the ice that covered Northern Europe. Since then, immigration and population movements of various scales, origins and directions have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg & Tydén 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated for forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations using data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Because of continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The discussion mainly concerns two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, as well as pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile, and H could be “the DNA comes from an individual unrelated to the suspect”. If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see the formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the hypothesis of the defence; E represents the DNA profiles, and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this quotient that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
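As a minimal sketch of the odds form of Bayes's theorem above, the Python snippet below combines an LR with prior odds and converts the result into a posterior probability. The function names and the numbers (an LR of 1,000 and prior odds of 1:100 from other evidence I) are hypothetical illustrations, not values from this thesis.

```python
def posterior_odds(likelihood_ratio: float, prior_odds: float) -> float:
    """Odds form of Bayes's theorem: posterior odds = LR * prior odds."""
    if likelihood_ratio < 0 or prior_odds < 0:
        raise ValueError("likelihood ratio and prior odds must be non-negative")
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of H1 into Pr(H1 | E, I) = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Hypothetical example: an LR of 1000 combined with prior odds of 1:100.
post = posterior_odds(1000.0, 1 / 100)
print(f"posterior odds = {post:.1f}, Pr(H1 | E, I) = {odds_to_probability(post):.4f}")
```

Because the LR simply multiplies whatever prior odds are supplied, the DNA result can be combined with other evidence by updating the prior odds rather than by recomputing the LR.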

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculation specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:

Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which means the probability of seeing the child's (G_C), mother's (G_M) and alleged father's (G_AF) DNA profiles when the AF is the father, in comparison with seeing the same DNA profiles when the AF is not the father.


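We can use the third law of probability (the multiplication rule) to simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$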

The probability of seeing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (AM) and paternal (AP) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them has another allele to transmit). If only one of them is heterozygous, the probability is 0.5, since there is a 50:50 chance that a heterozygous parent transmits a given allele. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5). These values assume that the AF carries AP; otherwise, ignoring mutation, the numerator is 0.

If AM and AP are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele AP and represents the probability of the child receiving that allele from a random man in the population. If AM and AP are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of AM and AP.

As a simple example, let G_M have the genotype [a,b], G_C have [b,c] and G_AF have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$


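and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2}\,p_c} = \frac{1}{2\,p_c}$$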

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
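The single-locus trio calculation described above can be sketched in Python as follows. This is a minimal sketch under the same assumptions as the text (the mother is the true mother, no mutations or silent alleles, random mating); the function name and the allele frequencies are hypothetical. Summing over every maternal and paternal transmission handles the ambiguous cases automatically, since the numerator and the denominator each become sums over the possible assignments of AM and AP.

```python
from itertools import product


def trio_paternity_index(child, mother, alleged_father, freqs):
    """Single-locus PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Genotypes are 2-tuples of allele labels; freqs maps allele -> population
    frequency. Assumes no mutation, no silent alleles and a true mother.
    """
    target = sorted(child)

    # Numerator: the mother and the AF each transmit one of their two alleles
    # with probability 1/2, so every (m, f) combination has probability 1/4.
    numerator = sum(
        0.25 for m, f in product(mother, alleged_father) if sorted((m, f)) == target
    )

    # Denominator: the mother transmits m with probability 1/2 and the paternal
    # allele comes from a random man, i.e. with its population frequency.
    denominator = 0.0
    for m in mother:
        if m in child:
            paternal = list(child)
            paternal.remove(m)
            denominator += 0.5 * freqs[paternal[0]]

    if denominator == 0.0:
        raise ValueError("child genotype is incompatible with the mother")
    return numerator / denominator


# The worked example: G_M = [a,b], G_C = [b,c], G_AF = [c,d], so PI = 1/(2 p_c).
freqs = {"a": 0.10, "b": 0.20, "c": 0.05, "d": 0.30}  # hypothetical frequencies
print(trio_paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))
```

With the hypothetical p_c = 0.05, the result equals 1/(2 p_c) = 10, in line with the formula above. If the AF carries neither possible paternal allele, the function returns 0, which corresponds to an exclusion when mutation is ignored.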

Decision

How does one interpret the PI value? Bayes's theorem is needed in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, and vice versa.
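The posterior probability and a cut-off rule can be sketched in Python as follows. With the traditional prior probability of 0.5 (prior odds of 1), the function reproduces Pr(Hp | E) = PI/(PI + 1). The cut-off of 0.9999 is only an example value, and the symmetric exclusion threshold (1 - cut-off) and the "inconclusive" outcome are hypothetical choices; each laboratory sets its own rule.

```python
def probability_of_paternity(pi: float, prior_probability: float = 0.5) -> float:
    """Posterior Pr(Hp | E); with the traditional prior of 0.5 this is PI / (PI + 1)."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = pi * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def decide(pi: float, cutoff: float = 0.9999, prior_probability: float = 0.5) -> str:
    """Hypothetical decision rule: include at or above the cut-off,
    exclude at or below 1 - cut-off, otherwise report the case as inconclusive."""
    w = probability_of_paternity(pi, prior_probability)
    if w >= cutoff:
        return "inclusion"
    if w <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


for pi in (0.0, 50.0, 1e6):
    print(pi, round(probability_of_paternity(pi), 6), decide(pi))
```

Raising the cut-off moves borderline cases from "inclusion" to "inconclusive", trading fewer false inclusions for more cases that cannot be resolved.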

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

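$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \mathrm{Pen}(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{\{f,m,o\}} \Pr(G_o \mid G_f, G_m)$$

where G_i is the genotype and X_i the observed phenotype of individual i, and {f,m,o} runs over all father-mother-offspring triplets in the pedigree.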

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and thus this individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
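To make the summation and peeling idea concrete, here is a minimal single-locus Python sketch for one nuclear family in which the mother and the children are typed and the father may be untyped (a simple deficiency case). It assumes Hardy-Weinberg genotype probabilities for founders, Mendelian transmission without mutation and, for a non-trait locus, penetrance 1 for the observed genotype; all function names and frequencies are hypothetical. It is an illustration of the principle, not a general Elston-Stewart implementation.

```python
from itertools import combinations_with_replacement


def all_genotypes(freqs):
    """All unordered genotypes over the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_probability(genotype, freqs):
    """Hardy-Weinberg probability of a founder genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]


def transmission_probability(child, father, mother):
    """Mendelian Pr(G_o | G_f, G_m) without mutation."""
    target = sorted(child)
    return sum(0.25 for f in father for m in mother if sorted((f, m)) == target)


def nuclear_family_likelihood(mother, children, freqs, father=None):
    """Likelihood of the observed genotypes; an untyped father is summed over."""
    candidates = [tuple(sorted(father))] if father is not None else all_genotypes(freqs)
    total = 0.0
    for g_f in candidates:
        # Peel the children: their joint factor given this parental pair,
        # after which they need no further consideration.
        children_factor = 1.0
        for child in children:
            children_factor *= transmission_probability(child, g_f, mother)
        total += founder_probability(g_f, freqs) * children_factor
    return founder_probability(tuple(sorted(mother)), freqs) * total


freqs = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical allele frequencies
children = [("a", "b"), ("a", "c")]
print(nuclear_family_likelihood(("a", "a"), children, freqs))
```

Because the children enter only through `children_factor`, adding another child multiplies the work per candidate father genotype instead of adding another summation.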

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci and whose computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
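The HMM used in step 3 can be sketched as a forward recursion over markers. The Python sketch below assumes that step 2 has already produced, for each marker, the probability of the observed genotypes given each inheritance vector (the emission probabilities, supplied here as hypothetical numbers), that all 2^n inheritance vectors are equally likely at the first marker, and that each of the n meioses recombines independently with the same recombination fraction between adjacent markers. The cost is linear in the number of markers, but this naive version needs 4^n operations per marker interval, so it grows exponentially with pedigree size.

```python
from itertools import product


def transition_probability(v, w, theta):
    """Pr(vector w at the next marker | vector v): each meiosis flips with prob theta."""
    flips = sum(a != b for a, b in zip(v, w))
    return theta ** flips * (1.0 - theta) ** (len(v) - flips)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]: recombination fraction between marker k and marker k + 1
    """
    if len(thetas) != len(emissions) - 1:
        raise ValueError("need one recombination fraction per marker interval")
    vectors = list(product((0, 1), repeat=n_meioses))
    start = 1.0 / len(vectors)  # uniform prior over inheritance vectors
    alpha = {v: start * emissions[0][v] for v in vectors}
    for k in range(1, len(emissions)):
        theta = thetas[k - 1]
        alpha = {
            w: emissions[k][w]
            * sum(alpha[v] * transition_probability(v, w, theta) for v in vectors)
            for w in vectors
        }
    return sum(alpha.values())


# Hypothetical two-marker example with a single informative meiosis.
emissions = [{(0,): 1.0, (1,): 0.0}, {(0,): 1.0, (1,): 0.0}]
for theta in (0.0, 0.1, 0.5):
    print(theta, lander_green_likelihood(emissions, [theta], n_meioses=1))
```

With theta = 0.5 the two markers behave as unlinked and the likelihood factorises into a product over markers (0.5 × 0.5), whereas theta = 0 forces the same inheritance vector at both markers.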

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of drawing erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and furthermore to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). An assessment of the weight of the DNA result can be performed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and the error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated; in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
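The comparison between simulated truth and decision can be expressed as a small counting function. This Python sketch is hypothetical: it pools all simulated cases and uses the labels "inclusion", "exclusion" and "inconclusive" for decisions, whereas the study reports an unweighted average over the alternative hypotheses.

```python
def error_rates(results):
    """Inclusion, exclusion and total error rates from simulated cases.

    results: iterable of (af_is_true_father, decision) pairs, where decision
    is "inclusion", "exclusion" or "inconclusive".
    """
    n_true = n_false = inclusion_errors = exclusion_errors = 0
    for af_is_true_father, decision in results:
        if af_is_true_father:
            n_true += 1
            exclusion_errors += decision == "exclusion"  # true father excluded
        else:
            n_false += 1
            inclusion_errors += decision == "inclusion"  # non-father included
    n_total = n_true + n_false
    return {
        "inclusion": inclusion_errors / n_false if n_false else 0.0,
        "exclusion": exclusion_errors / n_true if n_true else 0.0,
        "total": (inclusion_errors + exclusion_errors) / n_total if n_total else 0.0,
    }


simulated = [(True, "inclusion"), (True, "exclusion"), (False, "exclusion"), (False, "inconclusive")]
print(error_rates(simulated))
```

The inclusion rate is taken over simulated non-fathers and the exclusion rate over simulated true fathers, so the two rates can move in opposite directions when the decision rule changes.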


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some labs still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if it is not entirely correct, than not to employ one at all (Table 1).
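For comparison with a probabilistic mutation model, a hard limit rule only needs a count of loci where the AF cannot have transmitted the child's paternal allele. The Python sketch below assumes that the mother is the true mother and ignores silent alleles; the function names, the exclusion rule and the genotypes are hypothetical.

```python
from itertools import product


def is_inconsistent(child, mother, alleged_father):
    """True if no combination of a maternal and an AF allele yields the child's genotype."""
    target = sorted(child)
    return not any(sorted((m, f)) == target for m, f in product(mother, alleged_father))


def count_inconsistencies(loci):
    """loci: list of (child, mother, alleged_father) genotype triples, one per marker."""
    return sum(is_inconsistent(c, m, af) for c, m, af in loci)


def limit_rule(loci, max_inconsistencies=1):
    """Hypothetical hard rule: exclude when more loci than the limit are inconsistent."""
    return "exclusion" if count_inconsistencies(loci) > max_inconsistencies else "not excluded"


loci = [
    (("b", "c"), ("a", "b"), ("c", "d")),        # consistent with paternity
    (("12", "14"), ("12", "13"), ("15", "16")),  # AF lacks the paternal allele 14
]
print(count_inconsistencies(loci), limit_rule(loci))
```

A rule like this discards the strength of the remaining loci, whereas a mutation model keeps every locus in the total LR and lets a single inconsistency lower it rather than override it.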

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.
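With more than two hypotheses, the posterior probability of each one follows from Bayes's theorem by normalising likelihood × prior over all hypotheses. The Python sketch below uses equal priors by default; the labels H0 and H1a-H1d and the likelihood values are hypothetical and do not reproduce the specific relationships shown in Figure 3.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Pr(H_i | E) = Pr(E | H_i) Pr(H_i) / sum_j Pr(E | H_j) Pr(H_j)."""
    hypotheses = list(likelihoods)
    if priors is None:
        priors = {h: 1.0 / len(hypotheses) for h in hypotheses}
    joint = {h: likelihoods[h] * priors[h] for h in hypotheses}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    return {h: joint[h] / total for h in hypotheses}


# Hypothetical likelihoods Pr(E | H): paternity (H0) and four alternatives (H1a-H1d).
likelihoods = {"H0": 4.0e-20, "H1a": 1.0e-20, "H1b": 1.0e-20, "H1c": 2.0e-21, "H1d": 1.0e-22}
for h, p in posterior_probabilities(likelihoods).items():
    print(f"{h}: {p:.3f}")
```

Compared with the two-hypothesis model, a close relative of the TF now competes directly with paternity, so a high LR against an unrelated man no longer guarantees a high posterior probability of paternity.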


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

| Scenario | Change in error rate compared with the standard case (%): total error (inclusion error, exclusion error) |
|---|---|
| Consanguinity: mother and father simulated as first cousins | 3 (10, -1) |
| Additional information: DNA profiles from 20 markers | -68 (-29, -89) |
| Additional information: DNA profiles from 25 markers | -83 (-56, -98) |
| Additional information: 2 children | -88 (-73, -96) |
| Mutation model: limit of 1 inconsistency instead of mutation model for LR calculation | 16 (217, -95) |
| Mutation model: limit of 2 inconsistencies instead of mutation model for LR calculation | 320 (1079, -100) |
| Inappropriate allele frequency: Rwandan allele frequencies for data generation, Swedish for LR calculation | 19 (190, -76) |
| Inappropriate allele frequency: Somali allele frequencies for data generation, Swedish for LR calculation | 2 (106, -55) |
| Inappropriate allele frequency: Iranian allele frequencies for data generation, Swedish for LR calculation | -13 (25, -34) |
| Prior information: five-hypothesis model for LR calculation | -24 (-8, -31) |

A standard case was considered with DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error over H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. In the case of haploid DNA markers it is particularly relevant to set up and study regional frequency databases, due to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved enumeration of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same order of magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise Φ_ST values for a limited number of geographically close populations (Figure 4).
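Haplotype diversity and random match probability can both be computed from haplotype counts. The Python sketch below assumes the standard estimators GD = n/(n - 1) × (1 - Σp_i²) (Nei's unbiased gene diversity) and PM = Σp_i², where p_i are the sample haplotype frequencies; the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_statistics(haplotypes):
    """Return (gene diversity, random match probability) for observed haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    match_probability = sum((c / n) ** 2 for c in counts.values())
    gene_diversity = n / (n - 1) * (1.0 - match_probability)
    return gene_diversity, match_probability


sample = ["H1", "H1", "H2", "H3"]  # hypothetical haplotype labels
gd, pm = haplotype_statistics(sample)
print(f"GD = {gd:.4f}, PM = {pm:.4f}")
```

When almost every sampled haplotype is unique, PM approaches 1/n and GD approaches 1, which is why a large, representative database is needed before small match probabilities can be estimated reliably.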

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and F_ST the frequency variation among the seven Swedish subpopulations. The F_ST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The Φ_ST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were studied and analysed for the eight X-chromosome STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used for establishing haplotype frequencies, performing the LD test and estimating forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
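Treating linked loci as haplotypes means using the observed haplotype frequency instead of the product of single-locus allele frequencies. The Python sketch below contrasts the two estimates for a hypothetical two-locus sample of male X-chromosomal haplotypes (allele labels are invented); their difference, D, is the usual measure of linkage disequilibrium for a pair of alleles. It is an illustration only, not the LD test used in Paper III.

```python
from collections import Counter


def haplotype_vs_product(haplotypes, query):
    """Compare the observed frequency of a multi-locus haplotype with the
    product-rule estimate built from single-locus allele frequencies."""
    n = len(haplotypes)
    observed = Counter(haplotypes)[query] / n
    product_rule = 1.0
    for locus, allele in enumerate(query):
        product_rule *= sum(h[locus] == allele for h in haplotypes) / n
    return observed, product_rule


# Hypothetical male haplotypes for two loci in one linkage group.
sample = [("15", "10"), ("15", "10"), ("15", "10"), ("17", "11"), ("17", "11"), ("15", "11")]
obs, prod = haplotype_vs_product(sample, ("15", "10"))
print(f"observed = {obs:.3f}, product rule = {prod:.3f}, D = {obs - prod:.3f}")
```

Here the product rule underestimates the haplotype frequency, which would overstate the rarity of the profile; with real data the direction and size of the error vary from haplotype to haplotype.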

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
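The simplest estimate of a recombination fraction from family data is the proportion of informative meioses that show a recombination. The Python sketch below returns that estimate together with the posterior mean under a uniform Beta(1, 1) prior, which stays above zero when no recombinants are observed. Both estimators are illustrative assumptions rather than the estimation model presented in Paper III, and the counts are hypothetical.

```python
def recombination_estimates(recombinants: int, informative_meioses: int):
    """Return (maximum-likelihood estimate, posterior mean under a uniform prior).

    The ML estimate is k / n; with a Beta(1, 1) prior the posterior is
    Beta(k + 1, n - k + 1), whose mean is (k + 1) / (n + 2).
    """
    k, n = recombinants, informative_meioses
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= recombinants <= informative meioses and n > 0")
    return k / n, (k + 1) / (n + 2)


# Hypothetical counts: no recombinants among 100 informative meioses.
ml, bayes = recombination_estimates(0, 100)
print(f"ML = {ml:.4f}, posterior mean = {bayes:.4f}")
```

With around 100 informative meioses, a single observed recombinant changes the estimate by about one percentage point, so small recombination frequencies are estimated with considerable relative uncertainty.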


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when producing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for how to consider both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing for X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works perfectly well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we here present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking and not taking linkage and LD into account by means of a simulation approach on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship both true and not true. From these, LR distributions were studied, and the LR calculated using our proposed model was compared with the LR calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs favouring the questioned relationship (LR > 1) were obtained to various degrees when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

| | log10 LR: median [95% cred. int.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred. int.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred. int.] (min/max) |
|---|---|---|---|
| Rel | 2.0 [-1, 6.2] (-1/10.0) | 1.2 [0.4, 7.3] (0001418) | 0.4 [0.036, 9.3] (00001438) |
| NoRel | -1 [-1, 0.5] (-1/4.0) | 1.0 [0.9, 1.3] (000263) | 0.2 [0.0036, 9.1] (0026433) |

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, utilisation of relevant population allele frequencies, use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years, much of which is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, making DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help provided by developments in computing power.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to the crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid development. Once the discoveries were made, enormous amounts of money and research hours were invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA profile databases, which consist of a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases such as CODIS, which consists of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will, in general, be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty; thus, the community will not continue to invest the same level of time and money in research. However, there remain relationship questions that cannot be resolved in a satisfactory manner with existing methods, for example separating full-siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As for continued work on the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions remained rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetic data and could therefore affect estimates of the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).

Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”: Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist it is valuable to be able to think about things other than work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala. I thank my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life. I thank my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed. I also thank my family-in-law, the Tillmars: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and our daughter Signe. Ida, thank you for your constant support, love and caring, and for your patience when I work outside office hours. Signe, our wonderful daughter, thank you for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis. Thanks to everyone else whom I have not mentioned but who has helped in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

study of relevant parameters that affect the calculation of relationship probabilities, the reliability of the calculated probabilities has been increased.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I. DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007, 172(2-3):144-149.

II. Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010, 124(2):91-98.

III. Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008, 3(1):37-41.

IV. Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier (Forensic Science International); Paper II © 2010 Springer (International Journal of Legal Medicine); Paper III © 2008 Elsevier (Forensic Science International: Genetics).

Contents

Abstract

Populärvetenskaplig sammanfattning (popular science summary, in Swedish)

List of Papers

Contents

Abbreviations

Introduction

History of DNA and forensic genetics

Population genetics

Genetic polymorphisms

DNA inheritance

Population Genetics

The Swedish population and its genetic appearance

Forensic mathematics/statistics

Framework for interpretation and presentation of evidential weight

Paternity index calculation

Mathematical model for automatic likelihood computation for relationship testing

Aim of the thesis

Specific aims

Paper I

Paper II

Paper III

Paper IV

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

Materials and methods

Results and discussion

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

Materials and methods

Results and discussion

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

Materials and methods

Results and discussion

Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

Materials and methods

Results and discussion

Concluding remarks

Future perspectives

Acknowledgements

References

Abbreviations

θ: Theta, recombination frequency/fraction

AF: Alleged father

DNA: Deoxyribonucleic acid

FST: Measure of population genetic subdivision

GD: Gene diversity

HWE: Hardy-Weinberg equilibrium

ISFG: International Society for Forensic Genetics

LD: Linkage disequilibrium

LR: Likelihood ratio

MEC: Mean exclusion chance

mtDNA: Mitochondrial DNA

PCR: Polymerase chain reaction

PD: Power of discrimination

PE: Power of exclusion

PI: Paternity index

PIC: Polymorphism information content

PM: Match probability

Pr: Probability

SNP: Single nucleotide polymorphism

STR: Short tandem repeat

TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays DNA is used in everyday practice in different areas. These include medical genetics, the food processing industry and forensic work, both in solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being from a biological sample, by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas. These include forensic molecular pathology (Karch 2007), complex traits (Kayser & Schneider 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). In human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling & Gill 2004b).

DNA was first used in court for a crime scene sample in 1986 in the UK (Gill & Werrett 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques for using the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process. That process runs from handling biological evidence at the crime scene, via storage and the establishment of DNA profiles, to presenting the weight of the DNA evidence in court. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer. For example: "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?" Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some form of statistical evaluation is needed to estimate the strength of the evidence provided by the DNA profiles. Put simply, most such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of DNA evidence are crucial for the acceptance of DNA as a routine tool.
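In its simplest textbook form, the coincidental-match probability is a product of Hardy-Weinberg genotype frequencies over independent loci. This minimal sketch assumes Hardy-Weinberg and linkage equilibrium and ignores population substructure, which must be accounted for in practice. The locus names and allele frequencies are hypothetical.

```python
from math import prod


def genotype_frequency(genotype, freqs):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def match_probability(profile, allele_freqs):
    """Chance that an unrelated person drawn at random shares the whole
    multi-locus profile: genotype frequencies multiplied over loci (product rule)."""
    return prod(genotype_frequency(profile[locus], allele_freqs[locus]) for locus in profile)


# Hypothetical loci and allele frequencies, for illustration only.
allele_freqs = {
    "L1": {12: 0.1, 13: 0.2},
    "L2": {8: 0.25},
    "L3": {15: 0.05, 17: 0.3},
}
profile = {"L1": (12, 13), "L2": (8, 8), "L3": (15, 17)}
print(match_probability(profile, allele_freqs))  # approx 7.5e-05, about 1 in 13,000
```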

These figures are usually based on how genetically unique the information in the DNA profile is, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing. However, many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied to establish the evidential weight of a given DNA marker. The first is population genetics, including allele frequencies, population substructure, and dependence within and between markers, among other factors. The second is models for calculating and presenting the weight of evidence that take this information properly into account.
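One example of such a model is the single-locus paternity index (PI) for a mother-child-alleged father trio. It is written as PI = Pr(genotypes | AF is the father) / Pr(genotypes | a random unrelated man is the father). This minimal sketch evaluates both probabilities by enumerating the alleles each parent can transmit. It assumes no mutation and Hardy-Weinberg equilibrium, and the allele frequencies are hypothetical. It illustrates the idea only and is not the computational model of Paper IV.

```python
from itertools import product


def transmission_probs(genotype):
    """Each of a parent's two alleles is passed on with probability 1/2."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_prob(child, mother_probs, father_probs):
    """Pr(child genotype) given the allele-transmission distribution of each parent."""
    return sum(pm * pf
               for (m, pm), (f, pf) in product(mother_probs.items(), father_probs.items())
               if sorted((m, f)) == sorted(child))


def paternity_index(child, mother, alleged_father, freqs):
    """PI = Pr(child | mother, AF is father) / Pr(child | mother, random man is father).
    A random man transmits each allele with its population frequency."""
    mother_t = transmission_probs(mother)
    x = child_prob(child, mother_t, transmission_probs(alleged_father))
    y = child_prob(child, mother_t, freqs)
    return x / y


freqs = {12: 0.2, 14: 0.1, 15: 0.3}  # hypothetical allele frequencies
print(paternity_index((12, 14), (12, 12), (14, 15), freqs))  # 0.5 / 0.1 = 5.0
print(paternity_index((12, 14), (12, 14), (14, 14), freqs))  # 1 / (0.2 + 0.1), about 3.33
```

For independent loci, the combined PI is the product of the single-locus values.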

Population genetics

Genetic polymorphisms

Three types of DNA marker account for the vast majority of polymorphisms used in forensic genetic applications: Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1). All three have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). Their relatively high mutation rate is the reason for their high degree of polymorphism. STRs are also robust and easy to multiplex for PCR amplification (Butler 2006), which gives them good characteristics for forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s the FBI concentrated on 13 STR markers, the CODIS core loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers in routine practice. More recently, STRs with shorter amplicons (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These increase the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Their low mutation rate is another advantage in relationship testing. The disadvantage is that, since the number of alleles per locus is limited, the information content per locus is low. One STR marker provides about as much information as approximately four SNPs (Sobrino & Carracedo 2005, Brenner (www.dna-view.com)). No commercial forensic SNP kit is available, although efforts have been made to develop such multiplexes for criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).
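One way to make the STR versus SNP comparison concrete is through the per-locus match probability (1 - PD), the chance that two unrelated people share a genotype. This sketch compares an idealised STR with ten equally frequent alleles against an idealised SNP with two equally frequent alleles, under Hardy-Weinberg equilibrium. Real STR alleles are not equally frequent, so the sketch shows the order of magnitude rather than reproducing the cited estimate.

```python
import math
from itertools import combinations_with_replacement


def locus_match_probability(freqs):
    """Sum over all genotypes of (HWE genotype frequency)^2: the chance that two
    unrelated people share a genotype at the locus. PD = 1 minus this value."""
    total = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        g = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        total += g ** 2
    return total


def snp_equivalents(str_freqs, snp_freqs):
    """Number of independent SNPs whose combined match probability equals one STR's."""
    return math.log(locus_match_probability(str_freqs)) / math.log(locus_match_probability(snp_freqs))


str_freqs = [0.1] * 10   # idealised STR: ten equally frequent alleles
snp_freqs = [0.5, 0.5]   # idealised biallelic SNP
print(round(locus_match_probability(str_freqs), 4))     # 0.019
print(round(locus_match_probability(snp_freqs), 4))     # 0.375
print(round(snp_equivalents(str_freqs, snp_freqs), 1))  # 4.0
```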

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. In forensic work, sequence data are mainly used to analyse variation in the mitochondrial DNA (mtDNA). Forensic mtDNA analysis is based on sequence data rather than on STRs. However, analysing mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).

DNA inheritance

Beyond the marker types described above, DNA itself comes in different “types”. These differ in their inheritance pattern and in other important population genetic properties. The types discussed here are markers on the autosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements now allow hundreds of thousands of autosomal markers to be studied simultaneously. This reduces the limitations of testing complex pedigrees (Egeland & Sheehan 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers act as autosomal markers in their transmission to gametes in females, but as haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome. Males inherit their only X-chromosome from their mother, derived from her two X-chromosomes. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether they have the same father, and DNA profiles are available only for the sisters. Here autosomal DNA markers cannot exclude paternity, since full sisters can inherit different alleles. X-chromosome markers, however, can exclude a common father, because two sisters with the same father must share the same paternal allele. There are several other types of relationship where X-chromosomal markers are superior to autosomal markers (Szibor et al 2003, Pinto et al 2010).
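The exclusion logic for two sisters can be sketched as a simple check. A father has only one X allele per locus and passes it to every daughter, so two daughters of the same man must share at least one allele at each X locus. This minimal sketch ignores mutation, and the function name, loci and alleles are hypothetical. A shared allele at every locus does not prove a common father, since alleles can also be shared by chance; that case still needs a likelihood ratio.

```python
def common_father_exclusions(female1, female2):
    """Return the X-STR loci at which two females share no allele.

    Daughters of the same father all carry his single X allele at every
    locus, so (ignoring mutation) any locus listed here is incompatible
    with a common father."""
    return [locus for locus in female1
            if not set(female1[locus]) & set(female2[locus])]


# Hypothetical X-STR loci and alleles.
sister1 = {"X1": (12, 14), "X2": (20, 21), "X3": (9, 9)}
sister2 = {"X1": (14, 15), "X2": (22, 23), "X3": (9, 10)}
print(common_father_exclusions(sister1, sister2))  # ['X2']
```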

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in different PCR multiplexes (Becker et al 2008, Hundertmark et al 2008, Gomes et al 2007, Diegoli & Coble 2010). Linkage and linkage disequilibrium must typically be considered when combining closely located X-chromosomal markers in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). Both properties are discussed further below, in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists as a single copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share the same Y-chromosome, apart from mutations. Outside the recombining region (~5%), mutation is the only force that creates new variation on the Y-chromosome. In addition, the Y-chromosome has one-fourth of the effective population size of autosomal loci. For these reasons, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer paternal geographic origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16,600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful when the amount of DNA is very low, as is often the case in forensic work. mtDNA is inherited from mother to child (maternally), so it can be used to answer questions about a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).
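For haploid markers such as mtDNA or Y-STR haplotypes, two useful summaries are simple functions of the haplotype frequencies. One is the gene diversity (GD) of a population sample. The other is the probability that two randomly chosen haplotypes match. This sketch uses the unbiased gene diversity estimator given by Nei (1987). Haplotypes are represented as hypothetical labels, and the example sample is invented.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype in the sample."""
    n = len(haplotypes)
    return [c / n for c in Counter(haplotypes).values()]


def random_match_probability(haplotypes):
    """Chance that two haplotypes drawn at random (with replacement) are identical."""
    return sum(f ** 2 for f in haplotype_frequencies(haplotypes))


def gene_diversity(haplotypes):
    """Unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


# Hypothetical haplotype labels, e.g. mtDNA control-region sequences summarised as IDs.
sample = ["H1", "H1", "H2", "H3"]
print(random_match_probability(sample))  # 0.375
print(round(gene_diversity(sample), 4))  # 0.8333
```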

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ (expressed as a recombination frequency) from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to the estimation of allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations: small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
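As a minimal illustration of how FST relates to allele-frequency variance, the sketch below uses Wright's definition for one biallelic locus, FST = Var(p) / (p̄(1 − p̄)), where p is the frequency of the same allele in each subpopulation. It assumes equally weighted subpopulations and known (not sampled) frequencies, so it omits the sample-size corrections of estimators such as Weir and Cockerham's; the function name is illustrative.

```python
from statistics import fmean, pvariance
from typing import Sequence


def wright_fst(subpop_freqs: Sequence[float]) -> float:
    """Wright's FST for one biallelic locus.

    subpop_freqs: frequency of the same allele in each subpopulation,
    with all subpopulations weighted equally (illustrative assumption).
    """
    p_bar = fmean(subpop_freqs)                 # mean frequency across subpopulations
    if p_bar in (0.0, 1.0):
        return 0.0                              # allele absent or fixed everywhere
    var_p = pvariance(subpop_freqs, mu=p_bar)   # variance among subpopulations
    return var_p / (p_bar * (1.0 - p_bar))


print(wright_fst([0.5, 0.5, 0.5]))  # 0.0: identical subpopulations
print(wright_fst([0.0, 1.0]))       # 1.0: subpopulations fixed for different alleles
```

Values between these extremes (e.g. 0.04 for frequencies 0.4 and 0.6) indicate modest differentiation.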

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) deal with the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The degree of linkage between two loci can be expressed as the recombination frequency θ, which is estimated from family studies. The recombination frequency is related to the genetic distance between the loci (Ott 1999).
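How θ relates to genetic distance depends on the map function assumed. As a sketch, assuming Haldane's map function (no crossover interference, an assumption not made in the text), θ = (1 − e^(−2d))/2 for a map distance d in Morgans. The second function gives segregation probabilities of the kind shown in Figure 2: a mother with known phase at two linked loci transmits each of her two parental haplotypes with probability (1 − θ)/2 and each recombinant haplotype with probability θ/2. Function names and allele labels are illustrative.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Recombination frequency for a map distance in Morgans (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def maternal_haplotype_probs(hap1, hap2, theta):
    """Probabilities of the haplotypes a phase-known mother transmits at two loci.

    hap1, hap2: her two haplotypes as (allele at locus 1, allele at locus 2).
    """
    candidates = [
        (hap1, (1 - theta) / 2),            # parental (non-recombinant)
        (hap2, (1 - theta) / 2),            # parental (non-recombinant)
        ((hap1[0], hap2[1]), theta / 2),    # recombinant
        ((hap2[0], hap1[1]), theta / 2),    # recombinant
    ]
    probs = {}
    for hap, p in candidates:
        probs[hap] = probs.get(hap, 0.0) + p   # merge duplicates when a locus is homozygous
    return probs


print(maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), 0.1))
```

In the Figure 2 family, the father passes his single X-chromosome intact to the daughter (outside the pseudoautosomal regions), so only the maternal haplotype can be recombinant.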

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be written as

$$f(ab) = f(a)\,f(b) + \Delta$$

where f(ab) is the frequency of haplotype a-b, and f(a) and f(b) are the frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
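Because males carry a single X-chromosome, their X-chromosomal haplotypes are observed directly, and Δ for a given allele pair can then be estimated by simple counting. A minimal sketch, assuming a list of phase-known two-locus haplotypes; it returns only the point estimate of Δ and performs no significance test. The function name and example data are hypothetical.

```python
from collections import Counter


def ld_delta(haplotypes, a, b):
    """Delta = f(ab) - f(a) f(b) estimated from phase-known two-locus haplotypes.

    haplotypes: list of (allele at locus 1, allele at locus 2) tuples.
    """
    n = len(haplotypes)
    f_ab = Counter(haplotypes)[(a, b)] / n
    f_a = sum(1 for h in haplotypes if h[0] == a) / n
    f_b = sum(1 for h in haplotypes if h[1] == b) / n
    return f_ab - f_a * f_b


# Hypothetical sample of 10 male haplotypes in which a and b tend to occur together
sample = [("a", "b")] * 4 + [("a", "c")] + [("d", "b")] + [("d", "c")] * 4
print(ld_delta(sample, "a", "b"))  # about 0.15 = 0.4 - 0.5 * 0.5
```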

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, tests must be conducted concerning the independent segregation of alleles: tests for Hardy-Weinberg equilibrium (HWE; Hardy 1908) and for LD address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. Fisher's exact test is the preferable method for the HWE and LD tests (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw firm conclusions from either test; in particular, a non-significant result does not demonstrate that the population is in equilibrium (Buckleton et al. 2001).
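The exact HWE test conditions on the observed allele counts. One way to approximate it, shown here as a minimal sketch rather than the procedure used in this work, is Monte Carlo sampling: the sample's alleles are shuffled and re-paired into genotypes, and the conditional probability of each shuffled genotype table (Levene's formula) is compared with that of the observed table. The p-value is the proportion of tables that are no more probable than the observed one. Function names, the iteration count and the seed are illustrative assumptions.

```python
import math
import random
from collections import Counter


def _log_cond_prob(genotypes):
    """log Pr(genotype counts | allele counts) under HWE (Levene's formula)."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    n_het = sum(c for (x, y), c in geno_counts.items() if x != y)
    return (math.lgamma(n + 1)
            + sum(math.lgamma(c + 1) for c in allele_counts.values())
            + n_het * math.log(2)
            - math.lgamma(2 * n + 1)
            - sum(math.lgamma(c + 1) for c in geno_counts.values()))


def hwe_exact_mc(genotypes, n_iter=2000, seed=1):
    """Monte Carlo approximation of the exact HWE test p-value."""
    rng = random.Random(seed)
    observed = _log_cond_prob(genotypes)
    alleles = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(alleles)
        shuffled = list(zip(alleles[0::2], alleles[1::2]))   # re-pair into genotypes
        if _log_cond_prob(shuffled) <= observed + 1e-9:      # tolerance for rounding
            hits += 1
    return (hits + 1) / (n_iter + 1)                         # count the observed table too


# Hypothetical locus with a marked heterozygote deficit
print(hwe_exact_mc([("A", "A")] * 25 + [("B", "B")] * 25))
```

As noted above, a large p-value from such a test should not be read as proof of equilibrium.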

Another aspect to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The descriptions and mathematical formulations of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and p_i is the frequency of the ith allele in the population.
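A direct transcription of Nei's estimator, as a minimal sketch. The input is assumed to be a list of sampled gene copies (alleles). If the items are haplotypes instead, the same formula gives the haplotype diversity reported for mtDNA in Paper II. The function name is illustrative.

```python
from collections import Counter


def gene_diversity(copies):
    """Nei's unbiased gene (or haplotype) diversity from n sampled copies."""
    n = len(copies)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_p2 = sum((c / n) ** 2 for c in Counter(copies).values())
    return n / (n - 1) * (1.0 - sum_p2)


print(gene_diversity(["a", "b", "c", "d"]))  # 1.0: all four copies differ
```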

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where G_i is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the match probabilities for all individual genotypes. PM can also be computed from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus directly related to PM:

$$PD = 1 - PM$$
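The sketch below computes PM in the two ways mentioned above: from observed genotype frequencies, and from allele frequencies assuming Hardy-Weinberg proportions (genotype frequency p_i^2 or 2p_ip_j). PD then follows as 1 − PM. Function names and example frequencies are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement


def pm_from_genotypes(genotypes):
    """PM = sum of squared observed genotype frequencies (unordered genotypes)."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return sum((c / n) ** 2 for c in counts.values())


def pm_from_alleles_hwe(freqs):
    """PM from allele frequencies, assuming Hardy-Weinberg genotype proportions."""
    pm = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        g = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        pm += g ** 2
    return pm


def power_of_discrimination(pm):
    return 1.0 - pm


pm = pm_from_alleles_hwe({"a": 0.5, "b": 0.5})
print(pm, power_of_discrimination(pm))  # 0.375 0.625
```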

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where p_i and p_j are allele frequencies and n is the number of alleles.
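A direct transcription of the PIC formula, as a minimal sketch taking a list of allele frequencies that are assumed (not checked) to sum to 1. The function name is illustrative.

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980) from allele frequencies."""
    n = len(freqs)
    homozygosity = sum(p ** 2 for p in freqs)
    pair_term = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                    for i in range(n - 1) for j in range(i + 1, n))
    return 1.0 - homozygosity - pair_term


print(pic([0.5, 0.5]))   # 0.375
print(pic([0.25] * 4))   # 0.703125
```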

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i+p_j)(1-p_i-p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, and second, the expected population frequency of that mother/child genotype combination. Here p_i and p_j are the frequencies of the possible paternal alleles. Q is then obtained by summing over all mother/child genotype combinations as described above. An alternative measure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\left(1 - 2hH^2\right)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
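The sketch below implements Q from the closed formula above and, as a cross-check added here, by brute-force enumeration of the mother's genotype, the allele she transmits and the paternal allele, applying the exclusion probability (1 − Σp)^2 over the alleles the true father could have transmitted. It assumes Hardy-Weinberg proportions and no mutation. PE follows Brenner and Morris from observed heterozygote and homozygote proportions. Function names and frequencies are illustrative.

```python
from itertools import product


def exclusion_prob_q(freqs):
    """Q for trios (Ohno et al. 1982), closed formula, from allele frequencies."""
    n = len(freqs)
    q = sum(p * (1 - p) ** 2 * (1 - p + p * p) for p in freqs)
    for i in range(n - 1):
        for j in range(i + 1, n):
            pi, pj = freqs[i], freqs[j]
            q += pi * pj * (pi + pj) * (1 - pi - pj) ** 2
    return q


def exclusion_prob_brute(freqs):
    """Same quantity by enumeration (HWE, no mutation), used as a cross-check."""
    alleles = range(len(freqs))
    total = 0.0
    for m1, m2, f in product(alleles, repeat=3):   # ordered maternal alleles, paternal allele
        for t in (m1, m2):                         # allele transmitted by the mother (prob 1/2)
            child = (t, f)
            # alleles the true father could have transmitted, given mother and child
            paternal = {x for x, y in (child, child[::-1]) if y in (m1, m2)}
            p_excluded = (1 - sum(freqs[x] for x in paternal)) ** 2
            total += freqs[m1] * freqs[m2] * 0.5 * freqs[f] * p_excluded
    return total


def power_of_exclusion(h, hom):
    """PE (Brenner & Morris 1990); h = heterozygote and hom = homozygote proportion (H)."""
    return h ** 2 * (1 - 2 * h * hom ** 2)


freqs = [0.2, 0.3, 0.5]    # hypothetical allele frequencies
print(exclusion_prob_q(freqs), exclusion_prob_brute(freqs))  # the two should agree
```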

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a man carries only one X-chromosomal allele. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where p_i is the frequency of allele i; p_i can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, MEC_Duo (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
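A minimal sketch of the two X-chromosomal formulas, taking a list of allele (or haplotype) frequencies assumed to sum to 1. For two equally frequent alleles, MEC_Trio = 0.375 and MEC_Duo = 0.25, which agrees with a hand enumeration of mother/daughter genotypes. Function names are illustrative.

```python
def mec_trio(freqs):
    """Mean exclusion chance, X-chromosomal trios with a daughter (Desmarais et al. 1998)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """Mean exclusion chance, X-chromosomal father-daughter duos (Desmarais et al. 1998)."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3


print(mec_trio([0.5, 0.5]), mec_duo([0.5, 0.5]))  # 0.375 0.25
```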

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12,000 years ago because of the ice that covered Northern Europe. Since then, immigration and population movements of varying scale, origin and direction have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1,000 years (Svanberg 2005), may have shaped the genetic composition of the modern Swedish population.

The Swedish population has been investigated regarding forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300,000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). The latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, including a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes' theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the hypothesis of the defence. E represents the DNA profiles and I is other relevant background information. The ratio $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:

Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which means the probability of observing the child's (G_C), mother's (G_M) and alleged father's (G_AF) DNA profiles when the AF is the father, compared with the probability of observing the same DNA profiles when the AF is not the father.


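We can use the third law of probability (the multiplication rule) to simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$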

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification,

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is not the father but that someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (A_M) and paternal (A_P) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (each can only transmit the one allele they carry). If exactly one of the AF and the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If A_M and A_P are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. $p_{A_P}$ is the population frequency of allele A_P and represents the probability of the child receiving that allele from a random man in the population. If A_M and A_P are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of A_M and A_P.

As a simple example, let G_M have the genotype [a,b], G_C have [b,c] and G_AF have [c,d].

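Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2p_c}$$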

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
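The enumeration behind these rules can be sketched as follows, assuming no mutations, silent alleles or substructure, and a mother who is consistent with the child. Summing over every maternal and paternal allele pairing handles the ambiguous cases automatically. The function name and the allele frequencies in the example are hypothetical.

```python
from collections import Counter


def _same_genotype(g1, g2):
    """True if two genotypes contain the same alleles, ignoring order."""
    return Counter(g1) == Counter(g2)


def trio_pi(mother, child, alleged_father, freqs):
    """Single-locus PI for a mother-child-alleged father trio.

    Genotypes are pairs of allele labels; freqs maps allele -> population frequency.
    Assumes no mutation, silent alleles or substructure.
    """
    numerator = 0.0     # Pr(G_C | G_M, G_AF, Hp)
    denominator = 0.0   # Pr(G_C | G_M, G_AF, Hd)
    for m in mother:                    # each maternal allele transmitted with prob 1/2
        for f in alleged_father:        # each AF allele transmitted with prob 1/2
            if _same_genotype((m, f), child):
                numerator += 0.25
        for f in set(child):            # paternal allele drawn from the population
            if _same_genotype((m, f), child):
                denominator += 0.5 * freqs[f]
    if denominator == 0.0:
        raise ValueError("child's genotype is inconsistent with the mother")
    return numerator / denominator


# The worked example: mother [a,b], child [b,c], AF [c,d]; frequencies are hypothetical
freqs = {"a": 0.3, "b": 0.2, "c": 0.1, "d": 0.4}
print(trio_pi(("a", "b"), ("b", "c"), ("c", "d"), freqs))  # about 5.0, i.e. 1 / (2 * 0.1)
```

An AF who shares no possible paternal allele with the child gets PI = 0, which is why a mutation model is needed before such a result is treated as an exclusion.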

Decision

How does one interpret the PI value? Bayes' theorem is relevant in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$
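A minimal sketch of this decision step. It assumes independent (unlinked, LE) loci, so that single-locus PIs multiply into a combined PI. With prior = 0.5 (prior odds 1, as above) it reduces to PI/(PI + 1); the prior argument is included only to show Bayes' theorem in odds form. Function names and PI values are illustrative.

```python
import math


def combined_pi(locus_pis):
    """Combined PI over independent loci: product of the single-locus PIs."""
    return math.prod(locus_pis)


def posterior_paternity(pi, prior=0.5):
    """Pr(Hp | E) from the PI and a prior probability of paternity."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = pi * prior_odds       # posterior odds = LR x prior odds
    return posterior_odds / (1.0 + posterior_odds)


pi = combined_pi([5.0, 2.0, 10.0])    # hypothetical single-locus PIs
print(pi, posterior_paternity(pi))     # 100.0 and about 0.9901
```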

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, whereas too high a cut-off increases the risk of failing to include a true father.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, the use of a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

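$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{founders} \Pr(G_{founder}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes G_1, ..., G_n of the n individuals in the pedigree, X_i is the observation (here, the DNA profile) for individual i, and the last product runs over all offspring o with father f and mother m.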

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci (it reduces to 1 for genotypes consistent with the typed profile and 0 otherwise).
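For a single nuclear family the Elston-Stewart sum can be written out directly. The children are conditionally independent given their parents, so their transmission probabilities form one factor computed for each pair of parental genotypes, which is the peeling step in its simplest form. The sketch assumes one autosomal marker, Hardy-Weinberg founder probabilities, Mendelian transmission without mutation, and an indicator penetrance (a typed individual's genotype is fixed; an untyped parent, passed as None, is summed over). It is not a general peeling implementation for multi-generation pedigrees; all names are illustrative.

```python
from itertools import combinations_with_replacement


def all_genotypes(freqs):
    """All unordered genotypes for the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g, freqs):
    """Hardy-Weinberg genotype probability for a founder."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_prob(child, mother, father):
    """Pr(child genotype | parental genotypes) under Mendelian inheritance."""
    target = sorted(child)
    return sum(0.25 for m in mother for f in father if sorted((m, f)) == target)


def nuclear_family_likelihood(freqs, mother, father, children):
    """Likelihood of one nuclear family at one marker.

    mother/father: observed genotype, or None if untyped (summed over).
    children: list of observed genotypes (all assumed typed).
    """
    mothers = all_genotypes(freqs) if mother is None else [tuple(sorted(mother))]
    fathers = all_genotypes(freqs) if father is None else [tuple(sorted(father))]
    total = 0.0
    for gm in mothers:
        for gf in fathers:
            term = founder_prob(gm, freqs) * founder_prob(gf, freqs)
            for child in children:      # children are independent given the parents
                term *= transmission_prob(child, gm, gf)
            total += term
    return total


freqs = {"a": 0.3, "b": 0.7}   # hypothetical allele frequencies
# Mother typed, father untyped, one typed child: equals p_a^2 * p_b here
print(nuclear_family_likelihood(freqs, ("a", "a"), None, [("a", "b")]))
```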

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. Because of increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the enumeration of all possible inheritance vectors in a pedigree, describing which founder alleles are transmitted to the offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on each inheritance vector; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (i.e. the transition probabilities between inheritance vectors at adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
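The HMM in step 3 can be illustrated with the forward algorithm, as a minimal sketch. It assumes that step 2 has already produced, for every marker, a table of Pr(observed genotypes | inheritance vector); that all inheritance vectors are equally likely at the first marker; and that each meiosis independently switches grandparental origin between adjacent markers with probability θ (no interference). Production implementations use faster transition updates than this double loop over all pairs of inheritance vectors. Function and argument names are illustrative.

```python
from itertools import product


def lander_green_forward(emissions, thetas, n_meioses):
    """Pedigree likelihood over linked markers via the HMM forward algorithm.

    emissions: one dict per marker, mapping each inheritance vector (tuple of
        0/1, one bit per meiosis) to Pr(observed genotypes at that marker | vector).
    thetas: recombination frequencies between consecutive markers.
    """
    states = list(product((0, 1), repeat=n_meioses))
    start = 1.0 / len(states)                  # uniform prior over inheritance vectors
    alpha = {v: start * emissions[0][v] for v in states}
    for k, theta in enumerate(thetas, start=1):
        new_alpha = {}
        for w in states:
            s = 0.0
            for v in states:
                flips = sum(a != b for a, b in zip(v, w))     # meioses that recombined
                s += alpha[v] * theta ** flips * (1 - theta) ** (n_meioses - flips)
            new_alpha[w] = s * emissions[k][w]
        alpha = new_alpha
    return sum(alpha.values())


# Hypothetical emission tables for 3 markers and 2 meioses
e = [
    {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.0},
    {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.1},
    {(0, 0): 0.6, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.8},
]
print(lander_green_forward(e, [0.5, 0.5], 2))  # unlinked: about 0.085
print(lander_green_forward(e, [0.0, 0.0], 2))  # complete linkage: about 0.0705
```

With θ = 0.5 the markers behave as unlinked and the likelihood factorises into a product over markers; with θ = 0 the inheritance vector is shared by all markers.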

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample concerning allele and haplotype frequencies and forensic efficiency parameters, and furthermore to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, resulting in an exclusion error or an inclusion error (falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five-hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
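As a hedged sketch of the posterior calculation with more than two hypotheses, the function below normalises likelihood × prior over any set of hypotheses (equal priors by default), and a hypothetical decision rule includes or excludes paternity at a symmetric cut-off. The hypothesis labels and likelihood values in the example are invented for illustration; the hypotheses actually used in the five-hypotheses model are those of Figure 3, and the paper's exact decision rule may differ.

```python
def posteriors(likelihoods, priors=None):
    """Posterior probability of each hypothesis from Pr(E | H) and prior probabilities."""
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}   # equal priors
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}


def decide(post, paternity_hypothesis, cutoff=0.9999):
    """Hypothetical rule: inclusion/exclusion at the cut-off, otherwise inconclusive."""
    p = post[paternity_hypothesis]
    if p >= cutoff:
        return "inclusion"
    if p <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


# Invented likelihoods: the AF is the father (H0) or one of four alternatives
lik = {"H0": 1e-6, "H1a": 2e-7, "H1b": 1e-12, "H1c": 1e-12, "H1d": 1e-13}
print(decide(posteriors(lik), "H0"))                        # inconclusive
print(decide(posteriors({"H0": 1e-6, "H1": 1e-12}), "H0"))  # inclusion
```

The example shows the intended effect of extra hypotheses: an alternative that explains the data nearly as well as paternity (here a close relative) prevents an inclusion that the two-hypotheses model would make.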


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even one whose assumptions are not exactly correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) compared with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five-hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion errors for H1a-d. Posterior probabilities were calculated based on the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers, it is particularly important to set up and study regional frequency databases because of the increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions, together with 39 samples from a Swedish Saami population (the Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation involved estimation of forensic efficiency parameters as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). A comparison of mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by pairwise ΦST values calculated for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). mtDNA haplotype frequencies from the eight Swedish regions (including the Saami) were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA profiling group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X-chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosome STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the first linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.

Recombination frequencies for the loci were estimated from the family data (Figure 5). The estimates indicated that the chance of observing a recombination within each linkage group was small (< 1%). They also suggested a tendency towards linkage between group three (HPRTB-DXS10101) and group four (DXS10134-DXS7423).
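A generic binomial calculation shows the precision attainable from roughly 84 to 116 informative meioses. The sketch below does not reproduce the recombination model presented in Paper III. It assumes that recombinants can be counted unambiguously in phase-known meioses. It gives the simple estimate θ = k/n together with an exact Clopper-Pearson interval, found by bisection. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(g, lo=0.0, hi=1.0, iters=100):
    """Root of a function g that is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def recombination_estimate(k, n, alpha=0.05):
    """Point estimate k/n and exact (Clopper-Pearson) interval for theta.

    k = observed recombinants, n = informative (phase-known) meioses.
    """
    theta_hat = k / n
    if k == 0:
        lower = 0.0
    else:
        # Lower limit solves P(X >= k | p) = alpha/2; P(X >= k) increases with p.
        lower = _bisect_increasing(lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2)
    if k == n:
        upper = 1.0
    else:
        # Upper limit solves P(X <= k | p) = alpha/2; P(X <= k) decreases with p.
        upper = _bisect_increasing(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return theta_hat, lower, upper


print(recombination_estimate(0, 100))
```

Suppose no recombinants are observed in 100 meioses. The point estimate is then 0, but the upper 95% limit is about 0.036. This is why recombination estimates from family data sets of this size are best reported together with their uncertainty.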


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI build 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set, followed by the separate Swedish data/Somali data.

Other groups have previously studied the eight X-STRs investigated in this work extensively, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicate that these features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data.

Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used to compute the LR for questions about a relationship. However, its computation becomes impractical for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, handles thousands of linked markers well for pedigrees of moderate size, but it assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose.

We therefore present a model that fully accounts for linkage and linkage disequilibrium. Using simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III, we also study the impact of taking, or not taking, linkage and LD into account.
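The core idea of the Lander-Green algorithm can be shown in textbook form. The inheritance vector has one bit per meiosis, recording which parental chromosome was transmitted. It is treated as a hidden Markov chain along the chromosome. Between adjacent loci with recombination fraction θ, the transition probability is θ^d (1 - θ)^(m - d), where m is the number of meioses and d the number of bits that differ. The sketch below implements this standard algorithm, not the extended LD-aware model of Paper IV. The emission probabilities must be supplied by the caller, and the function names are illustrative. For X-chromosomal data only maternal meioses need bits, because a father transmits his single X chromosome unchanged to his daughters.

```python
from itertools import product


def transition_matrix(m, theta):
    """Lander-Green transition probabilities between inheritance vectors.

    Each of the 2**m inheritance vectors has one bit per meiosis. Moving to the
    next locus, each bit flips (a recombination) independently with
    probability theta, so P(w | v) = theta**d * (1 - theta)**(m - d),
    where d is the Hamming distance between v and w.
    """
    vectors = list(product((0, 1), repeat=m))
    matrix = []
    for v in vectors:
        row = []
        for w in vectors:
            d = sum(a != b for a, b in zip(v, w))
            row.append(theta ** d * (1 - theta) ** (m - d))
        matrix.append(row)
    return vectors, matrix


def forward_likelihood(emissions, thetas, m):
    """Forward algorithm over ordered loci (hidden Markov model).

    emissions[k][i]: P(data at locus k | inheritance vector i), supplied by the caller.
    thetas[k]: recombination fraction between locus k and locus k + 1.
    A uniform prior over the 2**m inheritance vectors is used at the first locus.
    """
    n_states = 2 ** m
    alpha = [emissions[0][i] / n_states for i in range(n_states)]
    for k in range(1, len(emissions)):
        _, t = transition_matrix(m, thetas[k - 1])
        alpha = [
            emissions[k][j] * sum(alpha[i] * t[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)
```

The cost grows linearly with the number of loci but exponentially with the number of meioses. This explains why the approach suits many linked markers in moderately sized pedigrees.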

Materials and methods

We presented a computational model based on the Lander-Green algorithm. It extends that algorithm by expanding the inheritance vectors to consider all loci of a haplotype.

To test the model, we studied six cases representing pedigrees for which X-chromosomal analysis would be valuable.

- In three of these, DNA profiles were available for both the children and the founders: paternity for a trio, paternity for half-sisters and paternity for full sisters.
- In the other three, DNA profiles were available only for the children: paternity for half-sisters, paternity for full siblings and maternity for brothers.

For each of the six cases, simulations were performed with the tested relationship either true or false. From these simulations we studied the LR distributions. We also compared the LR calculated with our proposed model against the LR from simpler models that consider linkage and LD partially or not at all. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
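The simulation approach can be illustrated for the maternity-of-brothers case under strong simplifying assumptions. A single linkage group is treated as one haplotype locus with no recombination or mutation within it. The mother is untyped, and her two X haplotypes are drawn independently from the population. Haplotype frequencies are hypothetical. Each brother receives one of the mother's two X chromosomes, so with probability 1/2 they carry the same maternal chromosome. The LR for "same mother" versus "unrelated mothers" is then 1/2 + 1/(2p) when the brothers share a haplotype of frequency p, and 1/2 otherwise. This is a sketch of the simulation idea only, not the Paper IV model, which treats all eight markers, linkage and LD jointly.

```python
import math
import random
import statistics


def lr_brothers(h1, h2, freqs):
    """LR (same mother : different, unrelated mothers) for one haplotype block."""
    if h1 == h2:
        return 0.5 + 1.0 / (2.0 * freqs[h1])
    return 0.5


def simulate_log10_lr(freqs, same_mother, n_sim=10_000, seed=7):
    """Simulate the brothers' haplotypes and return the list of log10 LRs."""
    rng = random.Random(seed)
    haplotypes = list(freqs)
    weights = [freqs[h] for h in haplotypes]
    results = []
    for _ in range(n_sim):
        if same_mother:
            mother = rng.choices(haplotypes, weights=weights, k=2)
            # Each son receives one of his mother's two X haplotypes at random.
            h1, h2 = rng.choice(mother), rng.choice(mother)
        else:
            h1, h2 = rng.choices(haplotypes, weights=weights, k=2)
        results.append(math.log10(lr_brothers(h1, h2, freqs)))
    return results


# Hypothetical haplotype frequencies for one linkage group.
freqs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
for label, related in (("Rel", True), ("NoRel", False)):
    log_lrs = simulate_log10_lr(freqs, related)
    print(label, round(statistics.median(log_lrs), 2), round(min(log_lrs), 2), round(max(log_lrs), 2))
```

Multiplying such LRs across linkage groups would amount to the product-rule shortcut. That shortcut is not strictly valid when groups are linked, as suggested above for groups three and four.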

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. In the three cases where DNA profiles were available for the founders, the simulations gave a high median LR (~10^6). Where founder genotypes were not available, the median LR was considerably lower (~10^2 to 10^3). When the questioned relationship was simulated to be false, LRs above 1 (that is, favouring the questioned relationship) of varying magnitude were also obtained. This happened only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was small on average. It was somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.
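A small worked example shows why the frequency assigned to a previously unobserved haplotype matters. For illustration, assume two brothers with an untyped mother and a single linkage group without recombination or mutation. Each brother receives one of the mother's two X chromosomes, so if they share a haplotype of frequency p, then LR = 1/2 + 1/(2p). The sketch compares two hypothetical choices of p for a haplotype absent from a database of n = 718 males (the number of males typed in Paper III). The first is the count-based estimate 1/(n+1); the second is the one-sided 95% upper bound 1 - 0.05^(1/n). Neither is claimed to be the estimator used in Paper IV.

```python
def lr_shared_haplotype(p):
    """LR (same mother : unrelated mothers) when two brothers share a haplotype
    of population frequency p (untyped mother, one block, no mutation)."""
    return 0.5 + 1.0 / (2.0 * p)


n = 718  # database size used for illustration (males typed in Paper III)
estimates = {
    "1/(n+1)": 1.0 / (n + 1),
    "95% upper bound": 1.0 - 0.05 ** (1.0 / n),
}
for name, p in estimates.items():
    print(f"{name:>16}: p = {p:.5f}, LR = {lr_shared_haplotype(p):.1f}")
```

The two choices give LRs of about 360 and about 120, respectively. A threefold difference thus arises from the frequency convention alone, before linkage or LD is even considered.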

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Scenario | log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max)
Rel | 20 [-1 -62] (-1100) | 12 [04-73] (0001418) | 04 [0036-93] (00001438)
NoRel | -1 [-1-05] (-140) | 10 [09-13] (000263) | 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother. NoRel denotes simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 is the model in which both linkage and LD were considered. m_2 is the model in which linkage, but not LD, was considered. m_3 is the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each can have a considerable impact on the resulting figure. This is a common theme of the four papers in this thesis. The thesis aimed to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how several parameters affected the risk of erroneous decisions in relationship testing in immigration casework. These parameters were the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. We also proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high. The homogeneity among subregions within Sweden supports a combined Swedish population frequency database. Furthermore, mtDNA variation in Sweden resembles that of other European populations. This makes it possible to enlarge the relevant reference population, which increases the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. In the Swedish population, the markers within each of the four linkage groups were in linkage disequilibrium. Together with the linkage found within and between the linkage groups, this highlights the need to consider these parameters when assessing the evidential weight of X-chromosomal marker investigations. Consequently, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account. We applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. We further showed that our proposed model is efficient, and that disregarding linkage and LD can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt some informed guesses about the future of forensic genetics. Looking back, we can see enormous progress in this field over the past 25 years. Much of it is due to the discovery of polymorphic genetic markers and thermostable polymerases and to the invention of PCR and capillary electrophoresis, which together made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help of rapidly increasing computing power.

Consider the situation just before these discoveries from a client's point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and accurately, were rather small. Client demand to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, enormous sums of money and much scientific effort were invested in forensic genetics to meet that demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can already be solved with existing methods. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in large national DNA profile databases built on a given set of predefined DNA markers (e.g., CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases that, like CODIS, contain over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

Generally, I believe that future development will become more diversified, with less money and time spent on each topic. For the use of DNA in criminal cases, perhaps the most important areas are:

- methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009);
- resolving mixtures (Homer et al. 2008);
- predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009).

To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

Turning to the use of DNA in relationship testing, the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. I expect future work to focus more on statistical methods for handling data from an increasing number of DNA markers. However, I think these developments will generally be smaller than those of the last 25 years.

Take paternity testing, which makes up the vast majority of relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not keep investing the same time and money in research. Some relationship questions, however, still cannot be resolved satisfactorily with existing methods. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some studies have attempted to solve similar cases, as well as more distant relationships, using array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the key requirement is statistical methods that estimate the evidential value of the DNA profiles accurately while taking properties such as linkage and linkage disequilibrium into account.

For continued work on the linked X-STRs studied in this thesis, two main issues remain. The first is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The second, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.
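Two back-of-the-envelope calculations give a rough sense of how much larger such databases must be. They rest on simple assumptions. First, the number of possible haplotypes grows multiplicatively with the number of loci. Second, for a haplotype not observed among n sampled chromosomes, the one-sided 95% upper bound on its frequency is 1 - 0.05^(1/n), roughly 3/n. The allele counts below are hypothetical, and the function names are illustrative.

```python
import math


def possible_haplotypes(allele_counts):
    """Number of distinct haplotypes for loci with the given allele counts."""
    return math.prod(allele_counts)


def unseen_upper_bound(n, alpha=0.05):
    """One-sided (1 - alpha) upper bound on the frequency of a haplotype seen 0 times in n."""
    return 1.0 - alpha ** (1.0 / n)


def database_size_needed(target, alpha=0.05):
    """Smallest n for which the upper bound for an unseen haplotype is <= target."""
    return math.ceil(math.log(alpha) / math.log(1.0 - target))


print(possible_haplotypes([10, 12]))      # two-locus block: 120
print(possible_haplotypes([10, 12, 15]))  # three-locus block: 1800
print(database_size_needed(0.001))        # 2995
```

Under these assumptions, about 3,000 chromosomes are needed before an unseen haplotype can be bounded below a frequency of 1 in 1,000. Meanwhile, each added locus multiplies the number of possible haplotypes.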

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions remained rather isolated until the early 1900s. We want to see whether this isolation is traceable in the genetic data and could therefore affect the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. The methods are technically suited to producing such large data sets. However, the downstream work of data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always keeping your doors open for discussions. Bertil, especially for your confidence and your wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff of the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a "bollplank" (sounding board) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala. My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My in-laws, the Tillmar family: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and caring, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being (almost) constantly happy and for helping me get up in the mornings while I was writing this thesis.

Thanks also to everyone I have not mentioned who has helped in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis: Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden: a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies: the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals:

I DNA-testing for immigration cases: the risk of erroneous conclusions. Karlsson AO, Holmlund G, Egeland T, Mostad P. Forensic Sci Int 2007;172(2-3):144-149.

II Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations. Tillmar AO, Coble MD, Wallerström T, Holmlund G. Int J Legal Med 2010;124(2):91-98.

III Analysis of linkage and linkage disequilibrium for eight X-STR markers. Tillmar AO, Mostad P, Egeland T, Lindblom B, Holmlund G, Montelius K. Forensic Sci Int Genet 2008;3(1):37-41.

IV Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account. Tillmar AO, Egeland T, Lindblom B, Holmlund G, Mostad P. Int J Legal Med 2010, submitted.

Reprints were made with permission from the respective publishers: Paper I © 2007 Elsevier, Forensic Science International; Paper II © 2010 Springer, International Journal of Legal Medicine; Paper III © 2008 Elsevier, Forensic Science International: Genetics.

Contents

Abstract

Populärvetenskaplig sammanfattning (popular science summary, in Swedish)

List of Papers

Contents

Abbreviations

Introduction
  History of DNA and forensic genetics
  Population genetics
    Genetic polymorphisms
    DNA inheritance
    Population Genetics
    The Swedish population and its genetic appearance
  Forensic mathematics/statistics
    Framework for interpretation and presentation of evidential weight
    Paternity index calculation
    Mathematical model for automatic likelihood computation for relationship testing
Aim of the thesis
  Specific aims
    Paper I
    Paper II
    Paper III
    Paper IV
Investigations
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
    Materials and methods
    Results and discussion
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
    Materials and methods
    Results and discussion
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
    Materials and methods
    Results and discussion
  Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account
    Materials and methods
    Results and discussion
Concluding remarks
Future perspectives
Acknowledgements
References

Abbreviations

θ Theta, recombination frequency/fraction
AF Alleged father
DNA Deoxyribonucleic acid
FST Measure of population genetic subdivision
GD Gene diversity
HWE Hardy-Weinberg equilibrium
ISFG International Society for Forensic Genetics
LD Linkage disequilibrium
LR Likelihood ratio
MEC Mean exclusion chance
mtDNA Mitochondrial DNA
PCR Polymerase chain reaction
PD Power of discrimination
PE Power of exclusion
PI Paternity index
PIC Polymorphism information content
PM Match probability
Pr Probability
SNP Single nucleotide polymorphism
STR Short tandem repeat
TF True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could hardly have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in many different areas, such as medical genetics, the food processing industry and forensics, both when solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being based on DNA analysis of a biological sample. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al 2004b).

The first time DNA from a crime scene sample was used in court was in 1986 in the UK (Gill et al 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques and methods for using the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which DNA evidence was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process: from handling biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation of the weight of the DNA evidence in court. In no other trial has the DNA evidence been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?". Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, one of three statements can be presented for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.

The establishment of these figures is usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. The first is population genetics, including allele frequencies, population substructure, and dependence within and between markers, among other factors. The second is models for calculating and presenting the weight of evidence that take this information properly into account.

Population genetics

Genetic polymorphisms

Three types of DNA marker, namely Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g., GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they have good characteristics for use in forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few corporations, making them the standard set of markers in routine practice. Recent developments include STRs with shorter amplicon sizes (i.e., miniSTRs) (Wiegand & Kleiber 2001), which have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in triallelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low: one STR marker provides about as much information as four SNPs (Sobrino et al 2005; Brenner, www.dna-view.com). No commercial forensic SNP kit is available, although efforts have been made to develop SNP multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Philips et al 2008).

A third alternative is to use nucleotide sequence variation, i.e., information from a DNA sequence spanning a predefined region. The main forensic use of sequence data involves the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. Analysis of mtDNA SNPs in addition to the sequence data has, however, been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), an SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the marker types described above, there are different "types" of DNA with different properties in terms of their inheritance pattern, as well as other important population genetic properties. The types discussed here are markers on the autosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with complex pedigree testing (Egeland et al 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their single X-chromosome from their mother, who passes on one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since full sisters can inherit different alleles from him. X-chromosomal markers can, however, exclude a common father, since two sisters with the same father must share the same paternal X-chromosomal allele (barring mutation). There are several other types of relationship where analysis of X-chromosomal markers is superior to that of autosomal markers (Szibor et al 2003, Pinto et al 2010).

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers are used in different PCR multiplexes (Becker et al 2008, Hundertmark et al 2008, Gomez et al 2007, Diegoli et al 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). These two genetic properties are discussed further below, in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that creates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-quarter of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is shared by all males in the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited maternally, from mother to child, and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g., the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of F-statistics, FST (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST values correspond to small differences between subpopulations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g., ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) deal with the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e., the haplotype) is changed compared with its parental constitution. The distance between two loci can be measured as the recombination frequency θ, estimated from family studies. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the allele frequencies of alleles a and b, respectively. If we have linkage equilibrium, then $\Delta = 0$, i.e., no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially for multiallelic loci.
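The quantity Δ above can be estimated from a sample of phase-known two-locus haplotypes, for example X-chromosomal haplotypes typed in males, whose phase is observed directly because males are hemizygous. The sketch below is illustrative only: the function name `ld_delta` and the input format (a list of allele pairs) are assumptions, frequencies are plain sample proportions, and no significance test is applied.

```python
from collections import Counter


def ld_delta(haplotypes):
    """Return {(a, b): delta} with delta = f(ab) - f(a) * f(b).

    `haplotypes` is a list of phase-known (allele_at_locus_1, allele_at_locus_2)
    pairs; all frequencies are plain sample proportions (hypothetical input format).
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    locus1 = Counter(a for a, _ in haplotypes)
    locus2 = Counter(b for _, b in haplotypes)
    deltas = {}
    for a, count_a in locus1.items():
        for b, count_b in locus2.items():
            f_ab = hap_counts[(a, b)] / n  # Counter returns 0 for unseen haplotypes
            deltas[(a, b)] = f_ab - (count_a / n) * (count_b / n)
    return deltas


# Hypothetical sample: allele 12 at locus 1 always occurs with allele 8 at locus 2.
sample = [("12", "8")] * 3 + [("13", "9")]
print(ld_delta(sample)[("12", "8")])  # 0.1875, i.e. 0.75 - 0.75 * 0.75
```

A positive Δ means the haplotype is seen more often than expected under linkage equilibrium; in a sample where every allele combination occurs equally often, all values are zero.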

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Hardy-Weinberg equilibrium (HWE) tests (Hardy 1908) and LD tests deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. Fisher's exact test is the preferred method for the HWE and LD tests (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw firm conclusions from the outcome of either test (Buckleton et al 2001).
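For a biallelic locus (such as most SNPs), an exact HWE test can be sketched by enumerating every heterozygote count compatible with the observed allele counts and summing the probabilities of all outcomes that are no more likely than the observed one. This is one common formulation of the exact test, assumed here for illustration; multiallelic STRs and the LD test need Markov-chain or permutation versions that are not shown. The function name `hwe_exact_p` is hypothetical, and exact rational arithmetic is used so that very small probabilities are not lost to rounding.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact HWE test p-value for a biallelic locus from genotype counts."""
    n = n_aa + n_ab + n_bb  # individuals
    n_a = 2 * n_aa + n_ab   # copies of allele A
    n_b = 2 * n - n_a       # copies of allele B

    def prob(het: int) -> Fraction:
        # Pr(het heterozygotes | n, n_a) under HWE, as an exact fraction
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        num = factorial(n) * 2 ** het * factorial(n_a) * factorial(n_b)
        den = factorial(hom_a) * factorial(het) * factorial(hom_b) * factorial(2 * n)
        return Fraction(num, den)

    p_obs = prob(n_ab)
    # heterozygote counts compatible with the allele counts share the parity of n_a
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    # p-value: total probability of outcomes no more likely than the observed one
    return float(sum(p for p in map(prob, hets) if p <= p_obs))


print(hwe_exact_p(25, 50, 25))  # 1.0: the observed heterozygote count is the most likely one
print(hwe_exact_p(50, 0, 50) < 1e-10)  # True: no heterozygotes at all is far from HWE
```

Consistent with the limited power noted above, a large p-value only means that no deviation was detected in this sample, not that the locus is proven to be in HWE.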

Another feature to consider is the forensic efficiency of the DNA markers in criminal casework and relationship testing. Such estimates describe the theoretical value of using specific markers in different forensic genetic situations and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
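As a small illustration of Nei's estimator, the sketch below computes GD from a list of sampled gene copies (two per individual for an autosomal locus). The function name and the example sample are hypothetical; the allele frequencies $p_i$ are taken as sample proportions.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity from a sample of n gene copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())  # sum of squared allele frequencies
    return n / (n - 1) * (1 - sum_p2)


# Hypothetical sample of 6 gene copies (3 diploid individuals) at one STR locus.
print(round(gene_diversity(["12", "12", "13", "14", "14", "14"]), 4))  # 0.7333
```

The factor n/(n-1) corrects for the downward bias of the plug-in value $1 - \sum_i p_i^2$ in small samples.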

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the match probabilities for all individual genotypes. PM can also be computed from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus directly related to PM:

$$PD = 1 - PM$$
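Both routes to PM described above, from observed genotype frequencies and from allele frequencies under HWE, can be sketched as follows. The function names are hypothetical and genotypes are assumed to be given as unordered allele pairs.

```python
from collections import Counter


def match_probability(genotypes):
    """PM = sum of squared genotype frequencies, from observed genotypes."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)  # (a, b) and (b, a) are one genotype
    return sum((c / n) ** 2 for c in counts.values())


def match_probability_hwe(allele_freqs):
    """PM computed from allele frequencies, assuming Hardy-Weinberg equilibrium."""
    p = list(allele_freqs)
    pm = sum(pi ** 4 for pi in p)  # homozygotes: (p_i^2)^2
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            pm += (2 * p[i] * p[j]) ** 2  # heterozygotes: (2 p_i p_j)^2
    return pm


def power_of_discrimination(pm):
    return 1 - pm


sample = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
print(match_probability(sample), match_probability_hwe([0.5, 0.5]))  # 0.375 0.375
```

The two values agree here because the toy sample happens to have exactly the HWE genotype proportions; in real data they differ.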

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980, Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous, or when both parents and the child have the same heterozygous genotype. Thus,

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2\, p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
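The PIC formula translates directly into a double loop over allele pairs. This is a minimal sketch with a hypothetical function name; the frequencies are assumed to sum to 1.

```python
def pic(freqs):
    """Polymorphism information content from a list of allele frequencies."""
    p = list(freqs)
    value = 1 - sum(pi ** 2 for pi in p)
    for i in range(len(p) - 1):
        for j in range(i + 1, len(p)):
            value -= 2 * p[i] ** 2 * p[j] ** 2  # uninformative matings with the same heterozygote
    return value


print(pic([0.5, 0.5]))  # 0.375
print(round(pic([0.25] * 4), 4))  # 0.7031
```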

The probability of excluding paternity (Q) is calculated from (Ohno et al 1982)

$$Q = \sum_{i=1}^{n} p_i (1-p_i)^2 (1 - p_i + p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, where $p_i$ and $p_j$ are the frequencies of the possible paternal alleles; and second, the expected population frequency of that mother/child genotype combination. Q is then obtained by summing over all mother/child genotype combinations, as described above. An alternative measure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2 (1 - 2hH^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
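The closed form for Q can be checked by brute force: enumerate the mother's genotype (under HWE), the allele she transmits and the paternal allele, then ask how likely a random man is to carry neither possible paternal allele. The sketch below assumes HWE, no mutations and no silent alleles; the function names are hypothetical, and the enumeration is included only as a cross-check of the formula, not as an efficient method.

```python
from itertools import product


def exclusion_probability(freqs):
    """Q for an autosomal locus, mother-child-alleged father trios, under HWE."""
    p = list(freqs)
    q = sum(pi * (1 - pi) ** 2 * (1 - pi + pi ** 2) for pi in p)
    for i in range(len(p) - 1):
        for j in range(i + 1, len(p)):
            q += p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
    return q


def exclusion_probability_enumerated(freqs):
    """Same quantity by enumerating mother, transmitted alleles and child."""
    total = 0.0
    for m1, m2, pat in product(range(len(freqs)), repeat=3):
        for mat in (m1, m2):  # the mother transmits either allele with probability 1/2
            weight = freqs[m1] * freqs[m2] * 0.5 * freqs[pat]
            # child alleles that could be paternal, given the mother's genotype
            possible = {c for c, other in ((mat, pat), (pat, mat)) if other in (m1, m2)}
            # a random man is excluded if neither of his two alleles is a possible paternal allele
            total += weight * (1 - sum(freqs[a] for a in possible)) ** 2
    return total


def power_of_exclusion(h, H=None):
    """Brenner & Morris PE from heterozygote (h) and homozygote (H) proportions."""
    if H is None:
        H = 1 - h
    return h ** 2 * (1 - 2 * h * H ** 2)


freqs = [0.1, 0.2, 0.3, 0.4]
print(round(exclusion_probability(freqs), 6), round(exclusion_probability_enumerated(freqs), 6))
```

The two printed values agree, which supports the closed form above. PE, in contrast, uses only the observed proportions of heterozygotes and homozygotes.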

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
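The two X-chromosomal formulas can be sketched in the same way as Q, with the difference that a random man carries a single X allele. The sketch assumes HWE in females and equal allele (or haplotype) frequencies in both sexes; the function names are hypothetical, and the enumeration is only a cross-check of the trio formula.

```python
from itertools import product


def mec_trio(freqs):
    """X-chromosomal MEC, mother-daughter-alleged father trio (closed form)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """X-chromosomal MEC, daughter-alleged father duo (closed form)."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3


def mec_trio_enumerated(freqs):
    """Cross-check of mec_trio by enumerating mother, transmissions and daughter."""
    total = 0.0
    for m1, m2, pat in product(range(len(freqs)), repeat=3):
        for mat in (m1, m2):  # the mother transmits either allele with probability 1/2
            weight = freqs[m1] * freqs[m2] * 0.5 * freqs[pat]
            possible = {c for c, other in ((mat, pat), (pat, mat)) if other in (m1, m2)}
            # a random man carries one X allele, so he is excluded with probability 1 - sum
            total += weight * (1 - sum(freqs[a] for a in possible))
    return total


print(mec_trio([0.5, 0.5]), mec_duo([0.5, 0.5]))  # 0.375 0.25
```

As expected, testing the mother as well raises the exclusion chance compared with the duo.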

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of various extents, origins and directions have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with respect to forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at setting up a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, which is the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR), which connects the prior odds to the resulting posterior odds, i.e., Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

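$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$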

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the defence hypothesis. E represents the DNA profiles and I is other relevant background information. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e., the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the use of the LR (i.e., PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.

$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which is the probability of observing the child's ($G_C$), the mother's ($G_M$) and the alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


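We can use the third law of probability (the multiplication rule) to simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$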

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them can transmit any other allele). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as a sum over all possible assignments of $A_M$ and $A_P$.

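As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2 p_c}$$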

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
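The single-locus trio calculation above, including the ambiguous case where either child allele could be the maternal one, can be sketched by summing over both possible allele assignments. This is a minimal sketch that assumes the mother is the true mother and ignores mutations and silent alleles; all function names and the allele frequencies are hypothetical.

```python
def transmission(genotype):
    """Mendelian transmission probabilities {allele: prob} for a diploid genotype."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_probability(child, from_mother, from_father):
    """Pr(child genotype) given allele-transmission probabilities from each parent.

    Summing both orderings handles the ambiguous case where either child allele
    could be the maternal one.
    """
    c1, c2 = child
    prob = from_mother.get(c1, 0.0) * from_father.get(c2, 0.0)
    if c1 != c2:
        prob += from_mother.get(c2, 0.0) * from_father.get(c1, 0.0)
    return prob


def paternity_index(mother, child, alleged_father, freqs):
    """Single-locus trio PI; no mutations or silent alleles."""
    from_mother = transmission(mother)
    numerator = child_probability(child, from_mother, transmission(alleged_father))
    # under H_d the paternal allele comes from a random man: allele x with probability p_x
    denominator = child_probability(child, from_mother, freqs)
    if denominator == 0.0:
        raise ValueError("child genotype is incompatible with the mother")
    return numerator / denominator


freqs = {"a": 0.1, "b": 0.2, "c": 0.05, "d": 0.3}  # hypothetical; other alleles not listed
print(paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freqs))  # approx. 10 = 1 / (2 * 0.05)
```

An AF who cannot have transmitted the paternal allele gives a numerator of 0 and hence PI = 0 (an exclusion). When mother and child share the same heterozygous genotype [a,b] and the AF is [a,a], the sum over assignments gives PI = $1/(p_a + p_b)$.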

Decision

How should the PI value be interpreted? Bayes's theorem is used to obtain the posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, which gives the following value for the posterior probability of paternity:

$$PI = \frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)}$$

hence

$$PI = \frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)}$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$
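The conversion from PI to a posterior probability of paternity is a one-line application of Bayes's theorem in odds form. The `prior_odds` parameter is an illustrative generalisation of the traditional prior odds of 1 used above; the function name is hypothetical.

```python
def posterior_probability(pi, prior_odds=1.0):
    """Pr(H_p | E) from the paternity index via Bayes' theorem in odds form."""
    posterior_odds = pi * prior_odds
    return posterior_odds / (1 + posterior_odds)


print(posterior_probability(99))  # 0.99 with the traditional prior odds of 1
print(posterior_probability(99, prior_odds=0.1))  # about 0.908 with prior odds of 1:10
```

With prior odds of 1, the function reduces to PI/(PI + 1); the second call shows how strongly the posterior depends on the chosen prior.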

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, while too high a cut-off increases the risk of failing to include a true father.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007) and population substructure (Ayres 2000) have to be considered, and when deficiency cases are treated (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

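$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{\text{founder}} \Pr(G_{\text{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the possible genotypes $G_1, \ldots, G_n$ of the n pedigree members, $X_i$ is the observed phenotype of individual i, and the last product runs over all offspring o with father f and mother m.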

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree and computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and the individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
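To make the likelihood formula concrete, the sketch below evaluates the sum directly, by enumerating every genotype combination for a single autosomal locus. This is a brute-force evaluation of the same sum, not the peeling order itself (peeling reorganises this sum so that the cost does not explode with pedigree size). It assumes founders in HWE, Mendelian transmission without mutation, and a penetrance that is 1 when a genotype agrees with the typing result and 0 otherwise; the function names and the pedigree encoding are hypothetical.

```python
from itertools import combinations_with_replacement, product


def offspring_probability(child, father, mother):
    """Pr(G_o | G_f, G_m) for unordered diploid genotypes (Mendelian, no mutation)."""
    def t(genotype, allele):
        return genotype.count(allele) / 2

    c1, c2 = child
    prob = t(father, c1) * t(mother, c2)
    if c1 != c2:
        prob += t(father, c2) * t(mother, c1)
    return prob


def pedigree_likelihood(pedigree, typed, freqs):
    """Brute-force evaluation of the pedigree likelihood sum at a single locus.

    pedigree: {person: None for founders, or (father, mother)}
    typed: {person: genotype} for typed individuals (others are untyped)
    freqs: {allele: population frequency}; founders are assumed to be in HWE.
    """
    people = list(pedigree)
    all_genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    observed = {p: tuple(sorted(g)) for p, g in typed.items()}
    total = 0.0
    for assignment in product(all_genotypes, repeat=len(people)):
        g = dict(zip(people, assignment))
        # Pen(X_i | G_i) for a non-trait marker: 1 if consistent with the typing, else 0
        if any(g[p] != gt for p, gt in observed.items()):
            continue
        term = 1.0
        for person, parents in pedigree.items():
            if parents is None:
                a, b = g[person]
                term *= freqs[a] * freqs[b] * (1 if a == b else 2)
            else:
                term *= offspring_probability(g[person], g[parents[0]], g[parents[1]])
        total += term
    return total


freqs = {"a": 0.2, "b": 0.3, "c": 0.5}
typed = {"S1": ("a", "a"), "S2": ("a", "a")}
full_sibs = {"F": None, "M": None, "S1": ("F", "M"), "S2": ("F", "M")}
unrelated = {"S1": None, "S2": None}
lr = pedigree_likelihood(full_sibs, typed, freqs) / pedigree_likelihood(unrelated, typed, freqs)
print(round(lr, 6))  # 9.0 = (1 + p_a)^2 / (4 p_a^2)
```

The example is a small deficiency case: two typed sisters with untyped parents, compared under a full-sibling and an unrelated hypothesis. The cost grows as (number of genotypes)^(pedigree size), which is exactly the problem that peeling avoids.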

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases steeply with the number of markers included. Owing to increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987); it permits simultaneous consideration of thousands of loci, with computational effort increasing linearly with the number of markers, although it grows exponentially with the size of the pedigree. The Lander-Green algorithm has three main steps: 1) enumerating all possible inheritance vectors in a pedigree, which describe the alleles transmitted from founders to offspring; 2) iterating over all inheritance vectors and calculating the probability of the observed genotypes at each marker conditional on the inheritance vector; and 3) computing the joint probability of the inheritance vectors of all markers along the same chromosome, using the transmission probabilities between adjacent markers. By using a hidden Markov model (HMM) for the final step, an efficient computational model is obtained (see Kruglyak et al. 1996 for a more detailed description).
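Step 3 can be sketched as the forward algorithm of an HMM whose hidden states are the inheritance vectors. In the Python sketch below, the per-marker emission probabilities from step 2 are taken as given inputs (the numbers in the usage are hypothetical), the first marker has a uniform prior over inheritance vectors, and each meiosis is assumed to switch grandparental origin independently between adjacent markers with recombination fraction θ. This naive form sums over all pairs of vectors at each marker, so its cost is linear in the number of markers but grows as 4 to the power of the number of meioses; practical implementations use faster schemes.

```python
from itertools import product


def lander_green_likelihood(emissions, n_meioses, thetas):
    """Pedigree likelihood over linked markers via the HMM forward algorithm.

    emissions[k][j]: Pr(observed genotypes at marker k | inheritance vector j),
        assumed already computed (step 2); vectors are ordered as in
        itertools.product((0, 1), repeat=n_meioses).
    thetas[k]: recombination fraction between markers k and k + 1.
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    n_states = len(vectors)

    def transition(v, w, theta):
        # Each meiosis switches grandparental origin independently with prob. theta.
        switches = sum(a != b for a, b in zip(v, w))
        return theta ** switches * (1 - theta) ** (n_meioses - switches)

    # Uniform (Mendelian) prior over inheritance vectors at the first marker.
    alpha = [emissions[0][j] / n_states for j in range(n_states)]
    for k in range(1, len(emissions)):
        alpha = [
            emissions[k][j] * sum(alpha[i] * transition(vectors[i], vectors[j], thetas[k - 1])
                                  for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical emissions for two markers and two meioses (four vectors).
    e = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.1, 0.1]]
    print(lander_green_likelihood(e, 2, [0.5]))  # unlinked: product of mean emissions
    print(lander_green_likelihood(e, 2, [0.0]))  # completely linked markers
```

With θ = 0.5 the markers behave as unlinked and the likelihood factorises into a product over markers, which is a convenient sanity check.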

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium when using population allele frequencies (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for properly considering such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (falsely excluding the AF as TF, or falsely including the AF as TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. Such cases can involve uncertainty about the appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, with the five-hypotheses model. Error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
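A minimal sketch of the Bayesian comparison, assuming the likelihood Pr(E | H) under each hypothesis has already been computed: posterior probabilities follow by normalising likelihood × prior over the hypotheses. The three-way decision rule below (inclusion at or above the cut-off, exclusion at or below one minus the cut-off, otherwise inconclusive) is an assumption for illustration, not necessarily the rule used in Paper I; the labels H1a-H1d follow the text, but the likelihood values are invented.

```python
def posteriors(likelihoods, priors=None):
    """Posterior probabilities for mutually exclusive hypotheses.

    likelihoods: {hypothesis: Pr(E | H)}; priors default to equal weights.
    """
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


def decide(posterior_h0, cutoff=0.9999):
    """Hypothetical three-way decision rule on Pr(H0 | E)."""
    if posterior_h0 >= cutoff:
        return "inclusion"
    if posterior_h0 <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


if __name__ == "__main__":
    # Invented likelihoods: the AF fits the evidence well, but one alternative
    # (for example a close relative of the TF) is not negligible.
    two = posteriors({"H0": 1.0, "H1": 1e-5})
    five = posteriors({"H0": 1.0, "H1a": 1e-5, "H1b": 0.02, "H1c": 1e-3, "H1d": 1e-3})
    print(decide(two["H0"]), decide(five["H0"]))  # inclusion inconclusive
```

The same evidence can pass the cut-off under two hypotheses yet fall short once a close-relative alternative is admitted, which is the effect the five-hypotheses model is meant to capture.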


Figure 3 The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the equal prior probability we used for each of the alternative hypotheses, i.e., the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).
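Error rates of this kind can be tallied from simulated cases by comparing each case's true (simulated) hypothesis with the decision reached. The Python sketch below assumes the decision labels of a hypothetical inclusion/exclusion/inconclusive rule, counts an inconclusive result as neither error, and averages the inclusion error unweighted over the alternative hypotheses, as described for H1a-d. How Paper I combined these into the total error rate is not reproduced here, and the simulated cases in the usage are invented.

```python
from collections import defaultdict


def error_rates(results, true_father="H0"):
    """Inclusion and exclusion error rates from (simulated_truth, decision) pairs.

    Inclusion error: a case simulated under an alternative hypothesis is
    included, averaged unweighted over the alternatives. Exclusion error: a
    case simulated under the true-father hypothesis is excluded.
    """
    counts = defaultdict(lambda: [0, 0])  # truth -> [errors, cases]
    for truth, decision in results:
        wrong = "exclusion" if truth == true_father else "inclusion"
        counts[truth][0] += decision == wrong
        counts[truth][1] += 1
    alt = [errors / n for h, (errors, n) in counts.items() if h != true_father]
    inclusion = sum(alt) / len(alt) if alt else 0.0
    e, n = counts.get(true_father, (0, 0))
    exclusion = e / n if n else 0.0
    return inclusion, exclusion


if __name__ == "__main__":
    simulated = [("H0", "inclusion"), ("H0", "exclusion"),
                 ("H1a", "inclusion"), ("H1a", "exclusion"),
                 ("H1b", "exclusion"), ("H1b", "inconclusive")]
    print(error_rates(simulated))  # (0.25, 0.5)
```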

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not entirely correct, than not to use one at all (Table 1).
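The difference between a mutation model and an inconsistency limit can be illustrated with a deliberately simple paternity index. The Python sketch assumes the child's paternal (obligate) allele is known, uses its population frequency as the denominator, and applies an equal-probability mutation model in which an allele mutates with rate μ to any of the other alleles with equal probability. Casework models are typically stepwise and use full trio formulas, so this is a hypothetical stand-in; all names and numbers are invented.

```python
def paternal_transmission(child_allele, af_genotype, mu, n_alleles):
    """Pr(AF transmits child_allele) under an equal-probability mutation model.

    Each of the AF's two alleles is transmitted with probability 1/2; it is
    unchanged with probability 1 - mu, otherwise it mutates to one of the
    other n_alleles - 1 alleles with equal probability.
    """
    prob = 0.0
    for allele in af_genotype:
        prob += 0.5 * ((1 - mu) if allele == child_allele else mu / (n_alleles - 1))
    return prob


def combined_pi(loci, mu=0.002, n_alleles=10):
    """Product of single-locus PIs over all loci, inconsistent ones included.

    loci: iterable of (paternal allele of the child, AF genotype,
    population frequency of that allele).
    """
    pi = 1.0
    for child_allele, af_genotype, freq in loci:
        pi *= paternal_transmission(child_allele, af_genotype, mu, n_alleles) / freq
    return pi


if __name__ == "__main__":
    consistent = [("12", ("12", "13"), 0.1)] * 14  # invented, identical loci
    inconsistent = [("14", ("12", "13"), 0.1)]     # one single-step inconsistency
    print(combined_pi(consistent + inconsistent))
```

Instead of a hard exclusion, the inconsistent locus contributes a small factor that the remaining loci may or may not outweigh.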

Furthermore, we proposed and tested a five-hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was small.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluating the statistical methods used is important. In this paper we demonstrated that improvements are still necessary to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change in the error rate compared with the standard case (%): total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins                                  3 (10, -1)
Additional information
  DNA profiles with 20 markers                                                -68 (-29, -89)
  DNA profiles with 25 markers                                                -83 (-56, -98)
  2 children                                                                  -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calc.              16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calc.           320 (1079, -100)
Inappropriate allele frequencies
  Rwanda allele freq. for data generation, Swedish allele freq. for LR calc.   19 (190, -76)
  Somali allele freq. for data generation, Swedish allele freq. for LR calc.    2 (106, -55)
  Iran allele freq. for data generation, Swedish allele freq. for LR calc.    -13 (25, -34)
Prior information
  Five-hypotheses model for LR calc.                                          -24 (-8, -31)

A standard case was considered, with DNA profiles of 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error over H1a-d. Posterior probabilities were calculated with the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions, together with 39 samples from a Swedish Saami population (i.e., Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This segment, which includes HVS-I, HVS-II and HVS-III, spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters as well as comparison of the genetic variation among the Swedish regions, and between Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
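Both diversity figures can be computed from haplotype counts. The Python sketch below uses the common definitions GD = n/(n - 1) · (1 - Σp_i²) and PM = Σp_i², where p_i are the sample haplotype frequencies and n the sample size; whether Paper II used exactly these estimators is an assumption, and the toy labels stand in for full control-region sequences.

```python
from collections import Counter


def diversity_and_match_probability(haplotypes):
    """Return (gene diversity GD, random match probability PM) for a sample."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    sum_p2 = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    gd = n / (n - 1) * (1.0 - sum_p2)
    return gd, sum_p2


if __name__ == "__main__":
    # Toy sample of four individuals with three distinct haplotypes.
    print(diversity_and_match_probability(["hapA", "hapA", "hapB", "hapC"]))
```

When nearly every sampled haplotype is unique, as in the Swedish data, PM approaches 1/n and GD approaches 1.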

The mtDNA sequences were further analysed to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
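One simple way to express haplotype frequency variation among subpopulations is Nei's G_ST: the share of total haplotype diversity that is due to differences between subpopulations. It is shown here only as a stand-in for the idea; the FST and ΦST values in Paper II may have been computed differently (ΦST, for example, also uses sequence differences between haplotypes, typically via AMOVA). The sketch weights subpopulations equally, ignores sample-size corrections, and its frequencies are invented.

```python
def gst(populations):
    """Nei's G_ST from haplotype frequency dicts, one dict per subpopulation."""
    k = len(populations)
    haplotypes = set().union(*populations)
    # Mean within-subpopulation diversity.
    hs = sum(1.0 - sum(p ** 2 for p in pop.values()) for pop in populations) / k
    # Diversity of the pooled (equally weighted) population.
    pooled = {h: sum(pop.get(h, 0.0) for pop in populations) / k for h in haplotypes}
    ht = 1.0 - sum(p ** 2 for p in pooled.values())
    return (ht - hs) / ht if ht > 0 else 0.0


if __name__ == "__main__":
    region_1 = {"hapA": 0.6, "hapB": 0.4}
    region_2 = {"hapA": 0.5, "hapB": 0.5}
    print(gst([region_1, region_2]))  # small value: similar frequency profiles
```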


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
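Why database size matters is easiest to see for a haplotype that has never been observed. One frequently used approach reports a one-sided upper confidence bound on its frequency; for zero observations in N the exact binomial bound simplifies to 1 - α^(1/N). The Python sketch implements only this special case; the pooled database size in the usage is hypothetical, and EMPOP's own reporting conventions are not reproduced here.

```python
def zero_count_upper_bound(database_size, alpha=0.05):
    """One-sided upper (1 - alpha) bound for a frequency seen 0 times in N.

    Solves (1 - p) ** N = alpha for p, the exact binomial bound for x = 0.
    """
    if database_size < 1:
        raise ValueError("database must contain at least one haplotype")
    return 1.0 - alpha ** (1.0 / database_size)


if __name__ == "__main__":
    for n in (296, 5000):  # the Swedish sample vs a hypothetical pooled database
        print(n, round(zero_count_upper_bound(n), 5))
```

The bound shrinks roughly in proportion to 1/N, so pooling comparable populations directly sharpens the estimate of a haplotype's rarity.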


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A Swedish population sample of 718 males and 106 females was analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.
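As a reference point for the recombination estimates, the naive counting approach is sketched below: with phase-known, fully informative meioses, the maximum-likelihood estimate of the recombination fraction is θ = R/N, and a LOD score compares that fit with free recombination (θ = 0.5). This is not the estimation model presented in Paper III; the counts in the usage are hypothetical.

```python
import math


def recombination_estimate(recombinants, meioses):
    """Counting (maximum-likelihood) estimate of theta and its LOD score.

    Assumes phase-known, fully informative meioses, so the number of
    recombinants is binomial(meioses, theta).
    """
    theta = recombinants / meioses
    likelihood = theta ** recombinants * (1 - theta) ** (meioses - recombinants)
    lod = math.log10(likelihood / 0.5 ** meioses)
    return theta, lod


if __name__ == "__main__":
    # Hypothetical: 1 recombinant among 100 informative meioses.
    print(recombination_estimate(1, 100))
```

With only around 100 meioses, zero or one observed recombinant is compatible with a range of small θ values, which is one reason a dedicated estimation model is useful.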

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that when LD was disregarded rather than taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
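Because males carry a single X chromosome, their two-locus haplotypes are observed directly, which makes a simple comparison between haplotype frequencies and the product rule possible. The Python sketch below computes, for a chosen haplotype, its sample frequency, the product of the two allele frequencies, and their difference D (the classical LD coefficient). It is a descriptive illustration only, not the significance test used in Paper III, and the allele labels are invented.

```python
from collections import Counter


def haplotype_vs_product(male_haplotypes, target):
    """Compare a two-locus X haplotype frequency with the product rule.

    male_haplotypes: list of (allele at locus 1, allele at locus 2), one per male.
    Returns (haplotype frequency, product of allele frequencies, D).
    """
    n = len(male_haplotypes)
    hap_freq = Counter(male_haplotypes)[target] / n
    p1 = sum(1 for a, _ in male_haplotypes if a == target[0]) / n
    p2 = sum(1 for _, b in male_haplotypes if b == target[1]) / n
    return hap_freq, p1 * p2, hap_freq - p1 * p2


if __name__ == "__main__":
    # Invented sample where allele "a" always travels with "x".
    sample = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    print(haplotype_vs_product(sample, ("a", "x")))
```

A clearly non-zero D means that multiplying allele frequencies misstates the haplotype frequency, which is why the loci should be treated as haplotypes.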

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the Swedish and Somali data separately (Swedish/Somali).

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found this approach not fully satisfactory for our purpose. We therefore presented a model for the complete consideration of linkage and linkage disequilibrium, and used simulations on typical pedigrees, with X-chromosomal data for the markers studied in Paper III, to study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented that is based on the Lander-Green algorithm, but extended by expanding the inheritance vectors to consider all loci of a haplotype.

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three involved cases where DNA profiles were available for both the children and the founders (i.e., questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three involved cases where DNA profiles were available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6), compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs supporting the questioned relationship (LR > 1) of varying magnitude were obtained when that relationship was simulated to be untrue, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
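The sensitivity to haplotype frequency estimates can be illustrated with the simplest X-chromosomal case. A father passes his single X chromosome intact to a daughter, so if the daughter's paternal haplotype h is known and the alleged father carries it, LR = 1/f(h), ignoring mutation. The Python sketch compares two hypothetical estimators of f(h) from a database of N male haplotypes: the plain count x/N, which is undefined for an unseen haplotype, and (x + 1)/(N + 1), which adds the observed haplotype to the database. Neither is claimed to be the estimator used in Paper IV, and the haplotype counts in the usage are invented.

```python
def father_daughter_lr(count, database_size, estimator="count"):
    """LR = 1 / f(h) for a paternal X haplotype h shared by daughter and AF."""
    if estimator == "count":
        if count == 0:
            raise ValueError("counting estimate is zero for an unseen haplotype")
        freq = count / database_size
    elif estimator == "add_one":
        freq = (count + 1) / (database_size + 1)
    else:
        raise ValueError(f"unknown estimator: {estimator}")
    return 1.0 / freq


if __name__ == "__main__":
    n_males = 718  # size of the Swedish male sample in Paper III
    for x in (0, 1, 5):  # hypothetical haplotype counts
        print(x, father_daughter_lr(x, n_males, "add_one"),
              father_daughter_lr(x, n_males, "count") if x else None)
```

For a haplotype seen once, the two estimators already differ by a factor of about two, and for an unseen haplotype only the second gives a finite answer.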

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Each cell: median [95% cred.] (min/max)

         log10 LR                   Difference LR(m_2)/LR(m_1)       Difference LR(m_3)/LR(m_1)
Rel      2.0 [-1, 6.2] (-1/10.0)    1.2 [0.4, 7.3] (0001418)         0.4 [0.036, 9.3] (00001438)
NoRel    -1 [-1, 0.5] (-1/4.0)      1.0 [0.9, 1.3] (000263)          0.2 [0.0036, 9.1] (0026433)

Rel: simulations in which the brothers have the same mother; NoRel: the brothers were simulated to have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses about the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the support provided by rapidly increasing computing power.

If we go back to the time just before these findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved with existing methods today, and major investments have already been made in forensic laboratories in instrumentation and in the large national DNA profile databases, which are built on a given set of predefined DNA markers (e.g., CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I believe there will be a greater focus on statistical methods for handling data from an increased number of DNA markers. However, I think that these developments will generally be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

Regarding continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies: as more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large datasets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement and for always having your doors open for discussions. Bertil, especially for your confidence and your wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik, for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My in-laws, the Tillmar family: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden: a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden: a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies: the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP: a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more: length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Contents

Abstract
Populärvetenskaplig sammanfattning (popular science summary in Swedish)
List of Papers
Contents
Abbreviations
Introduction
  History of DNA and forensic genetics
  Population genetics
    Genetic polymorphisms
    DNA inheritance
    Population Genetics
    The Swedish population and its genetic appearance
  Forensic mathematics/statistics
    Framework for interpretation and presentation of evidential weight
    Paternity index calculation
    Mathematical model for automatic likelihood computation for relationship testing
Aim of the thesis
  Specific aims
    Paper I
    Paper II
    Paper III
    Paper IV
Investigations
  Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
    Materials and methods
    Results and discussion
  Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
    Materials and methods
    Results and discussion
  Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
    Materials and methods
    Results and discussion
  Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account
    Materials and methods
    Results and discussion
Concluding remarks
Future perspectives
Acknowledgements
References

Abbreviations

θ       Theta, recombination frequency/fraction
AF      Alleged father
DNA     Deoxyribonucleic acid
FST     Measure of population genetic subdivision
GD      Gene diversity
HWE     Hardy-Weinberg equilibrium
ISFG    International Society for Forensic Genetics
LD      Linkage disequilibrium
LR      Likelihood ratio
MEC     Mean exclusion chance
mtDNA   Mitochondrial DNA
PCR     Polymerase chain reaction
PD      Power of discrimination
PE      Power of exclusion
PI      Paternity index
PIC     Polymorphism information content
PM      Match probability
Pr      Probability
SNP     Single nucleotide polymorphism
STR     Short tandem repeat
TF      True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in areas as different as medical genetics, the food processing industry and forensic genetics, where it helps both to solve crimes and to settle disputes about biological relationships.

Traditionally, the aim of forensic genetics has been to make a statement about the identity of a human being from a biological sample by means of DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). In human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al 2004b).

The first time DNA from a crime scene sample was used in court was in 1986 in the UK (Gill et al 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques and methods for exploiting the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial illustrates the importance of the complete process: from the handling of biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation in court of the weight of the evidence provided by the DNA results. In no other trial has the DNA evidence been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?". Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of DNA evidence are crucial for the acceptance of DNA as a routine tool.
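The coincidence probability mentioned above can be sketched for a multi-locus profile under the simplest textbook assumptions: Hardy-Weinberg genotype frequencies (p² for homozygotes, 2pq for heterozygotes) at each locus and independence between loci, so that single-locus values are multiplied (the product rule). Substructure, relatedness and sampling uncertainty are ignored. The function names, locus labels and allele frequencies below are hypothetical and for illustration only.

```python
def genotype_frequency(allele_a, allele_b, freqs):
    """Hardy-Weinberg genotype frequency: p^2 (homozygote) or 2pq (heterozygote)."""
    if allele_a == allele_b:
        return freqs[allele_a] ** 2
    return 2 * freqs[allele_a] * freqs[allele_b]


def random_match_probability(profile, freq_tables):
    """Multiply single-locus genotype frequencies over all typed loci.

    Assumes HWE within loci and independence between loci (product rule).
    """
    prob = 1.0
    for locus, (allele_a, allele_b) in profile.items():
        prob *= genotype_frequency(allele_a, allele_b, freq_tables[locus])
    return prob


# Hypothetical frequency tables and profile, for illustration only.
freq_tables = {
    "locus1": {"12": 0.2, "13": 0.3, "14": 0.5},
    "locus2": {"8": 0.1, "9": 0.9},
}
profile = {"locus1": ("12", "14"), "locus2": ("8", "8")}
print(random_match_probability(profile, freq_tables))  # about 0.002
```

With more loci the product quickly becomes very small, which is why a multi-locus profile is generally treated as close to unique among unrelated individuals.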

These figures are usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. First, population genetics, including allele frequencies, population substructure, and dependence within and between markers. Second, models for calculating and presenting the weight of evidence that take this population genetic information properly into account.

Population genetics

Genetic polymorphisms

Three different types of DNA marker, namely short tandem repeats (STRs), single nucleotide polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they have good characteristics for forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few corporations, making them the standard set of markers for routine practice. Recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low: the amount of information from one STR marker is the same as from approximately four SNPs (Sobrino et al 2005, Brenner (www.dna-view.com)). Regarding SNP multiplexes, no commercial forensic kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main use of sequence data in forensic situations is the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are present in the mtDNA. However, analysis of mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).

DNA inheritance

In addition to the marker types described above, there are different "types" of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomes, the sex chromosomes (the X-chromosome and the Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with testing complex pedigrees (Egeland et al 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, who transmits one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since full sisters can inherit different alleles at any autosomal locus. X-chromosomal markers can, however, exclude a common father, since two sisters with the same father must share the same paternal allele. There are several other types of relationship where X-chromosomal markers are superior to autosomal markers (Szibor et al 2003, Pinto et al 2010).
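The sister example above can be expressed as a simple check, assuming known X-STR genotypes for both sisters and ignoring mutation: a common father is excluded as soon as one locus shows no allele in common. The function name, locus names (X1, X2) and alleles are hypothetical, and a real evaluation would compute a likelihood ratio rather than stop at this yes/no test.

```python
def share_allele_at_all_loci(sister1, sister2):
    """Return False if the X-STR genotypes exclude a common father.

    Each argument maps a locus name to the female's two alleles. A father
    passes his single X-chromosome to every daughter, so paternal sisters
    must share at least one allele at every X locus (mutation ignored).
    Sharing is necessary but not sufficient evidence of a common father.
    """
    for locus in sister1.keys() & sister2.keys():
        if not set(sister1[locus]) & set(sister2[locus]):
            return False  # no shared allele: common father excluded
    return True


# Hypothetical genotypes at two X-STR loci.
sister1 = {"X1": (20, 24), "X2": (10, 11)}
sister2 = {"X1": (24, 25), "X2": (12, 12)}
print(share_allele_at_all_loci(sister1, sister2))  # False: no shared allele at X2
```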

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in various PCR multiplexes (Becker et al 2008, Hundertmark et al 2008, Gomes et al 2007, Diegoli et al 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). These two genetic properties, their definitions and their impact on calculated likelihoods are discussed further below.

The Y-chromosome normally exists as one copy in males and is absent in females. It is inherited from father to son; thus, all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that creates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer the paternal geographic origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. It consists of a circular genome of ~16,600 nucleotides. Each cell has 100 to 1,000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to address questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.

Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and it includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST values correspond to small differences among subpopulations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when evaluating the strength of the DNA profile evidence (Balding & Nichols 1994).
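Both steps can be sketched under stated simplifications. `nei_gst` computes Nei's G_ST, one simple FST-type measure, from subpopulation allele frequencies with equal weights; it is not the sample-size-corrected estimator usually preferred for real data. The two `bn_` functions give the Balding & Nichols (1994) conditional genotype match probabilities, in which the parameter `fst` (a coancestry coefficient, not to be confused with the recombination fraction θ used elsewhere in this thesis) inflates the match probability to allow for substructure. All names and example frequencies are hypothetical.

```python
def nei_gst(subpop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, equal subpopulation weights.

    subpop_freqs: list of dicts mapping allele -> frequency, one per subpopulation.
    """
    k = len(subpop_freqs)
    alleles = set().union(*subpop_freqs)
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(1 - sum(p ** 2 for p in f.values()) for f in subpop_freqs) / k
    # Expected heterozygosity from the pooled (mean) allele frequencies.
    pooled = [sum(f.get(a, 0.0) for f in subpop_freqs) / k for a in alleles]
    h_t = 1 - sum(p ** 2 for p in pooled)
    if h_t == 0:
        raise ValueError("monomorphic locus: G_ST undefined")
    return (h_t - h_s) / h_t


def bn_homozygote(p, fst):
    """Balding-Nichols conditional match probability for genotype a/a."""
    return ((2 * fst + (1 - fst) * p) * (3 * fst + (1 - fst) * p)
            / ((1 + fst) * (1 + 2 * fst)))


def bn_heterozygote(p_a, p_b, fst):
    """Balding-Nichols conditional match probability for genotype a/b."""
    return (2 * (fst + (1 - fst) * p_a) * (fst + (1 - fst) * p_b)
            / ((1 + fst) * (1 + 2 * fst)))


# Hypothetical two-subpopulation example for a biallelic locus.
print(nei_gst([{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}]))  # about 0.36
print(bn_homozygote(0.1, 0.0), bn_homozygote(0.1, 0.03))
```

With `fst = 0` the Balding-Nichols expressions reduce to p² and 2pq; any positive value makes the match probability larger, i.e. more conservative towards the suspect.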

Linkage and Linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologues align and exchange segments in a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family data and ranges from 0 for completely linked loci to 0.5 for unlinked loci. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
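Segregation probabilities of the kind shown in Figure 2 can be sketched for the maternal side, assuming that the mother's phase (which alleles sit together on each of her X-chromosomes) is known. Each non-recombinant haplotype is transmitted with probability (1 − θ)/2 and each recombinant haplotype with probability θ/2, while a father transmits his single X haplotype to a daughter with probability 1. The function name and allele labels are hypothetical.

```python
from collections import defaultdict


def maternal_haplotype_probs(hap1, hap2, theta):
    """Probabilities of the two-locus haplotypes a mother transmits.

    hap1, hap2: the mother's two X-chromosome haplotypes as
    (allele at locus 1, allele at locus 2), with phase assumed known.
    theta: recombination fraction (0 = complete linkage, 0.5 = unlinked).
    """
    (a1, b1), (a2, b2) = hap1, hap2
    probs = defaultdict(float)  # += merges identical haplotypes (homozygosity)
    probs[(a1, b1)] += (1 - theta) / 2  # non-recombinant
    probs[(a2, b2)] += (1 - theta) / 2  # non-recombinant
    probs[(a1, b2)] += theta / 2        # recombinant
    probs[(a2, b1)] += theta / 2        # recombinant
    return dict(probs)


# A father passes his single X haplotype unchanged to a daughter (probability 1),
# so only the maternal contribution needs a distribution.
mother = (("X1a", "X2a"), ("X1b", "X2b"))
print(maternal_haplotype_probs(*mother, theta=0.1))
```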

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from:

$$f(ab) = f(a)\,f(b) + \Delta$$

where f(ab) is the frequency of haplotype a-b, and f(a) and f(b) are the frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population, rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
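The quantity Δ can be estimated directly from a sample of observed haplotypes, for instance from males, whose single X-chromosome gives the haplotype without phase ambiguity. The sketch below uses simple counting estimates for f(ab), f(a) and f(b); the function name and sample are hypothetical, and no significance test is included.

```python
from collections import Counter


def ld_deltas(haplotypes):
    """Delta = f(ab) - f(a) f(b) for every allele pair at two loci.

    haplotypes: list of observed (allele at locus 1, allele at locus 2) pairs.
    """
    n = len(haplotypes)
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    f1 = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    f2 = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}
    # Unobserved haplotypes have counting estimate f(ab) = 0.
    return {(a, b): hap_freq.get((a, b), 0.0) - f1[a] * f2[b]
            for a in f1 for b in f2}


# Hypothetical sample of eight haplotypes.
sample = [("A", "B")] * 6 + [("a", "b")] * 2
print(ld_deltas(sample)[("A", "B")])  # 0.1875: A and B co-occur more than expected
```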

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independence of alleles. Tests of Hardy-Weinberg equilibrium, HWE (Hardy 1908), and of LD deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw firm conclusions from either test (Buckleton et al 2001); in particular, a non-significant result does not demonstrate that the population is in equilibrium.
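For a biallelic locus (for example a SNP), an exact HWE test can be sketched using the exact conditional distribution of the number of heterozygotes given the allele counts (the Levene/Haldane formula). Multiallelic STR loci need an extension of this test that is usually approximated by Markov chain or permutation methods, which is not shown here. The function name and genotype counts are hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE test for one biallelic locus.

    Conditions on the allele counts and sums the probabilities of all
    heterozygote counts that are no more likely than the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a

    def prob(het):
        hom_a, hom_b = (n_a - het) // 2, (n_b - het) // 2
        num = factorial(n) * 2 ** het * factorial(n_a) * factorial(n_b)
        den = factorial(hom_a) * factorial(het) * factorial(hom_b) * factorial(2 * n)
        return Fraction(num, den)  # exact arithmetic avoids rounding ties

    # Possible heterozygote counts share the parity of n_a.
    probs = [prob(h) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    p_obs = prob(n_ab)
    return float(sum(p for p in probs if p <= p_obs))


print(hwe_exact_pvalue(25, 50, 25))  # 1.0: the observed count is the most likely one
print(hwe_exact_pvalue(50, 0, 50))   # tiny: no heterozygotes at all
```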

Another feature to consider is the forensic efficiency of the DNA markers in casework, both in criminal cases and in relationship testing. Such estimates describe the theoretical value of the specific markers in different forensic genetic situations and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

A selection of useful parameters is described and formulated below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.

The unbiased estimator is given by (Nei 1987):

$$GD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the i-th allele in the population.
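A direct transcription of the unbiased estimator, assuming allele counts (numbers of gene copies) from the population sample as input; the function name and counts are hypothetical.

```python
def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum p_i^2).

    allele_counts maps allele -> number of gene copies observed;
    n is the total number of gene copies sampled.
    """
    n = sum(allele_counts.values())
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_sq = sum((c / n) ** 2 for c in allele_counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical counts: 100 gene copies, e.g. 50 individuals at an autosomal locus.
print(gene_diversity({"a": 50, "b": 50}))  # 50/99, about 0.505
```

The factor n/(n − 1) corrects for the downward bias of the plug-in estimate 1 − Σp_i² in finite samples and becomes negligible for large n.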

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951):

$$PM = \sum_{i} G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the partial match probabilities over all genotypes. PM can also be calculated from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals and is thus directly related to PM:

$$PD = 1 - PM$$
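Both routes to PM can be sketched as follows: from observed genotype frequencies, as in the definition, and from allele frequencies under the HWE assumption. PD then follows as 1 − PM. The function names and the four-person sample are hypothetical.

```python
from collections import Counter


def pm_from_genotypes(genotypes):
    """PM = sum of squared observed genotype frequencies."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)  # (a, b) == (b, a)
    return sum((c / n) ** 2 for c in counts.values())


def pm_from_allele_freqs(freqs):
    """PM from allele frequencies, assuming HWE genotype frequencies."""
    alleles = sorted(freqs)
    pm = 0.0
    for i, a in enumerate(alleles):
        pm += (freqs[a] ** 2) ** 2                  # homozygote a/a
        for b in alleles[i + 1:]:
            pm += (2 * freqs[a] * freqs[b]) ** 2    # heterozygote a/b
    return pm


def power_of_discrimination(pm):
    return 1 - pm


sample = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]  # hypothetical
pm = pm_from_genotypes(sample)
print(pm, power_of_discrimination(pm))  # 0.375 0.625
```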

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible. In other words, it is the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two situations in which this cannot be deduced: when one parent is homozygous, or when both parents and the child have the same heterozygous genotype. Thus,

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
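A direct transcription of the PIC formula is sketched below. The input is assumed to be a list of allele frequencies summing to 1, and the function name is hypothetical.

```python
def polymorphism_information_content(p):
    """PIC (Botstein et al. 1980) from a list of allele frequencies."""
    # 1 - sum p_i^2: probability that a parent is heterozygous
    heterozygosity = 1 - sum(x * x for x in p)
    # two parents with the same heterozygous genotype i/j (probability (2 p_i p_j)^2)
    # produce uninformative children half of the time: (2 p_i p_j)^2 / 2 = 2 p_i^2 p_j^2
    correction = sum(2 * p[i] ** 2 * p[j] ** 2
                     for i in range(len(p)) for j in range(i + 1, len(p)))
    return heterozygosity - correction
```

A bi-allelic marker with equal frequencies reaches the maximum possible PIC for two alleles, 0.375. Four equally frequent alleles give 0.703125.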

The probability of excluding paternity (Q) is calculated as (Ohno et al. 1982):

$$Q = \sum_{i=1}^{n} p_i (1-p_i)^2 (1 - p_i + p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is inferred from two factors. The first is the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$. The second is the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained by summing over all mother/child genotype combinations, as described above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990):

$$PE = h^2 \cdot (1 - 2 \cdot h \cdot H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
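The sketch below evaluates Q from allele frequencies. As a cross-check of the two-factor reasoning above, it also obtains the same number by brute-force enumeration over three things: the mother's genotype, the allele she transmits and the true father's allele, all assuming HWE. Finally, it evaluates PE from the observed heterozygosity, taking H = 1 − h. All function names are hypothetical.

```python
from itertools import product


def exclusion_probability_q(p):
    """Closed-form Q (Ohno et al. 1982) for mother/child/alleged-father trios."""
    n = len(p)
    single = sum(x * (1 - x) ** 2 * (1 - x + x * x) for x in p)
    pairs = sum(p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
                for i in range(n) for j in range(i + 1, n))
    return single + pairs


def exclusion_probability_bruteforce(p):
    """Same quantity by enumerating mother genotype and both transmissions (HWE)."""
    n = len(p)
    total = 0.0
    for m1, m2 in product(range(n), repeat=2):       # ordered mother genotype
        for maternal in (m1, m2):                     # allele the mother transmits (1/2 each)
            for paternal in range(n):                 # allele the true father transmits
                weight = p[m1] * p[m2] * 0.5 * p[paternal]
                child = (maternal, paternal)
                # child alleles that could be paternal, given the mother's genotype
                candidates = {c for c, other in (child, child[::-1]) if other in (m1, m2)}
                # a random man is excluded if he carries none of the candidate alleles
                total += weight * (1 - sum(p[a] for a in candidates)) ** 2
    return total


def power_of_exclusion(h):
    """PE (Brenner & Morris 1990) with H = 1 - h homozygosity."""
    big_h = 1 - h
    return h * h * (1 - 2 * h * big_h * big_h)
```

For two equally frequent alleles, Q = 0.1875. For an observed heterozygosity of 0.8, PE = 0.59904.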

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q. The difference is that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a man carries only one X chromosome. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i. $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
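Both X-chromosomal formulas are sketched below. A brute-force version of the trio case is included to show that it is indeed Q with a single, hemizygous paternal allele. Allele or haplotype frequencies are passed as a list. The function names are hypothetical.

```python
from itertools import product


def mec_trio(p):
    """MEC for mother/daughter/alleged-father trios on the X chromosome."""
    s2 = sum(x ** 2 for x in p)
    s4 = sum(x ** 4 for x in p)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(p):
    """MEC for man/daughter duos on the X chromosome."""
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return 1 - 2 * s2 + s3


def mec_trio_bruteforce(p):
    """Enumerate mother genotype (HWE), maternal allele and the father's single X allele."""
    n = len(p)
    total = 0.0
    for m1, m2 in product(range(n), repeat=2):
        for maternal in (m1, m2):
            for paternal in range(n):
                weight = p[m1] * p[m2] * 0.5 * p[paternal]
                child = (maternal, paternal)
                candidates = {c for c, other in (child, child[::-1]) if other in (m1, m2)}
                # a random man (one X allele) is excluded unless he carries a candidate allele
                total += weight * (1 - sum(p[a] for a in candidates))
    return total
```

For two equally frequent alleles, $MEC_{Trio}$ = 0.375 and $MEC_{Duo}$ = 0.25.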

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, owing to the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descents and directions have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated regarding forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations using data from over 300 000 SNPs. This comparison showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the set-up of a Swedish reference database (Holmlund et al. 2006). Others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial. It makes the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E|H)$. This is the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints or information from eyewitnesses.

$$\frac{\Pr(H_1|E,I)}{\Pr(H_0|E,I)} = \frac{\Pr(E|H_1,I)}{\Pr(E|H_0,I)} \cdot \frac{\Pr(H_1|I)}{\Pr(H_0|I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the defence hypothesis. E represents the DNA profiles and I other relevant background evidence. The quotient $\Pr(E|H_1,I)/\Pr(E|H_0,I)$ is the LR, and it is within this term that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.
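The LR enters Bayes's theorem as a multiplicative factor. Several pieces of evidence can therefore be combined by multiplying their LRs, provided they are conditionally independent given each hypothesis, an assumption that has to be justified case by case. The sketch below shows this bookkeeping. The function names and the numbers in the usage note are invented for illustration.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Posterior odds = prior odds x product of LRs (assumes conditionally independent evidence)."""
    return prior_odds * math.prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds in favour of H1 into Pr(H1 | E, I)."""
    return odds / (1 + odds)
```

For example, prior odds of 1:100 combined with a DNA LR of 10 000 and a second, independent LR of 20 give posterior odds of 2 000. This corresponds to a posterior probability of 2000/2001.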

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} | H_p)}{\Pr(G_C, G_M, G_{AF} | H_d)}$$

This compares the probability of observing the child's ($G_C$), mother's ($G_M$) and alleged father's ($G_{AF}$) DNA profiles when the AF is the father with the probability of observing the same profiles when the AF is not the father.


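We can use the third law of probability (the multiplication law) and simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} | H_p)}{\Pr(G_C, G_M, G_{AF} | H_d)} = \frac{\Pr(G_C | G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} | H_p)}{\Pr(G_C | G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} | H_d)}$$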

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can simplify further:

$$\Pr(G_M, G_{AF} | H_p) = \Pr(G_M, G_{AF} | H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C | G_M, G_{AF}, H_p)}{\Pr(G_C | G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities:

1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and
2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that someone other than the AF is the father (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. Suppose it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother). The numerator is then 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1, since each of them can transmit only that one allele. If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C | G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. $p_{A_P}$ is the population frequency of allele $A_P$. It represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d].

Then

$$\Pr(G_C | G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$


and

$$\Pr(G_C | G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C | G_M, G_{AF}, H_p)}{\Pr(G_C | G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2 p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
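The single-locus reasoning above can be automated by summing over every Mendelian transmission. The result reproduces 1/(2p_c) for the worked example and handles the ambiguous case without special rules. This is a minimal sketch under stated assumptions: the mother is the true mother, genotypes are unordered allele pairs, mutations and silent alleles are ignored, and the function names are hypothetical.

```python
def prob_child_given_parents(child, mother, father):
    """Numerator: each parental allele is transmitted with probability 1/2."""
    target = sorted(child)
    return sum(0.25 for m in mother for f in father if sorted((m, f)) == target)


def prob_child_given_mother_random_man(child, mother, freqs):
    """Denominator: the paternal allele is drawn from the population frequencies."""
    target = sorted(child)
    prob = 0.0
    for m in mother:                 # maternal allele, probability 1/2 each
        for f in set(child):         # only the child's alleles can complete its genotype
            if sorted((m, f)) == target:
                prob += 0.5 * freqs[f]
    return prob


def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus PI; returns 0.0 for an exclusion (no mutation model)."""
    return (prob_child_given_parents(child, mother, alleged_father)
            / prob_child_given_mother_random_man(child, mother, freqs))
```

With $p_c$ = 0.1, the example above (mother [a,b], child [b,c], AF [c,d]) gives PI = 5. If mother and child share the heterozygous genotype [a,b] and the AF carries a, the sums give PI = 1/(2(p_a + p_b)).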

Decision

How does one interpret the PI value? Bayes's theorem is used to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, so that the posterior odds equal the PI:

$$PI = \frac{\Pr(H_P|E)}{\Pr(H_d|E)}$$

The two hypotheses are treated as exhaustive, so $\Pr(H_d|E) = 1 - \Pr(H_P|E)$. Hence

$$PI = \frac{\Pr(H_P|E)}{1 - \Pr(H_P|E)}$$

resulting in the posterior probability of paternity

$$\Pr(H_P|E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father. Too high a cut-off increases the risk of failing to include a true father.
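The conversion from PI to a posterior probability, followed by a cut-off decision, is sketched below. The 99.99% default cut-off and the symmetric three-way rule (inclusion, exclusion, inconclusive) are hypothetical choices made for illustration. As noted above, each laboratory sets its own limits.

```python
def probability_of_paternity(pi, prior=0.5):
    """Posterior probability of paternity; prior = 0.5 (prior odds 1) gives PI / (PI + 1)."""
    post_odds = pi * prior / (1 - prior)
    return post_odds / (1 + post_odds)


def decide(w, cutoff=0.9999):
    """Hypothetical decision rule applied to the posterior probability w."""
    if w >= cutoff:
        return "inclusion"
    if w <= 1 - cutoff:
        return "exclusion"
    return "inconclusive"
```

Under this rule, a PI of 100 000 leads to inclusion and a PI of 10 is reported as inconclusive. Lowering `cutoff` would move the second case towards inclusion, which is exactly the trade-off described above.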

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated in several situations. These include allowing for mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) or population substructure (Ayres 2000), and treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

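$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i | G_i) \prod_{founder} \Pr(G_{founder}) \prod_{\{o,f,m\}} \Pr(G_o | G_f, G_m)$$

where the sums run over the genotypes $G_1, \ldots, G_n$ of all n individuals in the pedigree, and $X_i$ is the observation for individual i. The last product runs over all offspring o with father f and mother m.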

The Elston-Stewart algorithm uses a recursive approach. It starts at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that, once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents. That individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
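The smallest non-trivial case, a sibship whose parents are untyped, shows the structure of the formula. For a marker locus, Pen(X|G) is 1 for the observed genotype and 0 otherwise. The children's genotypes are therefore fixed, and only the founders' genotypes are summed over. The children's joint probability is computed once per parental pair and enters as a single factor, which is the one-generation peeling step. This brute-force evaluation is a sketch assuming HWE for founders. It is not a full recursive Elston-Stewart implementation, and all names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotypes(freqs):
    """All unordered genotypes over the alleles in `freqs`."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g, freqs):
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(child, mother, father):
    """Mendelian Pr(G_o | G_f, G_m): each parental allele is passed on with probability 1/2."""
    target = tuple(sorted(child))
    return sum(0.25 for m in mother for f in father if tuple(sorted((m, f))) == target)


def sibship_likelihood(children, freqs):
    """Likelihood of the observed genotypes of full siblings whose parents are untyped."""
    total = 0.0
    for gm in genotypes(freqs):              # sum over the untyped mother's genotype
        for gf in genotypes(freqs):          # sum over the untyped father's genotype
            kids = 1.0                       # peeled factor for all children of this pair
            for child in children:
                kids *= transmission(child, gm, gf)
            total += founder_prob(gm, freqs) * founder_prob(gf, freqs) * kids
    return total
```

Dividing `sibship_likelihood` for two children by the product of their HWE genotype probabilities gives a single-locus full-sibling versus unrelated LR. For one child, the function returns that child's HWE genotype frequency, as it should.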

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. Owing to increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987). It permits simultaneous consideration of thousands of loci, with a computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps:

1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring;
2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on each inheritance vector; and
3) calculation of the joint probability of all marker inheritance vectors along the same chromosome (i.e. the transmission probabilities between adjacent markers).

By using a hidden Markov model (HMM) for the final step, an efficient computational model is obtained (see Kruglyak et al. 1996 for a more detailed description).

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers. They do, however, assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).
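Step 3 can be sketched as the forward algorithm of an HMM whose hidden states are inheritance vectors. Between adjacent markers, each of the m meioses recombines independently with probability θ. A jump from vector v to vector w therefore has probability $\theta^{d}(1-\theta)^{m-d}$, where d is the number of differing meioses. The sketch assumes that steps 1 and 2 have already produced, for every marker, the probability of the observed genotypes given each inheritance vector (`emissions`). It also assumes a uniform prior over vectors at the first marker. The naive transition sum below is quadratic in the number of vectors. Production implementations speed it up, but the result is the same. All names are hypothetical.

```python
from itertools import product


def hamming(u, v):
    """Number of meioses whose parental origin differs between two inheritance vectors."""
    return sum(a != b for a, b in zip(u, v))


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]:       recombination fraction between markers k and k+1
                     (len(thetas) == len(emissions) - 1)
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    # uniform prior over the 2**n_meioses inheritance vectors at the first marker
    alpha = {v: emissions[0][v] / len(vectors) for v in vectors}
    for k, theta in enumerate(thetas, start=1):
        new_alpha = {}
        for w in vectors:
            # probability mass arriving at vector w from every vector v at marker k-1
            moved = sum(alpha[v] * theta ** hamming(v, w)
                        * (1 - theta) ** (n_meioses - hamming(v, w))
                        for v in vectors)
            new_alpha[w] = moved * emissions[k][w]
        alpha = new_alpha
    return sum(alpha.values())
```

Two limiting cases are a useful check. With θ = 0.5 (unlinked markers), the likelihood factorises into per-marker averages. With θ = 0, a single inheritance vector applies to the whole chromosome.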


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis. It also studied models for properly accounting for such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect. An exclusion error means falsely excluding the AF as the TF, and an inclusion error means falsely including the AF as the TF. In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated. In the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

The weight of evidence was expressed as posterior probabilities. To calculate it, we used a Bayesian framework with the standard two hypotheses model and, for comparison, the five hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
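Moving from two to five hypotheses changes only the normalisation step of Bayes's theorem. Each hypothesis receives likelihood × prior divided by the sum over all hypotheses. The sketch below assumes the likelihoods $\Pr(E|H)$ have already been computed, for example with a pedigree likelihood program. Equal priors are the default, matching the equal weighting of the alternatives described in the results. The function name and the hypothesis labels in the usage note are illustrative.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Normalise likelihood x prior over a set of mutually exclusive, exhaustive hypotheses."""
    if priors is None:
        # equal prior probability for every hypothesis
        priors = {h: 1 / len(likelihoods) for h in likelihoods}
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}
```

With only two hypotheses and equal priors, this reduces to LR/(LR + 1). With five labels such as `"H0"`, `"H1a"` to `"H1d"`, the posterior for paternity is additionally penalised whenever a close relative of the TF explains the profiles almost as well. The same cut-off rule can then be applied to the resulting posterior probability.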


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the equal prior probability we assigned to each of the alternative hypotheses. That is, the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but it was shown to have a considerable impact on individual LRs.

In cases where the AF is expected to possibly be a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007). Some laboratories, however, still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). We demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluating the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) in comparison with the standard case, given as: total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20 markers DNA profiles: -68 (-29, -89)
  25 markers DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for LR calculation: -24 (-8, -31)

The standard case used DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is particularly relevant to set up and study regional frequency databases, because of an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed for the complete mtDNA control region (Figure 4). In addition, 39 samples from a Swedish Saami population (Jokkmokk Saami) were typed. This segment, comprising HVS-I, HVS-II and HVS-III, spans more than 1 100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation involved the enumeration of forensic efficiency parameters. It also compared the genetic variation among the Swedish regions, and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events. However, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP, the European DNA Profiling Group (EDNAP) mtDNA population database (Parson & Dür 2007). This provides a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females. The combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study population frequencies, substructure and efficiency parameters. It is equally important to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, perform the LD test and estimate forensic efficiency parameters. Family data were also considered in order to estimate recombination frequencies. These came from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), giving 84 to 116 informative meioses. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. We also used simulations to compare LRs calculated with and without taking LD into account. On average, the difference was small, although in some individual cases it was considerable.
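Why LD matters for the calculation can be seen by comparing the observed haplotype frequency with the product of the two allele frequencies, which is what the product rule would use. Males carry one X chromosome, so their two-locus haplotypes are observed directly. The sketch computes both quantities and their difference, the LD coefficient D. The input format (a list of allele pairs from males) and the function name are assumptions made for illustration.

```python
from collections import Counter


def ld_summary(haplotypes):
    """Observed haplotype frequency, product of allele frequencies and D for each haplotype.

    `haplotypes` is a list of (allele at locus 1, allele at locus 2) from hemizygous males.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    locus1 = Counter(h[0] for h in haplotypes)
    locus2 = Counter(h[1] for h in haplotypes)
    summary = {}
    for (a, b), count in hap_counts.items():
        p_ab = count / n
        product = (locus1[a] / n) * (locus2[b] / n)
        summary[(a, b)] = {"haplotype": p_ab, "product": product, "D": p_ab - product}
    return summary
```

In a toy sample of three (1, x) and one (2, y) haplotypes, the observed frequency of (1, x) is 0.75. The product rule would instead use 0.5625, so D = 0.1875. Whether such a difference changes an LR much depends on the case, as the simulations above show.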

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%). There was also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
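With roughly 100 informative meioses per marker pair, an estimate of zero recombinants is still compatible with small non-zero recombination fractions. The sketch below gives the simple counting estimate (recombinants / informative meioses) and a one-sided exact (Clopper-Pearson) binomial upper limit, found by bisection. This is a generic illustration. It is not the estimation model presented in Paper III, and the function names are hypothetical.

```python
import math


def binom_cdf(r, n, theta):
    """Pr(X <= r) for X ~ Binomial(n, theta)."""
    return sum(math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(r + 1))


def recombination_estimate(r, n):
    """Counting (maximum-likelihood) estimate: recombinants / informative meioses."""
    return r / n


def recombination_upper_bound(r, n, alpha=0.05):
    """One-sided exact upper limit: the theta at which Pr(X <= r | n, theta) = alpha.

    The binomial CDF decreases in theta, so bisection on [r/n, 1] converges to the limit.
    """
    if r >= n:
        return 1.0
    lo, hi = r / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(r, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For 0 recombinants in 100 meioses, the counting estimate is 0, but the 95% upper limit is 1 − 0.05^(1/100), about 3%. Family data of this size can therefore bound, but not pin down, very tight linkage.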


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for linkage and linkage disequilibrium. We then study, by simulation, the impact of taking or not taking linkage and LD into account, using typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented. It is based on the Lander-Green algorithm, extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. In three of these, DNA profiles were available for both the children and the founders: paternity for a trio, paternity for half-sisters and paternity for full-sisters. In the other three, DNA profiles were available only for the children: paternity for half-sisters, paternity for full-siblings and maternity for brothers. For each of the six cases, simulations were performed with the tested relationship both true and not true. From these, LR distributions were studied. The LR calculated with our proposed model was also compared with LRs calculated by simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed a high median LR (~$10^6$) for the three cases where DNA profiles were available for the founders. The median LR was considerably lower (~$10^2$-$10^3$) for the cases where genotype data for the founders were not available. Furthermore, LRs above 1, of various magnitudes, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Columns: (1) log10 LR; (2) difference LR(m_2)/LR(m_1); (3) difference LR(m_3)/LR(m_1). Each column gives the median [95% cred.] (min/max).

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10 000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis. The aim of the thesis was to study relevant population genetic properties, and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how several parameters affected the risk of erroneous decisions in relationship testing in immigration casework. These were the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for treating single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high. The homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the mtDNA variation found in Sweden resembles that of other European populations. This makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider these parameters when calculating the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for these eight X-STRs.


• In Paper IV, we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
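The point made in Paper III, that LD invalidates the product rule, can be illustrated with a minimal sketch. It assumes a hypothetical set of male X-chromosomal haplotypes at two linked loci (males carry a single X-chromosome, so their haplotypes are observed directly); the allele values and counts are invented for illustration and are not data from the thesis.

```python
from collections import Counter
from math import prod

def product_rule_frequency(haplotypes, target):
    """Haplotype frequency estimated as the product of single-locus allele
    frequencies, i.e. assuming the loci are independent (no LD)."""
    n = len(haplotypes)
    return prod(sum(h[i] == allele for h in haplotypes) / n
                for i, allele in enumerate(target))

def observed_haplotype_frequency(haplotypes, target):
    """Haplotype frequency counted directly, with no independence assumption."""
    return Counter(haplotypes)[target] / len(haplotypes)

# Hypothetical male X-haplotypes at two linked loci (allele repeat numbers).
haplotypes = [(12, 30)] * 40 + [(13, 31)] * 40 + [(12, 31)] * 10 + [(13, 30)] * 10

print(product_rule_frequency(haplotypes, (12, 30)))        # 0.25
print(observed_haplotype_frequency(haplotypes, (12, 30)))  # 0.4
```

With these toy counts the product rule gives 0.25, while the haplotype is actually observed with frequency 0.4; under linkage equilibrium the two estimates would agree in expectation.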


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses about the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by developments in computing power.

If we go back to the time just before the above-mentioned discoveries and adopt a client’s point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, client demand to resolve these issues was a major driver of the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics to meet that demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases today can be solved with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). For national databases which, like CODIS, consist of over 8 000 000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008), and predicting an individual’s physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increasing number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009, Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what becomes important are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.
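A back-of-the-envelope sketch of why haplotype blocks demand larger databases: the number of possible haplotypes grows multiplicatively with the number of markers, while the upper frequency bound for a haplotype never observed in the database shrinks only roughly as 3/n (the "rule of three"). The allele counts below are hypothetical, and the exact binomial (Clopper-Pearson) bound for a zero count is one common choice, not necessarily the method used in this thesis.

```python
import math

def possible_haplotypes(allele_counts):
    """Distinct haplotypes if every combination of alleles could occur."""
    return math.prod(allele_counts)

def upper_bound_unobserved(n, alpha=0.05):
    """One-sided (1 - alpha) upper bound for the frequency of a haplotype seen
    0 times among n sampled haplotypes (exact binomial / Clopper-Pearson bound
    for a zero count): the p that solves (1 - p) ** n == alpha."""
    return 1 - alpha ** (1 / n)

# Hypothetical block of four markers with ten alleles each.
print(possible_haplotypes([10, 10, 10, 10]))  # 10000
for n in (100, 1000, 10000):
    print(n, round(upper_bound_unobserved(n), 5))
```

A database of a few hundred haplotypes therefore cannot, on its own, distinguish between most of the ten thousand possible haplotypes in such a block.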

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact when calculating the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100 000. This gives access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (“bollplank”) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about things other than work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis. Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden--a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies--the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.


Materials and methods 36
Results and discussion 36

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers 38

Materials and methods 38
Results and discussion 38

Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account 40

Materials and methods 40
Results and discussion 40

Concluding remarks 43

Future perspectives 45

Acknowledgements 47

References 49

Abbreviations

θ: Theta, recombination frequency (fraction)
AF: Alleged father
DNA: Deoxyribonucleic acid
FST: Measure of population genetic subdivision
GD: Gene diversity
HWE: Hardy-Weinberg equilibrium
ISFG: International Society for Forensic Genetics
LD: Linkage disequilibrium
LR: Likelihood ratio
MEC: Mean exclusion chance
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PD: Power of discrimination
PE: Power of exclusion
PI: Paternity index
PIC: Polymorphism information content
PM: Match probability
Pr: Probability
SNP: Single nucleotide polymorphism
STR: Short tandem repeat
TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays DNA is used in everyday practice in different areas, such as medical genetics, the food processing industry, and forensics, both for solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being based on DNA analysis of a biological sample. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al. 2009, Pulker et al. 2007) and wildlife forensics (Alacs et al. 2010, Budowle et al. 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling et al. 2004b).

The first time DNA was used in court for a crime scene sample was in 1986 in the UK (Gill et al. 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al. 1985). Since then, the techniques and methodologies for using the information provided by DNA have undergone enormous improvements, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process, from handling biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation of the weight of the DNA evidence in court. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example: Is the donor of a crime scene sample the same individual as the suspect? Is the alleged father (AF) the biological father of the child? Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.
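The probability of a coincidental match between unrelated individuals can be sketched as follows, assuming Hardy-Weinberg equilibrium at each locus and independence between loci (the product rule). The locus names and allele frequencies are hypothetical, and no correction for population substructure (of the kind discussed by Balding & Nichols 1994) is applied.

```python
from math import prod

def genotype_frequency(genotype, freqs):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]

def match_probability(profile, allele_freqs):
    """Pr(an unrelated person has this profile by coincidence), via the product rule.

    Assumes Hardy-Weinberg equilibrium and independence between loci;
    no subpopulation correction is applied.
    """
    return prod(genotype_frequency(g, allele_freqs[locus]) for locus, g in profile.items())

# Hypothetical loci and allele frequencies, for illustration only.
allele_freqs = {"locus_A": {12: 0.1, 13: 0.2}, "locus_B": {8: 0.25}}
profile = {"locus_A": (12, 13), "locus_B": (8, 8)}
pm = match_probability(profile, allele_freqs)  # 0.04 * 0.0625 = 0.0025
print(pm)
```

Each additional independent locus multiplies in another factor below one, which is why multi-locus profiles quickly reach very small match probabilities; the independence assumption is exactly what linkage and LD can break.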

These figures are usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of this thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. First, population genetics, including allele frequencies, population substructure, and dependence within and between markers. Second, models for calculating and presenting the weight of evidence that take this population genetic information properly into account.
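For reference, the weight of evidence in this setting is commonly expressed as a likelihood ratio (LR), comparing the probability of the DNA evidence E under two competing hypotheses. In paternity testing, with H1 "the AF is the father" and H2 "a random unrelated man is the father", the LR is the paternity index (PI). Combining loci by multiplication is only valid when the loci are independent, which is precisely the assumption that linkage and LD violate. This is a standard formulation stated for orientation, not notation quoted from the papers.

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)},
\qquad
\mathrm{LR}_{\text{total}} = \prod_{l=1}^{L} \mathrm{LR}_l
\quad \text{(only if the } L \text{ loci are independent)}
```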

Population genetics

Genetic polymorphisms

Three different types of DNA marker, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification, and highly polymorphic among human populations (Butler 2006). In other words, they have good characteristics for use in forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, the CODIS loci (Budowle et al. 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers for routine practice. Recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.
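STR alleles are conventionally named by their number of repeat units. The toy sketch below counts the longest uninterrupted run of a motif such as GATA in a sequence string; it assumes a simple, perfect repeat. Routine casework instead sizes PCR fragments by capillary electrophoresis, and many real loci have compound or interrupted repeat structures that this function does not handle.

```python
def repeat_count(sequence, motif="GATA"):
    """Length, in repeat units, of the longest uninterrupted tandem run of `motif`."""
    k = len(motif)
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        # Extend the run while the next k bases are another copy of the motif.
        while sequence.startswith(motif, pos):
            run += 1
            pos += k
        best = max(best, run)
    return best

# Hypothetical sequence: flanking bases around four GATA repeats, i.e. allele "4".
print(repeat_count("TT" + "GATA" * 4 + "CC"))  # 4
```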

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al. 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low. The amount of information from one STR marker is the same as that from approximately four SNPs (Sobrino et al. 2005, Brenner (www.dna-view.com)). No commercial forensic SNP multiplex kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al. 2009, Phillips et al. 2008).
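The "one STR is worth about four SNPs" comparison can be reproduced under idealised assumptions, using the single-locus match probability (the sum of squared Hardy-Weinberg genotype frequencies) as the measure of information. The sketch assumes a SNP with allele frequencies 0.5/0.5 (the most informative biallelic case) and a hypothetical STR with ten equally frequent alleles; real STR frequencies are unequal, and other measures of information give somewhat different ratios, so this is an illustration rather than the calculation behind the cited figure.

```python
import math
from itertools import combinations_with_replacement

def match_probability_single_locus(freqs):
    """Sum of squared HWE genotype frequencies: the probability that two
    unrelated individuals share a genotype at this locus."""
    total = 0.0
    for a, b in combinations_with_replacement(range(len(freqs)), 2):
        g = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        total += g ** 2
    return total

snp = match_probability_single_locus([0.5, 0.5])     # 0.375
str_10 = match_probability_single_locus([0.1] * 10)  # about 0.019
# Number of independent SNPs whose combined match probability equals one STR's.
snps_per_str = math.log(str_10) / math.log(snp)      # about 4.0
print(snp, str_10, snps_per_str)
```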

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main use of sequence data in forensics involves the analysis of variation in mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. Analysis of mtDNA SNPs in addition to the sequence data has, however, been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the markers described above, there are different “types” of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with complex pedigree testing (Egeland et al. 2008, Skare et al. 2009).
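The Mendelian transmission described above is what a paternity index (PI) builds on. Below is a minimal single-locus sketch for a duo case (child and alleged father, mother untested), assuming Hardy-Weinberg equilibrium and no mutation, so a genetic inconsistency gives PI = 0; it does not attempt the probabilistic treatment of inconsistencies discussed in Paper I. Allele labels and frequencies are hypothetical, and multiplying PIs across loci is only valid for independent loci.

```python
def paternity_index(child, alleged_father, freqs):
    """Single-locus PI for a child/alleged-father duo (mother untested).

    H1: the alleged father is the father; H2: a random unrelated man is.
    Mutation is ignored, so an inconsistency gives PI = 0.
    """
    c1, c2 = child
    # Pr(child | AF is father): AF passes each of his alleles with probability 1/2,
    # and the child's other allele comes from a random mother.
    p_h1 = 0.0
    for transmitted in alleged_father:
        if transmitted == c1:
            p_h1 += 0.5 * freqs[c2]
        if transmitted == c2 and c1 != c2:
            p_h1 += 0.5 * freqs[c1]
    # Pr(child | random father): the child's Hardy-Weinberg genotype frequency.
    p_h2 = freqs[c1] ** 2 if c1 == c2 else 2 * freqs[c1] * freqs[c2]
    return p_h1 / p_h2

freqs = {12: 0.1, 13: 0.2, 14: 0.3}  # hypothetical allele frequencies
print(paternity_index((12, 13), (12, 14), freqs))  # 1 / (4 * 0.1), about 2.5
```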

The X-chromosome has a different inheritance pattern from that of autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave as autosomal markers in their transmission to gametes in females and as haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, as one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude paternity, since two sisters can inherit different alleles despite being full siblings. X-chromosomal markers can, however, exclude paternity, since two sisters would share the same paternal allele if they have the same father. There are several other types of relationship where analysis of X-chromosomal markers is superior to that of autosomal markers (Szibor et al. 2003; Pinto et al. 2010).
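The sister example can be sketched as a simple consistency check. This is a minimal illustration that assumes X-STR genotypes are given as allele pairs per locus in a fixed locus order (the genotypes below are hypothetical), and that ignores mutation, silent alleles and typing errors. Sharing an allele at every locus does not prove a common father. Only the absence of a shared allele at some locus is informative, and it acts as an exclusion.

```python
def may_share_father(sister1, sister2):
    """Return False if any X-chromosomal locus excludes a common father.

    sister1, sister2: lists of (allele, allele) genotypes, one per X-STR locus,
    in the same locus order. Mutation, silent alleles and typing errors are ignored.
    """
    for g1, g2 in zip(sister1, sister2):
        # A shared father passes his single X-chromosome to both daughters,
        # so full sisters must have at least one allele in common at each locus.
        if not set(g1) & set(g2):
            return False
    return True


# Hypothetical genotypes at two X-STR loci
sister_a = [(12, 14), (8, 9)]
sister_b = [(14, 15), (10, 11)]
print(may_share_father(sister_a, sister_b))  # False: no shared allele at the second locus
```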

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information regarding more than 50 X-STRs has been collected (www.chrx-str.org), and these markers have been used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when using a combination of closely located X-chromosomal markers in relationship testing (Krawzcak 2007; Szibor 2007) (Figure 2). Both properties are discussed further below, covering their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, and thus all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that leads to new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used for inferring the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic questions, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the same Y-chromosome haplotype is shared by all males of the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. The mtDNA is inherited maternally, from mother to child, and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to the estimation of allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among (sub)populations: small FST values correspond to small differences among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
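As an illustration of how FST relates the variance of allele frequencies to overall diversity, the sketch below computes a simple moment-based value, FST = (HT - HS)/HT, for one locus. Here HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. This is a simplification under stated assumptions: subpopulations are weighted equally and there is no sample-size correction. It is not a corrected estimator such as Weir and Cockerham's θ, and the frequencies are hypothetical.

```python
def fst(subpop_freqs):
    """Moment-based F_ST = (H_T - H_S) / H_T for a single locus.

    subpop_freqs: list of dicts mapping allele -> frequency, one dict per
    subpopulation. Subpopulations are weighted equally; no sample-size correction.
    """
    k = len(subpop_freqs)
    alleles = set().union(*subpop_freqs)
    # Mean allele frequencies over subpopulations
    p_bar = {a: sum(f.get(a, 0.0) for f in subpop_freqs) / k for a in alleles}
    # Expected heterozygosity of the pooled population
    h_t = 1.0 - sum(p * p for p in p_bar.values())
    # Mean expected heterozygosity within subpopulations
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in subpop_freqs) / k
    return (h_t - h_s) / h_t


# Two hypothetical subpopulations with moderately different frequencies
print(round(fst([{"a": 0.6, "b": 0.4}, {"a": 0.4, "b": 0.6}]), 4))  # 0.04
```

Identical subpopulations give 0, and subpopulations fixed for different alleles give 1, which matches the interpretation of small and large FST values given above.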

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) deal with the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete carries a combination of alleles that differs from both parental chromosomes. The allele combination of the two markers (i.e. the haplotype) has thus changed compared with its parental constitution. The distance between two loci can be measured as the recombination frequency θ, which is estimated from family study data. The recombination frequency is related to the genetic (map) distance between the loci (Ott 1999).
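The relationship between the recombination frequency θ and map distance depends on the map function chosen. The text does not specify one. As a common example, and as an assumption here, Haldane's map function assumes no crossover interference and gives θ = (1 - e^(-2d))/2 for a map distance d in Morgans. A minimal sketch of the function and its inverse:

```python
import math


def haldane_theta(d_morgans):
    """Recombination frequency for a map distance d (Morgans), Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(theta):
    """Inverse: map distance (Morgans) for a recombination frequency 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


print(round(haldane_theta(0.10), 4))     # 10 cM -> theta of about 0.0906
print(round(haldane_distance(0.25), 4))  # theta 0.25 -> about 0.3466 Morgans
```

For large distances, θ approaches 0.5, the value expected for independently segregating loci.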

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

Consider two loci and the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2. This frequency can be written as

$$f(ab) = f(a)\,f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. If the loci are in linkage equilibrium, then $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
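For phase-known data, such as X-chromosomal haplotypes from hemizygous males, both the directly observed haplotype frequency f(ab) and Δ can be obtained by counting. A minimal sketch, in which the haplotype sample is hypothetical:

```python
def haplotype_stats(haplotypes, a, b):
    """Return (f_ab, f_a, f_b, delta) for allele a at locus 1 and allele b at locus 2.

    haplotypes: list of (allele_at_locus1, allele_at_locus2) pairs with known phase.
    delta = f(ab) - f(a) f(b); delta == 0 corresponds to linkage equilibrium.
    """
    n = len(haplotypes)
    f_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    f_a = sum(1 for h in haplotypes if h[0] == a) / n
    f_b = sum(1 for h in haplotypes if h[1] == b) / n
    return f_ab, f_a, f_b, f_ab - f_a * f_b


# Hypothetical male X haplotypes at two closely located loci
sample = [(10, 15)] * 6 + [(10, 16)] * 2 + [(11, 15)] * 2 + [(11, 16)] * 10
print(haplotype_stats(sample, 10, 15))  # f(ab) = 0.3, while f(a) f(b) = 0.16
```

Here the observed f(ab) is well above the value expected under linkage equilibrium. In casework the observed haplotype frequency itself would be used, as recommended above.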

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Tests for Hardy-Weinberg equilibrium (HWE; Hardy 1908) and LD deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw any highly significant conclusions about the outcome of either test (Buckleton et al. 2001).
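To make the exact-test idea concrete, the sketch below implements an exact test of HWE for a single biallelic locus, such as a SNP. It conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the observed one. This is an illustrative sketch, not the procedure used in the thesis. For multiallelic STR loci, full enumeration quickly becomes infeasible, and the exact test is usually approximated by Monte Carlo or Markov chain methods. The genotype counts are hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test for one biallelic locus.

    Conditions on the allele counts and returns the summed probability of all
    heterozygote counts that are no more probable than the observed count.
    """
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n - n_a               # copies of allele B

    def prob(het):
        # Probability of `het` heterozygotes given n, n_a and n_b under HWE
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        num = factorial(n) * factorial(n_a) * factorial(n_b) * 2 ** het
        den = factorial(hom_a) * factorial(het) * factorial(hom_b) * factorial(2 * n)
        return Fraction(num, den)  # exact arithmetic avoids rounding ties

    # Feasible heterozygote counts share the parity of n_a and cannot exceed min(n_a, n_b)
    probs = [prob(h) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    p_obs = prob(n_ab)
    return float(sum(p for p in probs if p <= p_obs))


print(hwe_exact_pvalue(25, 50, 25))          # 1.0: the observed count is the most probable one
print(hwe_exact_pvalue(40, 20, 40) < 0.001)  # True: strong heterozygote deficit
```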

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations, and they differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different. The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
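A direct transcription of Nei's estimator, assuming allele counts (or, for mtDNA, haplotype counts) from a population sample. The counts below are hypothetical:

```python
def gene_diversity(counts):
    """Nei's unbiased gene diversity, GD = n/(n-1) * (1 - sum p_i^2).

    counts: dict mapping allele (or haplotype) -> number of observed copies.
    """
    n = sum(counts.values())  # number of sampled gene copies
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


print(round(gene_diversity({"14": 30, "15": 50, "16": 20}), 4))  # 0.6263
```

Applied to haplotype counts, the same function gives the haplotype diversity used for haploid markers such as mtDNA.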

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the match probabilities for all individual genotypes. PM can also be estimated from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus directly related to PM:

$$PD = 1 - PM$$
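A minimal sketch of PM and PD for one locus, computed either from observed genotype frequencies or, assuming Hardy-Weinberg equilibrium, from allele frequencies. The function names and example frequencies are illustrative:

```python
from itertools import combinations_with_replacement


def hw_genotype_freqs(allele_freqs):
    """Genotype frequencies expected under HWE: p_i^2 for homozygotes, 2 p_i p_j otherwise."""
    return [allele_freqs[a] ** 2 if a == b else 2 * allele_freqs[a] * allele_freqs[b]
            for a, b in combinations_with_replacement(sorted(allele_freqs), 2)]


def match_probability(genotype_freqs):
    """PM = sum of squared genotype frequencies."""
    return sum(g * g for g in genotype_freqs)


def power_of_discrimination(genotype_freqs):
    """PD = 1 - PM."""
    return 1.0 - match_probability(genotype_freqs)


g = hw_genotype_freqs({"a": 0.5, "b": 0.5})  # [0.25, 0.5, 0.25]
print(match_probability(g), power_of_discrimination(g))  # 0.375 0.625
```

For a multi-locus profile, the combined PM is usually taken as the product of the single-locus values. This assumes that the loci are independent (in HWE and LE).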

The polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). This cannot be deduced in two instances: when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2\,p_i^2 p_j^2$$

where n is the number of alleles and $p_i$ and $p_j$ are allele frequencies.
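The PIC formula translates directly into code. The sketch below takes a list of allele frequencies for one locus, assumed to sum to 1:

```python
def pic(p):
    """Polymorphism information content from a list of allele frequencies."""
    sum_p2 = sum(x * x for x in p)
    # Second term: sum over allele pairs i < j of 2 p_i^2 p_j^2
    pairs = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - sum_p2 - pairs


print(pic([0.5, 0.5]))            # 0.375
print(round(pic([0.25] * 4), 4))  # 0.7031
```

Because the pair term is never negative, PIC never exceeds the expected heterozygosity 1 - Σp_i² for the same frequencies.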

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i (1-p_i)^2 \left(1 - p_i + p_i^2\right) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is derived from two factors. The first is the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, where $p_i$ and $p_j$ are the frequencies of the possible paternal alleles. The second is the expected population frequency of that mother/child genotype combination. Q is then obtained by summing the product of these two factors over all mother/child genotype combinations, as given by the formula above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\left(1 - 2\,h\,H^2\right)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
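Both exclusion measures can be sketched as follows. Q is computed from allele frequencies, assumed to sum to 1, with HWE in the mother/child combinations. PE is computed from the observed proportion of heterozygotes h, taking H = 1 - h. For a biallelic locus with two equally frequent alleles, both give 3/16 = 0.1875 when h equals its HWE expectation of 0.5, which is a convenient sanity check. The functions are illustrative and not taken from any particular software.

```python
def exclusion_probability(p):
    """Q (Ohno et al. 1982): mean probability of excluding a random man, mother and child typed."""
    q = sum(x * (1 - x) ** 2 * (1 - x + x ** 2) for x in p)
    q += sum(p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
             for i in range(len(p)) for j in range(i + 1, len(p)))
    return q


def power_of_exclusion(h):
    """PE (Brenner & Morris 1990) from the proportion of heterozygotes h."""
    big_h = 1.0 - h  # proportion of homozygotes
    return h ** 2 * (1 - 2 * h * big_h ** 2)


print(exclusion_probability([0.5, 0.5]))  # 0.1875
print(power_of_exclusion(0.5))            # 0.1875
```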

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, because of the ice that covered Northern Europe. Since then, immigration and population movements of various scales, origins and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations using data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Because of continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with other evidence, such as fingerprints or information from eyewitnesses.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the defence hypothesis. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this quotient that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:
Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.
The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

i.e. the probability of observing the child's ($G_C$), the mother's ($G_M$) and the alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


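Using the third law of probability, we can rewrite this as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$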

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can simplify further, since

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that someone other than the AF is the father (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (AM) and paternal (AP) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1, since neither of them can transmit any other allele. If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If AM and AP are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. $p_{A_P}$ is the population frequency of allele AP and represents the probability of the child receiving that allele from a random man in the population. If AM and AP are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of AM and AP.

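As a simple example, let GM have the genotype [a,b], GC have [b,c] and GAF have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2\,p_c}$$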

In other words, the rarer allele c is in the population, the higher the evidential weight in favour of the AF being the biological father of the child.
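The reasoning above can be sketched for a single locus. Each parent transmits each of its two alleles with probability 0.5, while an unrelated random man transmits allele x with probability p_x. This minimal sketch ignores mutation, silent alleles and substructure, so a genetic inconsistency yields PI = 0. The allele frequencies in the example are hypothetical.

```python
def transmission(genotype):
    """Allele -> probability that a parent with this genotype transmits it."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_prob(child, from_mother, from_father):
    """Pr(child genotype) given each parent's allele-transmission probabilities."""
    a, b = child
    p = from_mother.get(a, 0.0) * from_father.get(b, 0.0)
    if a != b:  # heterozygous child: either allele may be the maternal one
        p += from_mother.get(b, 0.0) * from_father.get(a, 0.0)
    return p


def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus PI = Pr(GC | GM, GAF, Hp) / Pr(GC | GM, GAF, Hd)."""
    m = transmission(mother)
    numerator = child_prob(child, m, transmission(alleged_father))
    # Under Hd, an unrelated random man transmits allele x with probability freqs[x]
    denominator = child_prob(child, m, freqs)
    return numerator / denominator


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 1 / (2 * 0.1), about 5
```

The frequency dictionary plays the same role in child_prob as a parent's transmission probabilities, which mirrors the way Hd replaces the alleged father's Mendelian transmission with population allele frequencies. Over several independent loci, the combined PI is the product of the single-locus values.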

Decision

How does one interpret the PI value? Bayes's theorem is used to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, which gives

$$PI = \frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)}$$

hence, since $\Pr(H_d \mid E) = 1 - \Pr(H_p \mid E)$,

$$PI = \frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)}$$

resulting in the following posterior probability of paternity:

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$
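Bayes's theorem in odds form generalises this to any prior: posterior odds = PI × prior odds, and the posterior probability is posterior odds / (1 + posterior odds). In the minimal sketch below, the single-locus PI values are hypothetical. They are multiplied on the assumption that the loci are independent, i.e. unlinked and in linkage equilibrium.

```python
import math


def posterior_probability(pi, prior_odds=1.0):
    """Pr(Hp | E) from a paternity index and prior odds Pr(Hp)/Pr(Hd)."""
    posterior_odds = pi * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


locus_pis = [5.0, 2.5, 8.0, 2.0]    # hypothetical single-locus PIs
combined_pi = math.prod(locus_pis)  # assumes independent loci
print(combined_pi)                               # 200.0
print(posterior_probability(combined_pi))        # prior odds 1, about 0.995
print(posterior_probability(combined_pi, 0.1))   # more sceptical prior, about 0.952
```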

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, while too high a cut-off will increase the risk of failing to include a true father.

Mathematical model for automatic likelihood computation in relationship testing

Calculating the PI for trios and single markers is fairly simple. It rapidly becomes more complicated, however, once the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) is introduced, and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{founder} \Pr(G_{founder}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where $G_i$ is the genotype of individual i, $X_i$ is the observed phenotype of individual i, and the last product runs over all offspring o with father f and mother m.

The Elston-Stewart algorithm uses a recursive approach. It starts at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for the individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
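To illustrate the likelihood sum itself, rather than the peeling algorithm that evaluates the same sum more efficiently, the sketch below enumerates genotypes by brute force for a small deficiency-type case. A child and an alleged father are typed, while the mother is untyped, so her genotype must be summed over. Founders receive Hardy-Weinberg genotype probabilities, the penetrance factor is omitted for these non-trait loci, and mutation is ignored. All names and frequencies are illustrative, and genotypes are represented as sorted allele tuples.

```python
from itertools import combinations_with_replacement


def founder_genotypes(freqs):
    """All genotypes (sorted tuples) with Hardy-Weinberg founder probabilities."""
    return {(a, b): freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
            for a, b in combinations_with_replacement(sorted(freqs), 2)}


def transmission_prob(child, mother, father):
    """Mendelian Pr(child | mother, father); each parental allele is passed with prob. 1/2."""
    return sum(0.25 for m in mother for f in father if tuple(sorted((m, f))) == child)


def duo_likelihoods(child, alleged_father, freqs):
    """Pedigree likelihoods under Hp and Hd when the mother is untyped."""
    founders = founder_genotypes(freqs)
    pr_af = founders[alleged_father]
    # Hp: sum over the untyped mother's genotype
    l_hp = sum(pm * pr_af * transmission_prob(child, gm, alleged_father)
               for gm, pm in founders.items())
    # Hd: the father is an untyped, unrelated founder; sum over both untyped parents
    l_hd = sum(pm * pf * pr_af * transmission_prob(child, gm, gf)
               for gm, pm in founders.items()
               for gf, pf in founders.items())
    return l_hp, l_hd


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
l_hp, l_hd = duo_likelihoods(("b", "c"), ("c", "d"), freqs)
print(l_hp / l_hd)  # duo PI; equals 1 / (4 * p_c) = 2.5 up to rounding
```

Peeling would reorganise the nested sums so that each individual is summed out once, instead of enumerating every combination of untyped genotypes as done here.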

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases rapidly with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. The Lander-Green algorithm (Lander & Green 1987) permits simultaneous consideration of thousands of loci, and its computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring, 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors, and finally 3) calculation of the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). Using a hidden Markov model (HMM) for the final step yields an efficient computational model (see Kruglyak et al. 1996 for a more detailed description).
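Step 3 can be sketched as the forward algorithm of an HMM whose hidden states are inheritance vectors. The sketch makes the following assumptions:

- Each of the m meioses in the pedigree (two per non-founder) contributes one bit to the inheritance vector.
- All 2^m vectors are a priori equally likely.
- Between adjacent markers separated by recombination frequency θ, a vector changes to another at Hamming distance d with probability θ^d (1 - θ)^(m - d), which assumes no interference.
- The per-marker emission probabilities from step 2, Pr(observed genotypes at marker k | v), are precomputed and passed in as plain lists.

Computing the emissions, and the speed-ups used in real implementations, are outside this naive version, which costs on the order of 4^m operations per marker.

```python
def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v),
        with v an integer bit mask of n_meioses bits (assumed precomputed, step 2).
    thetas[k]: recombination frequency between markers k and k + 1.
    """
    n_states = 2 ** n_meioses
    # Uniform prior over inheritance vectors (Mendelian segregation)
    alpha = [emissions[0][v] / n_states for v in range(n_states)]
    for k, theta in enumerate(thetas):
        # Transition probability depends only on how many meioses change between markers
        trans = [theta ** d * (1 - theta) ** (n_meioses - d) for d in range(n_meioses + 1)]
        alpha = [emissions[k + 1][w] *
                 sum(alpha[v] * trans[bin(v ^ w).count("1")] for v in range(n_states))
                 for w in range(n_states)]
    return sum(alpha)


# Two meioses, three markers, hypothetical emission probabilities
em = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(lander_green_likelihood(em, [0.5, 0.5], 2))  # unlinked: product of mean emissions
print(lander_green_likelihood(em, [0.0, 0.0], 2))  # complete linkage: one shared vector
```

With θ = 0.5 the markers behave as independent, and the likelihood reduces to a product of per-marker averages. With θ = 0 the same inheritance vector applies along the whole chromosome segment.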

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for properly accounting for such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of drawing erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect because of an exclusion error or an inclusion error, i.e. falsely excluding the AF as the TF or falsely including the AF as the TF, respectively. In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypotheses model and, for comparison, the five-hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is because we used an equal prior probability for the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a few inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypotheses model to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and it is important to evaluate the statistical methods used. In this paper, we demonstrated that improvements are still necessary to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates. Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  DNA profiles with 20 markers: -68 (-29, -89)
  DNA profiles with 25 markers: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five-hypotheses model for LR calculation: -24 (-8, -31)

The standard case used DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated with the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is highly relevant to set up and study regional frequency databases, because of an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation involved estimation of forensic efficiency parameters. It also compared the genetic variation among the Swedish regions, and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). A comparison of mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. Pairwise ΦST values calculated for a limited number of geographically close populations further confirmed this (Figure 4).

The mtDNA sequences were further analysed to look for potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). MtDNA haplotype frequencies from the eight Swedish regions were compared, and only the Saami population differed significantly from the rest. The Y-chromosomal data had shown a difference between the northern region of Västerbotten and the rest of Sweden, but this difference was not observed in the mtDNA data. It is most probably explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. Because the Swedish population is highly similar to other western European populations, Swedish mtDNA data can be included in initiatives such as EMPOP, the mtDNA population database of the European DNA Profiling Group (EDNAP) (Parson & Dür 2007). This provides a more accurate estimate of the rarity of an mtDNA sequence.
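One convention used for lineage markers makes the effect of database size concrete. For a haplotype not observed among N database sequences, it reports the one-sided upper 95% confidence bound on the haplotype's frequency, i.e. the p that solves (1 − p)^N = 0.05. This minimal sketch assumes that zero-count (Clopper-Pearson) bound. The thesis does not state that EMPOP or Paper II uses this particular estimator, and the database sizes in the example are hypothetical apart from 296, the Swedish sample size.

```python
def unseen_upper_bound(n_database: int, alpha: float = 0.05) -> float:
    """Upper (1 - alpha) confidence bound on the frequency of a haplotype
    observed 0 times among n_database sequences.

    Solves (1 - p) ** n_database == alpha for p (Clopper-Pearson bound for x = 0).
    This estimator is an assumption for illustration, not taken from the thesis.
    """
    if n_database < 1:
        raise ValueError("database must contain at least one sequence")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 - alpha ** (1.0 / n_database)


# Larger reference databases give tighter (smaller) bounds for an unseen haplotype.
for n in (296, 3_000, 30_000):
    print(n, f"{unseen_upper_bound(n):.5f}")
```

The bound shrinks roughly in proportion to 1/N. Pooling homogeneous populations into a larger database therefore directly sharpens the rarity estimate.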


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing, i.e. cases in which a key family member is not available for testing (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females. Using several X-chromosomal DNA markers together therefore requires consideration of linkage and linkage disequilibrium. Before X-STRs are applied in casework, it is important to study population frequencies, substructure and efficiency parameters. It is equally important to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were typed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, perform the LD test and estimate forensic efficiency parameters. Recombination frequencies were estimated from family data: 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters showed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. For the Swedish population, the loci within each group should therefore be treated as haplotypes rather than as independent single markers. Simulations showed that disregarding LD, as opposed to taking it into account, changed the calculated LR only slightly on average. In some individual cases, however, the difference was considerable.
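Males carry a single X chromosome, so each male sample yields a phase-known haplotype directly. The departure from the product rule within a linkage group can therefore be read off haplotype counts. This minimal sketch assumes two loci and uses the classical disequilibrium coefficient D = p_AB − p_A·p_B. It illustrates the quantity at stake and is not the significance test used in Paper III. The allele designations and data are hypothetical.

```python
from typing import Hashable, Sequence, Tuple

Haplotype = Tuple[Hashable, Hashable]  # (allele at locus 1, allele at locus 2)


def ld_summary(male_haplotypes: Sequence[Haplotype], a: Hashable, b: Hashable):
    """Compare the observed frequency of haplotype (a, b) with the product rule.

    Returns (observed p_AB, product-rule p_A * p_B, D = p_AB - p_A * p_B).
    """
    n = len(male_haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_ab = sum(1 for h in male_haplotypes if h == (a, b)) / n
    p_a = sum(1 for h in male_haplotypes if h[0] == a) / n
    p_b = sum(1 for h in male_haplotypes if h[1] == b) / n
    product_rule = p_a * p_b
    return p_ab, product_rule, p_ab - product_rule


# Hypothetical male haplotypes for a pair of linked X-STRs (one linkage group).
sample = [(14, 20), (14, 20), (14, 21), (15, 21), (15, 21), (15, 22)]
print(ld_summary(sample, 14, 20))
```

A clearly non-zero D means that multiplying single-locus frequencies misstates the haplotype's rarity. This is why loci in LD are treated as a haplotype.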

Recombination frequencies for the loci were estimated from the family data (Figure 5). They indicated that the chance of observing a recombination within each linkage group was small (< 1%). They also indicated a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
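With phase-known meioses, a recombination fraction estimate is simply the proportion of recombinant meioses. With 84 to 116 informative meioses, its uncertainty is substantial. This minimal sketch assumes that each meiosis can be classified unambiguously as recombinant or not; the model in Paper III also handles phase uncertainty, which this sketch does not. It uses a Wilson score interval as one reasonable choice of confidence interval, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def recombination_estimate(
    recombinants: int, informative_meioses: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (theta_hat, lower, upper): the proportion of recombinant meioses
    and an approximate 95% Wilson score interval (z = 1.96).

    Assumes phase-known meioses. The upper limit is capped at 0.5, the value
    expected for unlinked loci.
    """
    n, k = informative_meioses, recombinants
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= recombinants <= informative_meioses")
    theta_hat = k / n
    denom = 1.0 + z**2 / n
    centre = (theta_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(theta_hat * (1 - theta_hat) / n + z**2 / (4 * n**2)) / denom
    return theta_hat, max(0.0, centre - half_width), min(0.5, centre + half_width)


# Hypothetical counts: no recombinants seen in 100 informative meioses.
print(recombination_estimate(0, 100))
```

Even with no observed recombinants, the upper limit here stays at roughly 0.037. This illustrates why family sets of this size cannot pin down very small recombination fractions.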


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the locations in Mb (NCBI build 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data.

Traditionally, the Elston-Stewart model (Elston & Stewart 1971, Abecasis et al 2002) could be used to compute the LR involved in questions about a relationship. However, this model is not efficient for data from multiple linked loci, because its computational cost grows exponentially with the number of loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, handles thousands of linked markers well but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose.

We therefore present a model that fully accounts for linkage and linkage disequilibrium. Using simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III, we study the impact of taking, or not taking, linkage and LD into account.

Materials and methods

A computational model was presented. It is based on the Lander-Green algorithm but extends the inheritance vectors to cover all loci of a haplotype.
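The core of the Lander-Green approach is a hidden Markov model that walks along the chromosome. The hidden state at each locus is the inheritance vector, with one bit per meiosis indicating which parental chromosome was transmitted. Between adjacent loci, each bit flips with probability equal to the recombination fraction.

This minimal sketch of the forward algorithm assumes that the per-step emission probabilities P(data | inheritance vector) have already been computed. They could come from allele frequencies or, in the spirit of the extension described above, from haplotype frequencies when a whole LD block is treated as one step. The sketch illustrates the general technique and is not the implementation from Paper IV. An LR would be the ratio of two such likelihoods computed under the competing pedigrees.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

InheritanceVector = Tuple[int, ...]


def transition_prob(v_prev: InheritanceVector, v_next: InheritanceVector, theta: float) -> float:
    """P(v_next | v_prev): each meiosis bit flips independently with probability theta."""
    flips = sum(a != b for a, b in zip(v_prev, v_next))
    return theta**flips * (1.0 - theta) ** (len(v_prev) - flips)


def lander_green_likelihood(
    emissions: Sequence[Dict[InheritanceVector, float]],
    thetas: Sequence[float],
    n_meioses: int,
) -> float:
    """Forward algorithm over inheritance vectors (Lander-Green style hidden Markov model).

    emissions[l][v] -- P(data at locus or LD block l | inheritance vector v); assumed precomputed.
    thetas[l]       -- recombination fraction between step l and step l + 1.
    """
    states: List[InheritanceVector] = list(product((0, 1), repeat=n_meioses))
    # All inheritance vectors are equally likely a priori at the first step.
    alpha = {v: emissions[0][v] / len(states) for v in states}
    for step in range(1, len(emissions)):
        theta = thetas[step - 1]
        alpha = {
            v: emissions[step][v] * sum(alpha[u] * transition_prob(u, v, theta) for u in states)
            for v in states
        }
    return sum(alpha.values())


# Made-up emission values: two loci, one informative meiosis.
example_emissions = [{(0,): 0.2, (1,): 0.6}, {(0,): 0.5, (1,): 0.1}]
print(lander_green_likelihood(example_emissions, [0.5], 1))  # unlinked loci
print(lander_green_likelihood(example_emissions, [0.0], 1))  # complete linkage
```

With θ = 0.5 the result equals the product of the per-locus averages (0.4 × 0.3), as expected for unlinked loci. With θ = 0 both loci share one inheritance vector. In this naive form each step costs on the order of 4^m operations for m meioses, so practical implementations use faster transition updates.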

To test the model, six different cases were studied, representing pedigrees for which X-chromosomal analysis would be valuable. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. paternity for a trio, paternity for half-sisters and paternity for full-sisters). In the other three, DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these simulations we studied the LR distributions. We also compared the LR calculated with our proposed model with LRs from simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR was high (~10^6) for the three cases where DNA profiles were available for the founders. It was considerably lower (~10^2 to 10^3) for the cases where genotype data for the founders were not available. Furthermore, LRs above 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

| | log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max) |
| Rel | 2.0 [-1, 6.2] (-1/10.0) | 1.2 [0.4, 7.3] (0001418) | 0.4 [0.036, 9.3] (00001438) |
| NoRel | -1 [-1, 0.5] (-1/4.0) | 1.0 [0.9, 1.3] (000263) | 0.2 [0.0036, 9.1] (0026433) |

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. For each situation, 10,000 simulations were performed. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is the common theme of the four papers in this thesis. Its aim was to study relevant population genetic properties, and models for taking them into account when calculating likelihood ratios in relationship testing.

• In Paper I we showed how several parameters affected the risk of erroneous decisions in relationship testing in immigration casework. These were the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for treating single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high. The homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, mtDNA variation in Sweden closely resembles that of other European populations. This makes it possible to enlarge the relevant reference population, increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium. Together with the linkage within and between the linkage groups in the Swedish population, this highlights the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account. We applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt some educated guesses about the future of forensic genetics. Looking back, we can see enormous progress in this field over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help provided by rapidly increasing computer power.

Consider the situation just before these discoveries from a client's point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Client demand to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics to meet these demands.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved with existing methods today. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in large national DNA profile databases built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. In criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

Methods and markers for relationship testing, the main topic of this thesis, have traditionally been adapted from developments in the much larger field of DNA in criminal cases. For future developments, I expect a greater focus on statistical methods that handle data from an increasing number of DNA markers. However, I think that these developments will generally be smaller than those of the past 25 years.

Take, for example, paternity testing, which makes up the vast majority of relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, or half-siblings from unrelated individuals (Nothnagel et al 2010).

Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009, Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, accurate estimation of the evidential value of the DNA profiles depends on statistical methods that take properties such as linkage and linkage disequilibrium into account.

Two main issues remain to be investigated in continued work on the linked X-STRs studied in this thesis. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions remained rather isolated until the early 1900s. We are interested to see whether this isolation is traceable in the genetic data and could therefore affect the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (“bollplank”) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about things other than work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law, Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings while I was writing this thesis.

Thanks also to everyone else I have not mentioned who has been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: Festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52


Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C Fondevila M Garcia-Magarinos M Rodriguez A Salas A Car-racedo A and Lareu MV (2008) Resolving relationship tests that show am-biguous STR results using autosomal SNPs as supplementary markers Forensic Sci Int Genet 2198-204

Pinto N Gusmatildeo L and Amorim A (2010) X-chromosome markers in kin-ship testing A generalisation of the IBD approach identifying situations where their contribution is crucial Forensic Sci Int Genet in press

Prinz M Carracedo A Mayr WR Morling N Parsons TJ Sajantila A Scheithauer R Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG) recommendations re-garding the role of forensic genetics for disaster victim identification (DVI) Forensic Sci Int Genet 13-12

Pulker H Lareu MV Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools Forensic Sci Int Genet 1100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Abbreviations

θ: Theta, recombination frequency/fraction
AF: Alleged father
DNA: Deoxyribonucleic acid
FST: Measure of population genetic subdivision
GD: Gene diversity
HWE: Hardy-Weinberg equilibrium
ISFG: International Society for Forensic Genetics
LD: Linkage disequilibrium
LR: Likelihood ratio
MEC: Mean exclusion chance
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PD: Power of discrimination
PE: Power of exclusion
PI: Paternity index
PIC: Polymorphism informative content
PM: Match probability
Pr: Probability
SNP: Single nucleotide polymorphism
STR: Short tandem repeat
TF: True father

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in areas such as medical genetics, the food processing industry and forensic situations, both when solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being, based on a biological sample, by means of a DNA analysis. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser et al 2009, Pulker et al 2007) and wildlife forensics (Alacs et al 2010, Budowle et al 2005). When it comes to human identification, the task could be to connect a suspect to the crime scene or to investigate a biological relationship (Jobling et al 2004b).

The first time DNA was used in court for a crime scene sample was in 1986 in the UK (Gill et al 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al 1985). Since then, the techniques and methodologies for exploiting the information provided by DNA have improved enormously, making DNA an obvious tool for routine practice in forensic work. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process: from handling evidential biological samples at the crime scene, via storage and the establishment of DNA profiles, to the presentation in court of the weight of the evidence provided by the DNA results. In no other trial has the DNA result been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?". When the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be presented for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve consideration of the probability of observing identical DNA profiles from unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.

The establishment of these figures is usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight for a given DNA marker. First, population genetics, including allele frequencies, population substructure and dependence within and between markers, among other factors. Second, models for calculating and presenting the weight of evidence that take the former information properly into account.

Population genetics

Genetic polymorphisms

Three different types of DNA marker, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006); in other words, they have good characteristics for use in forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers called the CODIS loci (Budowle et al 1999a). These and some additional markers were then adopted and commercialised by a few companies, thus making them the standard set of markers for routine practice. Recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is the low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low: the amount of information from one STR marker is the same as from approximately four SNPs (Sobrino & Carracedo 2005; Brenner, www.dna-view.com). Regarding SNP multiplexes, there is no commercial forensic kit available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a pre-defined region. The main use of sequence data in forensic situations is the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. Analysis of mtDNA SNPs in addition to the sequence data has, however, been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the markers described above, there are different “types” of DNA with different properties in terms of their inheritance pattern, as well as other important population genetic properties. The types discussed here are markers on the autosomes, on the sex chromosomes (the X-chromosome and the Y-chromosome) and on the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with testing complex pedigrees (Egeland et al 2008, Skare et al 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. A consequence of this is that X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their single X-chromosome from their mother, as one of her two copies. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are only available for the sisters. In such instances, autosomal DNA markers cannot exclude a shared father, since full siblings can inherit different alleles from the same father. X-chromosome markers can, however, exclude a shared father, since two sisters with the same father would share the same paternal X-chromosomal allele. There are several other types of relationship where analysis of X-chromosomal markers is superior to that of autosomal markers (Szibor et al 2003, Pinto et al 2010).

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al 2008, Hundertmark et al 2008, Gomez et al 2007, Diegoli et al 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). These two genetic properties, their definitions and their impact on calculated likelihoods are discussed further below.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son; thus all men in a paternal lineage share an identical Y-chromosome, barring mutation. Apart from the recombining region (~5%), mutation is the only force that leads to new variation on the Y-chromosome. Due to this, and to the fact that the Y-chromosome has one-quarter of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer the geographical origin of the paternal lineage (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is shared by all males in the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16,600 nucleotides. Each cell has 100 to 1,000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. The mtDNA is inherited maternally (from mother to child) and can therefore be used to address questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for the X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al 2004a).

Substructure

In addition to the estimation of allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST values correspond to small differences among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
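To make the link between FST and the variance of allele frequencies concrete, the sketch below computes Wright's FST for a single biallelic locus from the allele frequencies of a set of subpopulations. It assumes equally weighted subpopulations and known (not sample-estimated) frequencies; the function name `wright_fst` is hypothetical, and the estimators used on real data also correct for sample size, which this sketch does not.

```python
from statistics import fmean, pvariance


def wright_fst(subpop_freqs: list[float]) -> float:
    """Wright's FST for one biallelic locus.

    Assumes equally weighted subpopulations and known (not sample-estimated)
    allele frequencies: FST = Var(p) / (p_bar * (1 - p_bar)).
    """
    p_bar = fmean(subpop_freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return pvariance(subpop_freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


print(wright_fst([0.5, 0.5]))       # identical subpopulations -> 0.0
print(wright_fst([0.0, 1.0]))       # alternative alleles fixed -> 1.0
print(wright_fst([0.2, 0.4, 0.6]))  # intermediate differentiation
```

The same value is obtained as (HT - HS)/HT with HT = 2p̄(1 - p̄) and HS the mean within-subpopulation heterozygosity, which is why FST can be read either as a variance ratio or as a loss of heterozygosity due to subdivision.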

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) deal with the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments, a phenomenon known as crossing over or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete has a different constitution from the parental chromosomes: the allele combination of the two markers (i.e. the haplotype) differs from the parental one. The distance between two loci can be measured and discussed in terms of the recombination frequency θ, which is estimated from family data. The recombination frequency is related to the genetic distance between the loci (Ott 1999).
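The segregation probabilities shown in parentheses in Figure 2 follow directly from θ. As a minimal sketch, assume a parent who is heterozygous at both loci with known phase: each of the two parental haplotypes is then transmitted with probability (1 - θ)/2 and each of the two recombinant haplotypes with probability θ/2. The Haldane map function included below for converting a genetic distance in Morgans into θ is one common choice (it assumes no crossover interference), not necessarily the one used in this thesis; both function names are hypothetical.

```python
import math


def haldane_theta(distance_morgans: float) -> float:
    """Recombination fraction from genetic distance (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haplotype_transmission(parental: tuple[str, str], other: tuple[str, str],
                           theta: float) -> dict[tuple[str, str], float]:
    """Probabilities of the four haplotypes a doubly heterozygous parent can transmit.

    `parental` and `other` are the parent's two phased haplotypes, each written
    as (allele at locus 1, allele at locus 2); all four alleles must differ.
    """
    (a1, b1), (a2, b2) = parental, other
    return {
        (a1, b1): (1 - theta) / 2,  # non-recombinant
        (a2, b2): (1 - theta) / 2,  # non-recombinant
        (a1, b2): theta / 2,        # recombinant
        (a2, b1): theta / 2,        # recombinant
    }


theta = haldane_theta(0.10)  # loci assumed to be 10 cM apart
probs = haplotype_transmission(("X1a", "X2a"), ("X1b", "X2b"), theta)
for haplotype, pr in probs.items():
    print(haplotype, round(pr, 4))
```

With θ = 0 the markers behave as a single locus, and with θ = 0.5 all four haplotypes are equally likely, i.e. the loci segregate independently.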

Linkage disequilibrium, on the other hand, concerns dependence between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be written as

$$ f(ab) = f(a)\,f(b) + \Delta $$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium $\Delta = 0$, i.e. no association exists between a and b. If, however, there is dependence between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
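As an illustration of Δ, the sketch below estimates allele frequencies and Δ = f(ab) - f(a)f(b) for every allele pair from a list of observed two-locus haplotypes. It assumes the haplotypes are already phase-known (as for X-chromosomal markers typed in males); the function name `ld_deltas` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def ld_deltas(haplotypes: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Return Delta = f(ab) - f(a) * f(b) for every allele pair (a, b).

    `haplotypes` holds observed, phase-known haplotypes
    as (allele at locus 1, allele at locus 2).
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    f_a = {a: c / n for a, c in Counter(h[0] for h in haplotypes).items()}
    f_b = {b: c / n for b, c in Counter(h[1] for h in haplotypes).items()}
    return {
        (a, b): hap_counts[(a, b)] / n - f_a[a] * f_b[b]
        for a, b in product(f_a, f_b)
    }


# Toy sample (hypothetical data): allele "12" at locus 1 tends to occur with "8" at locus 2.
sample = [("12", "8")] * 30 + [("12", "9")] * 10 + [("13", "8")] * 10 + [("13", "9")] * 50
for pair, delta in sorted(ld_deltas(sample).items()):
    print(pair, round(delta, 3))
```

Note that the sketch estimates the haplotype frequencies f(ab) directly from the observed counts, in line with the recommendation above; Δ is only reported to show the strength of the association.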

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, tests must be conducted concerning the independent segregation of alleles: tests of Hardy-Weinberg equilibrium (HWE) (Hardy 1908) and of LD address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. For the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw highly significant conclusions from either test (Buckleton et al 2001).
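As a sketch of what an exact HWE test computes, the function below implements the exact test for a single biallelic locus (e.g. a SNP). Conditional on the observed allele counts, it enumerates every possible number of heterozygotes, computes its probability under HWE and sums the probabilities of all outcomes no more probable than the one observed. This is a simplification: multiallelic STR loci require an extended version of the test (usually evaluated by Markov chain Monte Carlo), and the function name is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact test of Hardy-Weinberg equilibrium for one biallelic locus.

    Conditional on the allele counts, the probability of observing h
    heterozygotes among n individuals is
        n! / (n_AA! h! n_BB!) * 2^h * n_A! n_B! / (2n)!
    The p-value sums the probabilities of all heterozygote counts that are
    no more probable than the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a, n_b = 2 * n_aa + n_ab, 2 * n_bb + n_ab

    def log_prob(het: int) -> float:
        hom_a, hom_b = (n_a - het) // 2, (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + het * log(2) + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must have the same parity as the allele counts.
    rarer = min(n_a, n_b)
    probs = {h: exp(log_prob(h)) for h in range(rarer % 2, rarer + 1, 2)}
    p_obs = probs[n_ab]
    # Small relative tolerance so that ties are not lost to rounding error.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


print(hwe_exact_pvalue(25, 50, 25))  # heterozygote count as expected under HWE
print(hwe_exact_pvalue(50, 0, 50))   # complete lack of heterozygotes
```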

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations and differ from case-specific values. The estimation of such parameters is most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.


The unbiased estimator is given by (Nei 1987)

$$ GD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^{2}\right) $$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
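A minimal sketch of the estimator, assuming the input is simply a list of all sampled gene copies at one locus (two per individual for an autosomal marker); the function name `gene_diversity` is hypothetical.

```python
from collections import Counter


def gene_diversity(alleles: list[str]) -> float:
    """Nei's (1987) unbiased gene diversity, n / (n - 1) * (1 - sum(p_i^2)).

    `alleles` lists every sampled gene copy (two per individual for an
    autosomal locus), so n = len(alleles).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two gene copies are needed")
    sum_sq = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1.0 - sum_sq)


# Two individuals typed at an STR locus: genotypes 12/12 and 13/14.
print(gene_diversity(["12", "12", "13", "14"]))
```

For this toy sample the result is 5/6, which is exactly the fraction of the six possible pairs of sampled gene copies that carry different alleles; the factor n/(n - 1) is what makes the estimate correspond to drawing two copies without replacement.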

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$ PM = \sum_{i} G_i^{2} $$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the match probabilities for the individual genotypes. PM can also be computed from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus the complement of the PM discussed above:

$$ PD = 1 - PM $$
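The sketch below computes PM both from observed genotype frequencies and from allele frequencies under an assumption of HWE (genotype frequencies $p_i^2$ and $2p_ip_j$), and then PD = 1 - PM. Combining loci by multiplying the per-locus PM values is an added assumption of independent loci (no LD); the function names and frequencies are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod


def match_probability(genotypes: list[tuple[str, str]]) -> float:
    """PM = sum of squared genotype frequencies, from observed genotypes."""
    n = len(genotypes)
    # Sort alleles so that (a, b) and (b, a) count as the same genotype.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return sum((c / n) ** 2 for c in counts.values())


def match_probability_hwe(allele_freqs: dict[str, float]) -> float:
    """PM from allele frequencies, assuming Hardy-Weinberg equilibrium."""
    p = list(allele_freqs.values())
    homozygotes = sum(pi ** 4 for pi in p)  # (p_i^2)^2
    heterozygotes = sum((2 * pi * pj) ** 2 for pi, pj in combinations(p, 2))
    return homozygotes + heterozygotes


def power_of_discrimination(pm: float) -> float:
    return 1.0 - pm


# Multi-locus PM, assuming independent loci (hypothetical frequencies).
loci = [{"a": 0.5, "b": 0.5}, {"12": 0.2, "13": 0.3, "14": 0.5}]
pm_total = prod(match_probability_hwe(f) for f in loci)
print(pm_total, power_of_discrimination(pm_total))
```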

Polymorphism Informative Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980, Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$ PIC = 1 - \sum_{i=1}^{n} p_i^{2} - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2\,p_i^{2}\,p_j^{2} $$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
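A direct transcription of the PIC formula, assuming a list of allele frequencies that sum to one; the function name `pic` is hypothetical.

```python
from itertools import combinations


def pic(freqs: list[float]) -> float:
    """Polymorphism informative content (Botstein et al. 1980).

    1 - sum(p_i^2) - sum over i < j of 2 p_i^2 p_j^2
    """
    homozygosity = sum(p ** 2 for p in freqs)
    same_heterozygotes = sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(freqs, 2))
    return 1.0 - homozygosity - same_heterozygotes


print(pic([0.5, 0.5]))            # a balanced biallelic marker such as a SNP
print(pic([0.1, 0.2, 0.3, 0.4]))  # a hypothetical four-allele marker
```

The balanced biallelic case gives 0.375, the maximum possible for a two-allele marker, which illustrates why a single SNP carries less information than a multiallelic STR.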

The probability of excluding paternity (Q) is calculated from (Ohno et al 1982)

$$ Q = \sum_{i=1}^{n} p_i\,(1-p_i)^{2}\,(1 - p_i + p_i^{2}) + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i\,p_j\,(p_i+p_j)\,(1-p_i-p_j)^{2} $$

Q is inferred from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$; and second, the expected population frequency of the genotypes of the mother/child combination, where $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$ PE = h^{2}\,(1 - 2\,h\,H^{2}) $$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
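Both exclusion measures can be checked numerically. The sketch below implements the closed form for Q (Ohno et al 1982), verifies it against a brute-force enumeration of mother genotypes and transmitted alleles, and computes PE from an observed heterozygote proportion h, taking H = 1 - h. It assumes HWE, no mutation and no silent alleles; the function names and allele frequencies are hypothetical.

```python
from itertools import combinations, product


def exclusion_probability(freqs: list[float]) -> float:
    """Closed form for Q (mother-child-alleged father trios)."""
    single = sum(p * (1 - p) ** 2 * (1 - p + p ** 2) for p in freqs)
    pairs = sum(pi * pj * (pi + pj) * (1 - pi - pj) ** 2
                for pi, pj in combinations(freqs, 2))
    return single + pairs


def exclusion_probability_enumerated(freqs: list[float]) -> float:
    """Same quantity by brute force over mother genotype and transmitted alleles.

    The possible paternal alleles are those child alleles whose partner allele
    could have come from the mother; a random man (in HWE) is excluded if he
    carries none of them.
    """
    k = range(len(freqs))
    q = 0.0
    # i, j: ordered mother genotype; m: which maternal allele is passed on;
    # f: allele transmitted by the true father.
    for i, j, m, f in product(k, k, (0, 1), k):
        weight = freqs[i] * freqs[j] * 0.5 * freqs[f]
        mother = (i, j)
        child = (mother[m], f)
        candidates = {child[1]}
        if child[1] in mother:
            candidates.add(child[0])
        q += weight * (1 - sum(freqs[a] for a in candidates)) ** 2
    return q


def power_of_exclusion(h: float) -> float:
    """PE = h^2 (1 - 2 h H^2), with H = 1 - h."""
    big_h = 1.0 - h
    return h ** 2 * (1 - 2 * h * big_h ** 2)


freqs = [0.1, 0.2, 0.3, 0.4]  # hypothetical allele frequencies
print(exclusion_probability(freqs), exclusion_probability_enumerated(freqs))
print(power_of_exclusion(0.8))
```

Agreement between the two Q functions confirms that the closed form is simply the sum, over mother/child combinations, of "combination frequency times exclusion probability" described in the text.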

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a random man carries only one X-chromosomal allele. Thus the mean exclusion chance when mother and child are tested is

$$ MEC_{Trio} = 1 - \sum_{i} p_i^{2} + \sum_{i} p_i^{4} - \left(\sum_{i} p_i^{2}\right)^{2} $$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al 1998), is

$$ MEC_{Duo} = 1 - 2\sum_{i} p_i^{2} + \sum_{i} p_i^{3} $$
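The two MEC formulas for X-chromosomal markers can be checked the same way. The sketch below implements both closed forms and, for the trio case, a brute-force enumeration in which a random (hemizygous) man is excluded with probability 1 minus the summed frequency of the possible paternal alleles. It assumes HWE and no mutation; the function names are hypothetical.

```python
from itertools import product


def mec_trio(freqs: list[float]) -> float:
    """MEC for mother-daughter-alleged father trios (X-linked markers)."""
    s2 = sum(p ** 2 for p in freqs)
    return 1 - s2 + sum(p ** 4 for p in freqs) - s2 ** 2


def mec_duo(freqs: list[float]) -> float:
    """MEC for father-daughter duos (X-linked markers)."""
    return 1 - 2 * sum(p ** 2 for p in freqs) + sum(p ** 3 for p in freqs)


def mec_trio_enumerated(freqs: list[float]) -> float:
    """Brute-force check of mec_trio over mothers, daughters and fathers."""
    k = range(len(freqs))
    total = 0.0
    for i, j, m, f in product(k, k, (0, 1), k):
        mother = (i, j)
        daughter = (mother[m], f)  # maternal allele, paternal allele
        candidates = {daughter[1]}
        if daughter[1] in mother:
            candidates.add(daughter[0])
        weight = freqs[i] * freqs[j] * 0.5 * freqs[f]
        # A hemizygous random man lacks all candidates with prob 1 - sum(p).
        total += weight * (1 - sum(freqs[a] for a in candidates))
    return total


freqs = [0.1, 0.2, 0.3, 0.4]  # hypothetical allele frequencies
print(mec_trio(freqs), mec_trio_enumerated(freqs), mec_duo(freqs))
```

For the duo, the closed form equals the sum over the man's allele i of $p_i(1-p_i)^2$, the probability that a daughter drawn at random from the population lacks that allele.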

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12,000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of varying scale, origin and direction have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1,000 years (Svanberg & Tydén 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated regarding forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300,000 SNPs, which showed a strong correlation between geographic location and genetic variability in the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated, as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$ \frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)} $$

$$ \text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds} $$

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) is the hypothesis of the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this quantity that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
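In code, the odds form of Bayes's theorem is a single multiplication. The sketch below also combines several likelihood ratios by multiplication, which is valid only if the pieces of evidence are conditionally independent given each hypothesis; that independence, the function names and the numbers are assumptions of this illustration.

```python
from math import prod


def posterior_odds(prior_odds: float, *likelihood_ratios: float) -> float:
    """Posterior odds = (product of LRs) x prior odds.

    Multiplying several LRs assumes the pieces of evidence are conditionally
    independent given each hypothesis.
    """
    return prod(likelihood_ratios) * prior_odds


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


# Hypothetical numbers: prior odds 1:1000, DNA LR of 1e6, other evidence LR of 5.
odds = posterior_odds(1 / 1000, 1e6, 5)
print(odds, odds_to_probability(odds))
```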

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$ PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} $$

i.e. the probability of observing the child's ($G_C$), the mother's ($G_M$) and the alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


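We can use the third law of probability (the multiplication rule, $\Pr(A, B \mid H) = \Pr(A \mid B, H)\,\Pr(B \mid H)$) and simplify:

$$ PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)} $$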

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can make a further simplification,

$$ \Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF}) $$

resulting in

$$ PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} $$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is either 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither parent has any other allele to transmit). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$ \Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} $$

and

$$ \Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c $$

thus

$$ PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2\,p_c} $$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
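The single-locus reasoning above generalises to any trio by enumerating transmissions: the numerator sums Mendelian probabilities over the alleles the mother and the AF can pass on, and the denominator replaces the AF by a random man who transmits allele x with probability $p_x$. This automatically handles the ambiguous cases mentioned earlier. A minimal sketch, assuming no mutation, no silent alleles, no substructure and known allele frequencies; the function names and frequencies are hypothetical, and exact fractions are used only to make the results easy to check.

```python
from fractions import Fraction


def transmission(genotype: tuple[str, str]) -> dict[str, Fraction]:
    """Mendelian probability that a parent transmits each of its alleles."""
    probs: dict[str, Fraction] = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, Fraction(0)) + Fraction(1, 2)
    return probs


def child_prob(child: tuple[str, str], maternal: dict, paternal: dict) -> Fraction:
    """Pr(child genotype) given the allele-transmission distributions of both parents."""
    a, b = child
    p = maternal.get(a, 0) * paternal.get(b, 0)
    if a != b:  # unordered genotype: either parent may have given either allele
        p += maternal.get(b, 0) * paternal.get(a, 0)
    return p


def paternity_index(mother, child, alleged_father, freqs: dict[str, Fraction]) -> Fraction:
    maternal = transmission(mother)
    numerator = child_prob(child, maternal, transmission(alleged_father))
    denominator = child_prob(child, maternal, freqs)  # random man transmits x with p_x
    return numerator / denominator


# The worked example: G_M = [a,b], G_C = [b,c], G_AF = [c,d], so PI = 1 / (2 p_c).
freqs = {"a": Fraction(1, 10), "b": Fraction(1, 5), "c": Fraction(1, 20), "d": Fraction(1, 4)}
print(paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freqs))  # 10
```

With $p_c = 1/20$ the result is $1/(2 \cdot 1/20) = 10$, matching the formula derived above. If mother, child and AF are all [a,b], the same code returns $1/(p_a + p_b)$, the ambiguous case.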

Decision

How does one interpret the PI value? Bayes's theorem is needed to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, so that the posterior odds equal the PI:

$$ \frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI $$

hence, since $\Pr(H_d \mid E) = 1 - \Pr(H_p \mid E)$,

$$ \frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI $$

resulting in the following posterior probability of paternity:

$$ \Pr(H_p \mid E) = \frac{PI}{PI + 1} $$
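A minimal sketch of the conversion from PI to the posterior probability of paternity. The default prior probability of 0.5 (prior odds of 1) reproduces PI/(PI + 1); the optional `prior` parameter is an addition for comparison only, and the function name is hypothetical.

```python
def posterior_probability_of_paternity(pi: float, prior: float = 0.5) -> float:
    """Pr(H_p | E) from a paternity index.

    With the traditional prior probability of 0.5 (prior odds of 1) this is
    PI / (PI + 1); other priors are shown only for comparison.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = pi * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


for pi in (10, 100, 10_000):
    print(pi, posterior_probability_of_paternity(pi))
```

A PI of 9, for example, gives a posterior probability of 0.9; the example loop shows how quickly the probability approaches 1 as the PI grows.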

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit or cut-off for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, whereas too high a cut-off increases the risk of failing to include the true father.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when allowing for mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

$$L(\mathrm{Ped}) = \sum_{G_1}\cdots\sum_{G_n}\prod_{i=1}^{n}\mathrm{Pen}(X_i \mid G_i)\prod_{\mathrm{founder}}\Pr(G_{\mathrm{founder}})\prod_{\{o,f,m\}}\Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes $G_1, \ldots, G_n$ of all $n$ individuals in the pedigree, $X_i$ is the observed phenotype of individual $i$, and the last product runs over all offspring $o$ with father $f$ and mother $m$.

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and that individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait (marker) loci.
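
The summation-and-peeling idea can be illustrated on the simplest pedigree, one nuclear family at a single marker locus. The Python sketch below assumes Hardy-Weinberg founder genotype probabilities and no mutation, and drops the penetrance factor as described above. It is not a general Elston-Stewart implementation (which peels recursively through multi-generation pedigrees), and all function names are hypothetical.

```python
from itertools import combinations_with_replacement


def all_genotypes(freqs):
    """All unordered genotypes (sorted 2-tuples) over the alleles in `freqs`."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(genotype, freqs):
    """Pr(G_founder) under Hardy-Weinberg proportions."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(child, father, mother):
    """Pr(G_o | G_f, G_m): Mendelian transmission without mutation."""
    a, b = child
    return sum(
        father.count(pat) / 2 * mother.count(mat) / 2
        for pat, mat in {(a, b), (b, a)}
    )


def nuclear_family_likelihood(freqs, children, father=None, mother=None):
    """Likelihood of one nuclear family at one marker locus.

    Untyped parents (None) are summed over all genotypes. The children are
    'peeled' first: their joint factor depends only on the parental pair, so
    once it is computed the children need no further consideration.
    """
    candidates = all_genotypes(freqs)
    fathers = [tuple(sorted(father))] if father else candidates
    mothers = [tuple(sorted(mother))] if mother else candidates
    likelihood = 0.0
    for g_f in fathers:
        for g_m in mothers:
            children_factor = 1.0
            for child in children:
                children_factor *= transmission(child, g_f, g_m)
            likelihood += founder_prob(g_f, freqs) * founder_prob(g_m, freqs) * children_factor
    return likelihood


freqs = {"a": 0.6, "b": 0.4}  # hypothetical allele frequencies
# Parents untyped, two typed children.
print(nuclear_family_likelihood(freqs, [("a", "b"), ("a", "a")]))
```

The cost grows with the number of joint genotypes that must be summed over, which is why extending this approach to many linked markers becomes expensive.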

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. Due to increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with a computational effort that increases only linearly with the number of markers (although exponentially with the number of meioses in the pedigree). The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree, describing the alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on each inheritance vector; and 3) calculation of the joint probability of the inheritance vectors of all markers along the same chromosome, via transmission probabilities that depend on the recombination fractions between adjacent markers. By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
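
Step 3 can be sketched as an HMM forward recursion over inheritance vectors. In the Python sketch below, the per-marker probabilities from step 2 are taken as given inputs (computing them requires summing over founder allele assignments using population allele frequencies, which is where linkage equilibrium is assumed); a uniform prior over inheritance vectors is assumed at the first marker, and the function name is hypothetical. The naive transition loop costs O(M·4^k) for M markers and k meioses; published implementations use faster schemes that are not shown here.

```python
def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors (a minimal Lander-Green style HMM).

    emissions[m][v]: Pr(observed genotypes at marker m | inheritance vector v),
        v = 0 .. 2**n_meioses - 1, where bit i records which of the two alleles
        of the transmitting parent was passed on in meiosis i.
    thetas[m]: recombination fraction between marker m and marker m + 1.
    """
    if len(thetas) != len(emissions) - 1:
        raise ValueError("need one recombination fraction per adjacent marker pair")
    n_vectors = 2 ** n_meioses
    # Uniform prior over inheritance vectors at the first marker.
    forward = [emissions[0][v] / n_vectors for v in range(n_vectors)]
    for m, theta in enumerate(thetas, start=1):
        updated = []
        for v_new in range(n_vectors):
            total = 0.0
            for v_old in range(n_vectors):
                # Number of meioses with a recombination between the two markers.
                d = bin(v_old ^ v_new).count("1")
                total += forward[v_old] * theta ** d * (1 - theta) ** (n_meioses - d)
            updated.append(total * emissions[m][v_new])
        forward = updated
    return sum(forward)


# Hypothetical two-marker example with a single meiosis (two inheritance vectors).
print(lander_green_likelihood([[0.2, 0.4], [0.1, 0.3]], [0.01], 1))
```

With all recombination fractions set to 0.5 (unlinked markers), the result reduces to the product over markers of the average emission probability, i.e. the markers are treated independently.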

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium between markers when population allele frequencies are used (Abecasis et al. 2002; Skare et al. 2009).

Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for the proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (i.e. falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with both the standard two-hypothesis model and the five-hypothesis model for comparison. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
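
A toy version of this simulation design is sketched below in Python. It assumes 15 independent loci with hypothetical, equally frequent alleles; no mutations; only the standard two hypotheses (AF is the father versus AF is unrelated) with prior odds 1; and a binary decision rule at a 99.99% cut-off. It does not implement the five-hypothesis model or the study's actual simulation settings, and all names are hypothetical.

```python
import random


def sample_genotype(freqs, rng):
    """Genotype of an unrelated individual, drawn under Hardy-Weinberg proportions."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    return tuple(rng.choices(alleles, weights=weights, k=2))


def locus_pi(child, mother, af, freqs):
    """Single-locus trio PI without a mutation model (summed over allele origins)."""
    a, b = child
    numerator = denominator = 0.0
    for maternal, paternal in {(a, b), (b, a)}:
        p_maternal = mother.count(maternal) / 2
        numerator += p_maternal * af.count(paternal) / 2
        denominator += p_maternal * freqs[paternal]
    return numerator / denominator


def simulate_pi(loci, af_is_father, rng):
    """Simulate one mother-child-AF trio over independent loci; return the combined PI."""
    pi = 1.0
    for freqs in loci:
        mother = sample_genotype(freqs, rng)
        father = sample_genotype(freqs, rng)
        child = (rng.choice(mother), rng.choice(father))  # Mendelian transmission
        af = father if af_is_father else sample_genotype(freqs, rng)
        pi *= locus_pi(child, mother, af, freqs)
    return pi


def posterior(pi):
    """Pr(H_p | E) with prior odds 1."""
    return pi / (pi + 1)


def error_rates(loci, n_sim=1000, cutoff=0.9999, seed=1):
    """Return (inclusion error, exclusion error) under a binary decision rule.

    Inclusion error: an unrelated AF reaches the cut-off.
    Exclusion error: the true father does not reach the cut-off.
    """
    rng = random.Random(seed)
    excluded = sum(posterior(simulate_pi(loci, True, rng)) < cutoff for _ in range(n_sim))
    included = sum(posterior(simulate_pi(loci, False, rng)) >= cutoff for _ in range(n_sim))
    return included / n_sim, excluded / n_sim


# Hypothetical setting: 15 independent loci, each with 8 equally frequent alleles.
loci = [{str(i): 1 / 8 for i in range(8)} for _ in range(15)]
print(error_rates(loci))
```

Because no mutations are simulated here, a true father is never genetically excluded; exclusion errors arise only when his PI is too small to reach the cut-off.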

Figure 3 The different alternative hypotheses used for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or represent true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the interpretation is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.

The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change (%) in the error rate in comparison with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five-hypothesis model for LR calculation: -24 (-8, -31)

A standard case was considered with 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error over H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. In the case of haploid DNA markers, it is particularly relevant to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved the calculation of forensic efficiency parameters, as well as a comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same order of magnitude as for other European populations (Budowle et al. 1999b). A comparison of mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
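
The two summary statistics can be computed from a list of observed haplotypes as in the Python sketch below. It uses commonly applied definitions, an unbiased haplotype diversity n/(n - 1)·(1 - Σp_i²) and a match probability Σp_i² over sample haplotype frequencies p_i; we assume, but the text does not state, that these are the estimators used in the paper. The sample and the function names are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in Counter(haplotypes).values())
    return n / (n - 1) * (1 - sum_sq)


def match_probability(haplotypes):
    """Random match probability: sum of squared sample haplotype frequencies."""
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in Counter(haplotypes).values())


# Hypothetical sample: each haplotype given as a label (e.g. its list of variants).
sample = ["H1", "H1", "H2", "H3"]
print(haplotype_diversity(sample), match_probability(sample))
```

With 247 distinct haplotypes among a few hundred individuals, most haplotypes are singletons, which is why the diversity approaches 1 and the match probability stays small.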

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.

Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X-chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci within each linkage group should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
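
Why LD matters for the haplotype frequency can be seen by comparing an observed two-locus haplotype frequency with the product of the single-locus frequencies. The Python sketch below computes the classical disequilibrium coefficient D = p_AB - p_A·p_B from male data, where the single X chromosome gives the haplotype directly. It is not the LD significance test used in the paper, and the allele labels and sample are hypothetical.

```python
from collections import Counter


def ld_summary(haplotypes, allele1, allele2):
    """Observed two-locus haplotype frequency versus the product rule.

    `haplotypes` are (locus1_allele, locus2_allele) pairs, e.g. from males.
    D is the classical disequilibrium coefficient p_AB - p_A * p_B.
    """
    n = len(haplotypes)
    p_hap = Counter(haplotypes)[(allele1, allele2)] / n
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    return {"haplotype": p_hap, "product_rule": p1 * p2, "D": p_hap - p1 * p2}


# Hypothetical male haplotypes for two loci in the same linkage group.
males = [("12", "30")] * 3 + [("13", "31")]
print(ld_summary(males, "12", "30"))
```

A non-zero D means that the product rule misstates the haplotype frequency, which carries over directly into the LR.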

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
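
With roughly a hundred informative meioses, a small recombination frequency carries considerable uncertainty. The Python sketch below uses a naive counting estimate (recombinants divided by informative meioses) together with an exact one-sided binomial (Clopper-Pearson) upper confidence bound. It is not the estimation model presented in Paper III, it assumes that every informative meiosis can be scored unambiguously, and the function names and counts are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def recombination_estimate(recombinants, meioses, alpha=0.05):
    """Counting estimate of the recombination fraction and an exact
    one-sided (Clopper-Pearson) upper 1 - alpha confidence bound."""
    r_hat = recombinants / meioses
    lo, hi = r_hat, 1.0
    # Bisection for the p at which Pr(X <= recombinants; p) equals alpha.
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(recombinants, meioses, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return r_hat, hi


# Hypothetical: no recombinants observed in 100 informative meioses.
print(recombination_estimate(0, 100))
```

For zero recombinants in 100 meioses, the point estimate is 0 while the 95% upper bound is about 3%, so a point estimate below 1% does not by itself rule out a recombination fraction of a few per cent.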

Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line are the locations in Mb (NCBI build 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.

Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for how to consider both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers, but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present a model for the complete consideration of linkage and linkage disequilibrium and, by means of a simulation approach on typical pedigrees and X-chromosomal data for the markers studied in Paper III, study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented that is based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship being either true or not true. From these, LR distributions were studied, and the LRs calculated using our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (approximately $10^6$) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (approximately $10^2$-$10^3$). Furthermore, LRs above 1, of various magnitudes, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Columns, each given as median [95% cred] (min/max):
(1) log10 LR
(2) difference, LR(m_2)/LR(m_1)
(3) difference, LR(m_3)/LR(m_1)

Rel: (1) 20 [-1 -62] (-1100); (2) 12 [04-73] (0001418); (3) 04 [0036-93] (00001438)
NoRel: (1) -1 [-1-05] (-140); (2) 10 [09-13] (000263); (3) 02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered and m_3 the model in which neither linkage nor LD was taken into account.

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for combining the eight X-STRs.

• In Paper IV, we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.

Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years, much of which is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, making DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before the above-mentioned findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid development. Once the discoveries were made, an incredible amount of money and scientific hours were invested in forensic genetics in order to meet the demands.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA-profile databases, which consist of a given number of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases which, like CODIS, consist of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that, in general, these developments will be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly solved with a sufficiently high probability and certainty; thus, the community will not continue to invest the same level of time and money in research. However, relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods, for example separating full-siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As regards continued work on the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that there are regions that were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetic data and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would allow access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).

Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Also Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar, since it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and daughter Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis. Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Introduction

History of DNA and forensic genetics

When Watson & Crick discovered the structure of the DNA molecule (Watson & Crick 1953), they could probably not have imagined the future usefulness of their finding. By analysing DNA, information about genetic diseases, the evolution of biological life and population history can be retrieved. Nowadays, DNA is used in everyday practice in many different areas, such as medical genetics, the food processing industry and forensic situations, both when solving crimes and in disputes about biological relationships.

Traditionally, the aim of forensic genetics is to provide a statement about the identity of a human being based on a DNA analysis of a biological sample. However, forensic genetics today covers a wider spectrum of areas, such as forensic molecular pathology (Karch 2007), complex traits (Kayser & Schneider 2009, Pulker et al. 2007) and wildlife forensics (Alacs et al. 2010, Budowle et al. 2005). When it comes to human identification, the task could be to connect a suspect to a crime scene or to investigate a biological relationship (Jobling & Gill 2004b).

DNA from a crime scene sample was first used in court in 1986, in the UK (Gill & Werrett 1987). The case involved the exclusion of a murder suspect using multi-locus DNA probes (Jeffreys et al. 1985). Since then, the techniques and methodologies for employing the information provided by DNA have improved enormously, making DNA an obvious tool in routine forensic practice. Perhaps the most famous case in which the use of DNA was put under pressure, and from which lessons can still be learned, was the trial of O.J. Simpson (Lee & Labriola 2001). This trial is a good example of the importance of the complete process, from the handling of biological evidence at the crime scene, via storage and the establishment of DNA profiles, to the presentation of the weight of the DNA evidence in court. In no other trial have the DNA results been so thoroughly examined, discussed and questioned by the defence.

Within forensic genetics, a DNA investigation always has a question to answer, for example: "Is the donor of a crime scene sample the same individual as the suspect?" or "Is the alleged father (AF) the biological father of the child?" Once the DNA profiles have been established, they are used to interpret the case-specific question. Normally, three different statements can be presented for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles in unrelated individuals by coincidence. The statistical assessment and presentation of DNA evidence are crucial for the acceptance of DNA as a routine tool.
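In its simplest form, the probability of a coincidental match between unrelated individuals is computed with the "product rule". The sketch below is a minimal illustration, not the method used in this thesis. It assumes Hardy-Weinberg equilibrium within each locus, independence between loci and no population substructure; these are the same assumptions whose validity is examined in the population genetics sections. The locus names and allele frequencies are invented for illustration.

```python
from math import prod


def genotype_match_probability(freqs, genotype):
    """Single-locus match probability for one genotype, assuming HWE:
    p^2 for a homozygote and 2pq for a heterozygote."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]


def profile_match_probability(freq_table, profile):
    """Product rule: multiply the single-locus values (assumes independent loci)."""
    return prod(genotype_match_probability(freq_table[locus], genotype)
                for locus, genotype in profile.items())


# Hypothetical loci and allele frequencies, for illustration only.
freq_table = {"locus1": {"12": 0.10, "13": 0.20},
              "locus2": {"8": 0.25}}
profile = {"locus1": ("12", "13"), "locus2": ("8", "8")}
print(profile_match_probability(freq_table, profile))  # approximately 0.0025
```

Multiplying across loci is only valid under the independence assumptions stated above. Substructure and linkage disequilibrium, discussed below, are exactly the situations in which it breaks down.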

The establishment of these figures is usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing. However, many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight of a given DNA marker. First, population genetics, including allele frequencies, population substructure, and dependence within and between markers, among other factors. Second, models for calculating and presenting the weight of evidence that take this population genetic information properly into account.

Population genetics

Genetic polymorphisms

Three different types of DNA marker, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, are easy to multiplex for PCR amplification and are highly polymorphic among human populations (Butler 2006). In other words, they have good characteristics for use in forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multi-locus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al. 1999a). These and some additional markers were then adopted and commercialised by a few corporations, making them the standard set of markers for routine practice. More recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al. 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low. The amount of information from one STR marker is approximately the same as that from four SNPs (Sobrino & Carracedo 2005, Brenner (www.dna-view.com)). No commercial forensic SNP multiplex kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al. 2009, Phillips et al. 2008).

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main forensic use of sequence data involves the analysis of variation in the mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysis of mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).

DNA inheritance

In addition to the markers described above, there are different "types" of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, reducing the limitations associated with testing complex pedigrees (Egeland & Sheehan 2008, Skare et al. 2009).

The X-chromosome has a different inheritance pattern from the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave as autosomal markers in their transmission to gametes in females and as haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome, one of the two belonging to their mother. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are only available for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since two full sisters can inherit different alleles from him. X-chromosomal markers can, however, exclude a common father, since two sisters with the same father would share the same paternal allele at every X-chromosomal locus. There are several other types of relationship where analysis of X-chromosomal markers is superior to that of autosomal markers (Szibor et al. 2003, Pinto et al. 2010).
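The sister example can be expressed as a simple consistency check. The following is a minimal sketch, assuming X-chromosomal genotypes are given as allele pairs per locus and ignoring mutation. The locus names and alleles are hypothetical, and the function name is illustrative only.

```python
def loci_excluding_common_father(sister1, sister2):
    """Return the X-chromosomal loci at which two sisters share no allele.

    A common father passes his single X allele to both daughters, so at
    every locus they must share at least one allele. A locus without a
    shared allele therefore argues against a common father (mutation is
    ignored in this sketch)."""
    return [locus for locus, genotype in sister1.items()
            if locus in sister2 and not set(genotype) & set(sister2[locus])]


# Hypothetical X-STR genotypes (locus -> pair of alleles).
sister1 = {"X1": ("10", "12"), "X2": ("8", "9")}
sister2 = {"X1": ("11", "13"), "X2": ("9", "9")}
print(loci_excluding_common_father(sister1, sister2))  # ['X1']
```

An empty result does not prove a common father; it only means that no exclusion is possible. The weight of such evidence then has to be assessed with a likelihood calculation.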

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al. 2008, Hundertmark et al. 2008, Gomes et al. 2007, Diegoli & Coble 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). These two genetic properties are discussed further below, in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists as a single copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003, Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used to infer the geographical origin of the paternal lineage (Jobling & Tyler-Smith 2003). For other forensic questions, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is shared by all males of the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is a circular molecule of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. The mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be taken into account when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.

Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations. Small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
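Two quantities connect this paragraph to casework. The first is a measure of differentiation between subpopulations. The second is a match probability that allows for substructure through a coancestry parameter θ. The sketch below is minimal and rests on these assumptions. `gst` is a simple Nei-style ratio (H_T - H_S)/H_T computed from equally weighted subpopulation frequencies, with no sample-size correction, so it is not the Weir & Cockerham-type estimator usually applied to sample data. `balding_nichols_match` uses the form in which the Balding & Nichols (1994) match probability is commonly quoted. Choosing the θ value for casework is a separate judgment that is not addressed here. All frequencies are hypothetical.

```python
def gst(subpop_freqs):
    """Nei-style FST analogue (H_T - H_S) / H_T from allele frequencies of
    equally weighted subpopulations (no sample-size correction)."""
    alleles = set().union(*subpop_freqs)
    k = len(subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations
    h_s = sum(1 - sum(f.get(a, 0.0) ** 2 for a in alleles)
              for f in subpop_freqs) / k
    # H_T: expected heterozygosity of the pooled (averaged) frequencies
    pooled = [sum(f.get(a, 0.0) for f in subpop_freqs) / k for a in alleles]
    h_t = 1 - sum(p ** 2 for p in pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def balding_nichols_match(p_a, p_b, theta, homozygous):
    """Single-locus match probability allowing for substructure via theta.
    With theta = 0 this reduces to p^2 (homozygote) or 2pq (heterozygote)."""
    denom = (1 + theta) * (1 + 2 * theta)
    if homozygous:
        return ((2 * theta + (1 - theta) * p_a)
                * (3 * theta + (1 - theta) * p_a)) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


print(gst([{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]))  # about 0.04
print(balding_nichols_match(0.1, 0.1, 0.0, True))         # p^2 = 0.01
print(balding_nichols_match(0.1, 0.1, 0.03, True))        # larger than 0.01
```

A positive θ always raises the match probability for rare alleles. This is why ignoring existing substructure tends to overstate the weight of a match.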

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between them, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The distance between two loci can be measured and discussed as the recombination frequency θ, which is estimated from family studies. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).
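The effect of θ on transmission can be made concrete. The sketch below computes segregation probabilities of the kind shown in Figure 2 for one parent at two linked loci. It is a minimal sketch, assuming the parent's phase is known and ignoring mutation; the allele labels are placeholders. `haldane_theta` converts a map distance into θ using the Haldane map function. That function is one common choice and is not named in the text; others exist, e.g. Kosambi's.

```python
import math


def gamete_haplotype_probs(hap1, hap2, theta):
    """Probabilities of the haplotypes a parent transmits at two linked loci.

    hap1 = (a1, b1) and hap2 = (a2, b2) are the parent's two haplotypes with
    known phase; theta is the recombination fraction (0 <= theta <= 0.5)."""
    (a1, b1), (a2, b2) = hap1, hap2
    outcomes = [((a1, b1), (1 - theta) / 2),   # parental, no recombination
                ((a2, b2), (1 - theta) / 2),
                ((a1, b2), theta / 2),         # recombinant
                ((a2, b1), theta / 2)]
    probs = {}
    for hap, p in outcomes:
        probs[hap] = probs.get(hap, 0.0) + p   # merge duplicates at homozygous loci
    return probs


def haldane_theta(distance_morgans):
    """Haldane map function: map distance in Morgans -> recombination fraction."""
    return 0.5 * (1 - math.exp(-2 * distance_morgans))


theta = haldane_theta(0.05)  # hypothetical distance of 5 cM
print(gamete_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), theta))
```

With θ = 0.5 all four haplotypes become equally likely, which corresponds to independent segregation of unlinked loci.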

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected at random. However, it can also result from population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be expressed as

$$f(ab) = f(a)\cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium, $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population, rather than by estimating Δ for each allele combination, especially for multiallelic loci.
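The definition of Δ translates directly into a count-based estimate. The sketch below is minimal and assumes that phase-known two-locus haplotypes are observed directly, as with X-chromosomal haplotypes typed in males. The function name and the haplotype sample are hypothetical.

```python
from collections import Counter


def ld_delta(haplotypes):
    """Delta = f(ab) - f(a) f(b) for every allele pair, from phase-known
    two-locus haplotypes given as (allele_locus1, allele_locus2) tuples."""
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    locus1 = Counter(a for a, _ in haplotypes)
    locus2 = Counter(b for _, b in haplotypes)
    return {(a, b): hap_counts[(a, b)] / n - (locus1[a] / n) * (locus2[b] / n)
            for a in locus1 for b in locus2}


# Hypothetical haplotypes, e.g. X-chromosomal haplotypes observed in males.
sample = [("12", "20")] * 6 + [("12", "21")] * 2 + [("13", "21")] * 2
for pair, delta in sorted(ld_delta(sample).items()):
    print(pair, round(delta, 3))
```

The table of Δ values grows with the product of the allele numbers at the two loci. This is one reason why, as noted above, observed haplotype frequencies are usually used directly for multiallelic markers in LD.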

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Hardy-Weinberg equilibrium (HWE; Hardy 1908) tests and LD tests address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. Fisher's exact test is the preferred method for both the HWE and the LD tests (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw highly significant conclusions from either test (Buckleton et al. 2001).
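For a single biallelic marker such as a SNP, an exact HWE test is small enough to sketch directly. Conditional on the observed allele counts, the probability of each possible number of heterozygotes has a closed form. The p-value is then the sum of the probabilities of all outcomes that are no more likely than the observed one. This is a minimal sketch under that assumption. Multi-allelic STR loci and LD tests require other algorithms (e.g. Markov chain or permutation approaches), which are not shown. The genotype counts are hypothetical.

```python
import math


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE test for one biallelic locus.

    Enumerates every heterozygote count compatible with the observed allele
    counts and sums the probabilities of outcomes no more likely than the
    observed one."""
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n - n_a               # copies of allele B

    def log_prob(het):
        # P(het | n, n_a) = n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1) - math.lgamma(hom_b + 1)
                + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # the heterozygote count has the same parity as n_a and is at most min(n_a, n_b)
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(log_prob(h)) for h in hets}
    p_obs = probs[n_ab]
    # small relative tolerance so that ties with the observed outcome are included
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


print(hwe_exact_pvalue(25, 50, 25))   # close to 1: counts agree with HWE
print(hwe_exact_pvalue(0, 100, 0))    # very small: heterozygote excess
```

With modest sample sizes, even clearly non-HWE genotype proportions often give large p-values. This is the limited power referred to above.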

Another feature to consider is the forensic efficiency of the DNA markers in criminal casework and relationship testing. Such estimates describe the theoretical value of the specific markers in different forensic genetic situations and differ from case-specific values. They are most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.

The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where $n$ is the number of gene copies sampled and $p_i$ is the frequency of the $i$th allele in the population.

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype $i$ at a given locus in the population. Thus, PM is the sum of the partial match probabilities over all genotypes. PM can also be calculated from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones & Brewer 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus directly related to PM:

$$PD = 1 - PM$$
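PM and PD can be computed either from observed genotype counts, following the definition above, or from allele frequencies under HWE. The following is a minimal sketch of both routes. Genotypes are stored as unordered allele pairs so that 12/13 and 13/12 count as the same genotype. The sample is hypothetical.

```python
from collections import Counter


def match_probability(genotypes):
    """PM = sum of squared observed genotype frequencies (genotypes as allele pairs)."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)   # 12/13 == 13/12
    return sum((c / n) ** 2 for c in counts.values())


def match_probability_hwe(allele_freqs):
    """PM from allele frequencies, assuming HWE genotype frequencies."""
    p = list(allele_freqs.values())
    pm = 0.0
    for i in range(len(p)):
        pm += (p[i] ** 2) ** 2                 # homozygote i/i
        for j in range(i + 1, len(p)):
            pm += (2 * p[i] * p[j]) ** 2       # heterozygote i/j
    return pm


def power_of_discrimination(pm):
    return 1 - pm


sample = [("12", "13"), ("13", "12"), ("14", "14"), ("12", "12")]  # hypothetical
pm = match_probability(sample)
print(pm, power_of_discrimination(pm))   # 0.375 0.625
```

The two routes agree only when the observed genotype frequencies match their HWE expectations. In practice, the count-based value from a small database is noisy.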

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980, Guo & Elston 1999). There are two situations in which this cannot be deduced: when one parent is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and $n$ is the number of alleles at the locus.
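The PIC formula term by term, as a minimal sketch taking a list of allele frequencies (assumed to sum to 1):

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980) from allele frequencies."""
    n = len(freqs)
    sum_sq = sum(p ** 2 for p in freqs)
    # double sum over allele pairs i < j
    pairs = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n - 1) for j in range(i + 1, n))
    return 1 - sum_sq - pairs


print(pic([0.5, 0.5]))   # 0.375, the maximum for a biallelic marker
print(pic([0.25] * 4))   # 0.703125
```

The biallelic maximum of 0.375 illustrates the point made earlier: a single SNP is far less informative than a multi-allelic STR.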

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i+p_j)(1-p_i-p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$; and second, the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the possible paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2 (1 - 2hH^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
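The two exclusion measures can be sketched in Python as below (hypothetical function names). For a locus with two equally frequent alleles the sketch gives Q = 0.1875, which agrees with a direct enumeration of the mother/child combinations. PE is computed from assumed observed proportions h and H rather than from allele frequencies.

```python
def exclusion_probability_q(p):
    """Trio paternity exclusion probability Q for a list of allele frequencies p."""
    n = len(p)
    # Paternal allele i identifiable; a random man lacks it with probability (1 - p_i)^2
    single = sum(pi * (1 - pi) ** 2 * (1 - pi + pi ** 2) for pi in p)
    # Mother and child share heterozygote i/j; a random man lacks both with (1 - p_i - p_j)^2
    pairs = sum(
        p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return single + pairs


def power_of_exclusion(h, H):
    """PE = h^2 (1 - 2 h H^2); h = proportion heterozygous, H = proportion homozygous."""
    return h ** 2 * (1 - 2 * h * H ** 2)


print(exclusion_probability_q([0.5, 0.5]))  # 0.1875
print(power_of_exclusion(0.8, 0.2))  # hypothetical sample proportions
```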

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either (1-p_i) or (1-p_i-p_j). Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where p_i is the frequency of allele i; p_i can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, MEC_Duo (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
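A short Python sketch can evaluate the two X-chromosomal formulas and check the closed form of MEC_Trio against a direct sum over mother/child combinations, using the per-combination exclusion probabilities (1-p_i) and (1-p_i-p_j) described above. The enumeration helper is our own construction; it uses the fact that the probability that the paternal allele is i and can be identified unambiguously is p_i(1 - p_i + p_i^2).

```python
def mec_trio(p):
    """MEC for mother-daughter-man trios (closed form)."""
    s2 = sum(x ** 2 for x in p)
    s4 = sum(x ** 4 for x in p)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(p):
    """MEC for man-daughter duos."""
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return 1 - 2 * s2 + s3


def mec_trio_by_enumeration(p):
    """Same quantity summed over mother/child combinations (hemizygous man)."""
    n = len(p)
    # Paternal allele i unambiguous: a random man lacks it with probability (1 - p_i)
    total = sum(pi * (1 - pi + pi ** 2) * (1 - pi) for pi in p)
    # Mother and daughter both i/j: a random man lacks both with probability (1 - p_i - p_j)
    total += sum(p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j])
                 for i in range(n - 1) for j in range(i + 1, n))
    return total


freqs = [0.1, 0.2, 0.3, 0.4]  # hypothetical allele frequencies
print(mec_trio(freqs), mec_trio_by_enumeration(freqs), mec_duo(freqs))
```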

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both of these studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, including a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example Pr(E | H), which means the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to the lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

H_1 (or H_p) is commonly known as the prosecutor's hypothesis and H_0 (or H_d) as the hypothesis for the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient Pr(E | H_1, I) / Pr(E | H_0, I) is the LR, and it is within this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
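The odds form of Bayes's theorem translates directly into code. The sketch below (hypothetical names and numbers) also shows how an LR for the DNA evidence could be combined with the LR for another item of evidence, under the stated assumption that the items are conditionally independent given each hypothesis.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Posterior odds = (product of LRs) * prior odds.

    Multiplying several LRs assumes the items of evidence are
    conditionally independent given each hypothesis.
    """
    return prior_odds * math.prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds in favour of H_1 into Pr(H_1 | E, I)."""
    return odds / (1 + odds)


# Hypothetical: DNA LR of 1000, a second independent item with LR 5, prior odds 1:100
odds = posterior_odds(prior_odds=0.01, likelihood_ratios=[1000, 5])
print(odds, odds_to_probability(odds))
```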

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculation specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let H_p and H_d represent two mutually exclusive hypotheses for and against paternity:

H_p: The alleged father is the father of the child.
H_d: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which means the probability of observing the child's (G_C), mother's (G_M) and alleged father's (G_AF) DNA profiles when the AF is the father, compared with the probability of observing the same DNA profiles when the AF is not the father.


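We can use the third law of probability, Pr(A, B | C) = Pr(A | B, C) Pr(B | C), to simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$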

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (A_M) and paternal (A_P) alleles of the child (assuming that the mother is the true mother), the numerator is either 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither parent has any other allele to transmit). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If A_M and A_P are unambiguous, the denominator Pr(G_C | G_M, G_AF, H_d) is either p_{A_P} or 0.5 p_{A_P}, depending on the homozygous/heterozygous status of the mother. p_{A_P} is the population frequency of allele A_P and represents the probability of the child receiving that allele from a random man in the population. If A_M and A_P are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of A_M and A_P.

As a simple example, let G_M have the genotype [a,b], G_C have [b,c] and G_AF have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$


and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2 p_c}$$

In other words, the rarer allele c is in the population, the higher the PI, and thus the greater the evidential weight in favour of the AF being the biological father of the child.
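The single-locus calculation can be sketched in Python by enumerating Mendelian transmissions. Summing over both ways of assigning the child's alleles to the mother and the father handles the ambiguous case automatically. The function names and allele frequencies are hypothetical, and the sketch ignores mutations and silent alleles.

```python
def transmission(genotype):
    """Mendelian transmission probabilities for one parent: each allele copy w.p. 1/2."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_probability(child, maternal, paternal):
    """Pr(child genotype) given maternal and paternal allele-transmission probabilities.

    Sums over both assignments of the child's alleles to mother and father,
    which covers the case where A_M and A_P cannot be determined.
    """
    x, y = child
    prob = maternal.get(x, 0.0) * paternal.get(y, 0.0)
    if x != y:
        prob += maternal.get(y, 0.0) * paternal.get(x, 0.0)
    return prob


def paternity_index(mother, child, alleged_father, freqs):
    m = transmission(mother)
    numerator = child_probability(child, m, transmission(alleged_father))  # H_p
    denominator = child_probability(child, m, freqs)  # H_d: random man passes allele a w.p. p_a
    return numerator / denominator


freqs = {"a": 0.3, "b": 0.2, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
print(paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freqs))  # 1 / (2 * p_c), about 5
```

With the genotypes from the example above, the sketch returns 1/(2p_c); if mother, child and AF are all [a,b], it returns 1/(p_a + p_b).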

Decision

How does one interpret the PI value? Bayes's theorem is relevant in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following posterior odds of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence, since H_p and H_d are mutually exclusive and exhaustive,

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). A cut-off that is too low will increase the risk of falsely including a non-father as the true father, and vice versa.
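A minimal sketch of the posterior probability of paternity and a cut-off decision follows. The 0.9999 cut-off is only an illustrative assumption, since the limit is set by each laboratory. Multiplying per-locus PIs assumes independent (unlinked, linkage-equilibrium) loci; the names and values are hypothetical.

```python
def paternity_probability(pi, prior_odds=1.0):
    """Posterior probability of paternity; with prior odds 1 this is PI / (PI + 1)."""
    odds = pi * prior_odds
    return odds / (1 + odds)


def combined_pi(locus_pis):
    """Product of per-locus PIs; only valid for independent (unlinked, LD-free) loci."""
    total = 1.0
    for value in locus_pis:
        total *= value
    return total


def decide(pi, cutoff=0.9999):
    """Illustrative decision rule; the cut-off is a laboratory choice, not a fixed standard."""
    return "inclusion" if paternity_probability(pi) >= cutoff else "no inclusion"


total_pi = combined_pi([5.0, 2.0, 10.0])  # hypothetical per-locus PIs
print(total_pi, paternity_probability(total_pi), decide(total_pi))
```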

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, the use of a model for automatic likelihood computations is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

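$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{founder} \Pr(G_{founder}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where G_i and X_i are the genotype and the observed data of individual i, and {o, f, m} runs over all offspring-father-mother triplets in the pedigree.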

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the calculation of the summation for that individual's parents, and thus this individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
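The peeling idea can be illustrated in Python for the simplest pedigree, a single nuclear family typed at one marker. This is a minimal sketch, not a general Elston-Stewart implementation: the helper names are hypothetical, Hardy-Weinberg founder probabilities are assumed, and the penetrance is reduced to an indicator of agreement with the typed genotype, as appropriate for a non-trait marker.

```python
from itertools import combinations_with_replacement


def genotypes(freqs):
    """All unordered genotypes (sorted allele pairs) for the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g, freqs):
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(g):
    """Each of a parent's two allele copies is transmitted with probability 1/2."""
    probs = {}
    for allele in g:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def offspring_prob(gc, gf, gm):
    """Pr(G_o | G_f, G_m) by Mendelian segregation."""
    f, m = transmission(gf), transmission(gm)
    x, y = gc
    p = f.get(x, 0.0) * m.get(y, 0.0)
    if x != y:
        p += f.get(y, 0.0) * m.get(x, 0.0)
    return p


def pen(observed, g):
    """Non-trait marker: 1 if g matches the typed genotype (or the person is untyped)."""
    return 1.0 if observed is None or tuple(sorted(observed)) == g else 0.0


def nuclear_family_likelihood(father, mother, children, freqs):
    """Likelihood of one nuclear family at one marker; None marks an untyped person."""
    gs = genotypes(freqs)
    total = 0.0
    for gf in gs:
        wf = founder_prob(gf, freqs) * pen(father, gf)
        for gm in gs:
            wm = founder_prob(gm, freqs) * pen(mother, gm)
            if wf * wm == 0.0:
                continue
            # Peeling: each child's sum over its own genotypes becomes one factor
            # for this parental pair, so the child needs no further consideration.
            factor = 1.0
            for obs in children:
                factor *= sum(offspring_prob(gc, gf, gm) * pen(obs, gc) for gc in gs)
            total += wf * wm * factor
    return total


freqs = {"a": 0.2, "b": 0.3, "c": 0.5}  # hypothetical allele frequencies
print(nuclear_family_likelihood(None, None, [("a", "a"), ("a", "a")], freqs))  # approximately 0.0144
```

For two untyped parents and two children who are both [a,a] with p_a = 0.2, the likelihood is p_a^2(1 + p_a)^2/4 = 0.0144; dividing by the probability for two unrelated individuals (0.04 × 0.04) gives a full-sibling LR of 9 for this hypothetical locus.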

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. Due to increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with computational effort increasing linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (i.e. the transmission probabilities). By the use of a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
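The HMM in step 3 can be sketched as a forward algorithm over inheritance vectors. In this Python sketch, the per-marker emission probabilities from step 2 are taken as given input (computing them from founder allele frequencies is omitted), each meiosis is assumed to switch grandparental origin independently with the recombination fraction θ between adjacent markers, and all names are hypothetical. The loop is linear in the number of markers, whereas the number of inheritance vectors, 2^m for m meioses, grows exponentially with pedigree size.

```python
from itertools import product


def transition(v, w, theta):
    """Pr(inheritance vector w at the next marker | vector v at this marker).

    Each meiosis independently switches its bit with probability theta.
    """
    switches = sum(a != b for a, b in zip(v, w))
    return theta ** switches * (1 - theta) ** (len(v) - switches)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm of the hidden Markov model over inheritance vectors.

    emissions[k][i] = Pr(observed genotypes at marker k | inheritance vector i);
    thetas[k] = recombination fraction between markers k and k + 1.
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    # Uniform prior over the 2**n_meioses inheritance vectors at the first marker
    alpha = [emissions[0][i] / len(vectors) for i in range(len(vectors))]
    for k in range(1, len(emissions)):
        alpha = [
            emissions[k][j]
            * sum(alpha[i] * transition(vectors[i], vectors[j], thetas[k - 1])
                  for i in range(len(vectors)))
            for j in range(len(vectors))
        ]
    return sum(alpha)


# Hypothetical emissions: one meiosis, both markers consistent only with vector (0,)
print(lander_green_likelihood([[1.0, 0.0], [1.0, 0.0]], [0.01], 1))  # 0.5 * (1 - 0.01)
```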

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample concerning allele and haplotype frequencies and forensic efficiency parameters. Furthermore, to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). An assessment of the weight of the DNA result can be performed, and a decision can be made as to whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect due to an exclusion or an inclusion error (meaning falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
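Posterior probabilities over several mutually exclusive hypotheses, as in the five hypotheses model, can be sketched as follows. The hypothesis labels mirror those used in this section (H0 and H1a-d), but the likelihood values are placeholders, equal prior probabilities are assumed, and the inclusion/exclusion thresholds are an assumed illustration of a decision rule with a 99.99% cut-off.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Pr(H_k | E) for mutually exclusive hypotheses via Bayes's theorem."""
    if priors is None:
        # Equal prior probability for every hypothesis (an assumption)
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


def decision(posteriors, father_hypothesis="H0", cutoff=0.9999):
    """Hypothetical rule: include above the cut-off, exclude below 1 - cut-off."""
    p = posteriors[father_hypothesis]
    if p >= cutoff:
        return "inclusion"
    if p <= 1 - cutoff:
        return "exclusion"
    return "inconclusive"


# Placeholder likelihoods Pr(E | H); not results from the thesis
likelihoods = {"H0": 1e-12, "H1a": 2e-17, "H1b": 1e-18, "H1c": 5e-19, "H1d": 1e-20}
post = posterior_probabilities(likelihoods)
print(decision(post))
```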


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but is due to the fact that we used an equal prior probability for the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk of the AF being a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some labs still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
Mother and father simulated as first cousins: 3 (10, -1)

Additional information
20-marker DNA profiles: -68 (-29, -89)
25-marker DNA profiles: -83 (-56, -98)
2 children: -88 (-73, -96)

Mutation model
Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)

Inappropriate allele frequency
Rwanda allele freq. for data generation, Swedish allele freq. for LR calculation: 19 (190, -76)
Somali allele freq. for data generation, Swedish allele freq. for LR calculation: 2 (106, -55)
Iran allele freq. for data generation, Swedish allele freq. for LR calculation: -13 (25, -34)

Prior information
Five hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average for inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. In the case of haploid DNA markers, it is particularly relevant to set up and study regional frequency databases, due to an increased risk of local frequency variations (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (e.g. HVS-I, HVS-II and HVS-III) spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation involved enumeration of forensic efficiency parameters, as well as comparison of the genetic variation found within the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found for Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to application of X-STRs in casework, it is important to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were studied and analysed for the eight X-chromosome STR markers included in the Argus X-8 kit (Biotype) (Figure 5). From these, data were retrieved for establishing haplotype frequencies, performing the LD test and estimating forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were considered in order to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
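Because males carry a single X chromosome, their haplotypes at two linked loci can be read directly from the typing results. The sketch below (hypothetical function name and data) computes the classical disequilibrium coefficient D = p_AB - p_A p_B for one allele pair. It is not the LD test used in the paper, only a measure of the deviation from the product rule that such a test detects.

```python
def ld_coefficient(haplotypes, a, b):
    """D = p_AB - p_A * p_B for allele a at locus 1 and allele b at locus 2.

    haplotypes: list of (allele_locus1, allele_locus2) tuples from male samples.
    """
    n = len(haplotypes)
    p_ab = haplotypes.count((a, b)) / n
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    return p_ab - p_a * p_b


# Hypothetical male haplotypes for two loci in one linkage group
sample = ([("12", "30")] * 30 + [("12", "31")] * 10
          + [("13", "30")] * 10 + [("13", "31")] * 50)
print(ld_coefficient(sample, "12", "30"))  # about 0.14
```

In this hypothetical sample, the haplotype (12, 30) has an observed frequency of 0.30, whereas the product rule would assign it 0.4 × 0.4 = 0.16.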

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for how to consider both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing for X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we here present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking or not taking linkage and LD into account by means of a simulation approach on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters), and the three additional cases were ones where DNA profiles were only available for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed in which the tested relationship was either true or not true. From these, LR distributions were studied, together with comparisons of the LR calculated using our proposed model with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2 to 10^3). Furthermore, various degrees of positive LRs were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which linkage and LD were not taken into account (Table 2). In some of the tested cases, the estimation of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unseen haplotypes.

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Columns: log10 LR, median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1), median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1), median [95% cred.] (min/max)

Rel | 20 [-1 -62] (-1100) | 12 [04-73] (0001418) | 04 [0036-93] (00001438)
NoRel | -1 [-1-05] (-140) | 10 [09-13] (000263) | 02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10 000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, use of relevant population allele frequencies, use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimation of the rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for the calculation of likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
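Why the product rule can fail for linked X-STRs can be illustrated with a small sketch. It compares, for one two-locus haplotype in a hypothetical sample of 100 male (haploid) X-chromosomes, the directly counted haplotype frequency with the frequency predicted by multiplying the single-locus allele frequencies. The allele labels, counts and function names are invented for illustration; they are not data or code from Papers III or IV, whose model is considerably more general.

```python
from collections import Counter


def allele_frequencies(haplotypes, locus):
    """Relative frequency of each allele at one locus (index 0 or 1) of two-locus haplotypes."""
    counts = Counter(h[locus] for h in haplotypes)
    n = len(haplotypes)
    return {allele: c / n for allele, c in counts.items()}


def haplotype_frequency(haplotypes, target):
    """Directly counted frequency of one two-locus haplotype."""
    return sum(1 for h in haplotypes if h == target) / len(haplotypes)


def product_rule_frequency(haplotypes, target):
    """Frequency predicted by multiplying single-locus allele frequencies (assumes no LD)."""
    p1 = allele_frequencies(haplotypes, 0)[target[0]]
    p2 = allele_frequencies(haplotypes, 1)[target[1]]
    return p1 * p2


# Hypothetical sample of 100 male X-chromosomes typed at two linked X-STRs.
# Allele "12" at locus 1 tends to travel together with allele "15" at locus 2.
sample = ([("12", "15")] * 30 + [("12", "16")] * 10
          + [("13", "15")] * 10 + [("13", "16")] * 50)

target = ("12", "15")
h = haplotype_frequency(sample, target)     # 30 / 100
p = product_rule_frequency(sample, target)  # (40 / 100) * (40 / 100)
print(f"counted haplotype frequency: {h:.3f}")
print(f"product-rule estimate:       {p:.3f}")
```

In this invented sample the product rule makes the haplotype look rarer than it is, which would overstate the weight of a matching haplotype; with real data, the direction and size of the error depend on the LD pattern.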


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver behind the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved with existing methods today, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, relationship questions remain that cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009, Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As for continued efforts regarding the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
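A rough way to see why longer haplotype blocks demand larger databases: if a haplotype frequency f is estimated by counting in a sample of n unrelated haplotypes, its binomial standard error is sqrt(f(1 - f)/n), so the sample needed for a fixed relative precision grows roughly as 1/f. The sketch assumes simple random sampling and ignores substructure; the 20% precision target and the function names are illustrative choices, not values from the thesis.

```python
import math


def standard_error(freq, n):
    """Binomial standard error of a frequency estimated by counting n haplotypes."""
    return math.sqrt(freq * (1.0 - freq) / n)


def required_database_size(freq, rel_error):
    """Smallest n with standard_error(freq, n) / freq <= rel_error.

    From SE / freq = sqrt((1 - freq) / (freq * n)) it follows that
    n >= (1 - freq) / (freq * rel_error ** 2).
    """
    return math.ceil((1.0 - freq) / (freq * rel_error ** 2))


# Hypothetical precision target: a relative standard error of 20%.
for freq in (0.05, 0.01, 0.001):
    n = required_database_size(freq, 0.2)
    print(f"haplotype frequency {freq}: roughly {n} haplotypes needed")
```

Because adding markers to a block splits each haplotype class into rarer ones, the required database size climbs quickly as the block grows.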

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested in seeing whether this is traceable in the genetics and thus could have an impact when producing the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Also Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik, for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and my grandfather Olle, for always being around when help is needed. My family-in-law, Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care; thanks for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Introduction

tation of the case-specific question. Normally, three different statements can be made for any given hypothesis tested: exclusion, inconclusive or inclusion. When no exclusion can be made, some sort of statistical evaluation has to be performed in order to estimate the strength of the evidence provided by the DNA profiles. Put simply, the majority of such cases involve considering the probability of observing identical DNA profiles in unrelated individuals by coincidence. The statistical assessment and presentation of the DNA evidence are crucial for the acceptance of DNA as a routine tool.
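The coincidental-match probability mentioned above is, in its simplest form, computed from allele frequencies under Hardy-Weinberg equilibrium (p² for homozygotes, 2pq for heterozygotes) and multiplied across loci, the so-called product rule. The minimal sketch below assumes unlinked autosomal loci in equilibrium and no substructure; the locus names and frequencies are hypothetical.

```python
def genotype_frequency(allele1, allele2, freqs):
    """Expected genotype frequency under Hardy-Weinberg equilibrium (p^2 or 2pq)."""
    p, q = freqs[allele1], freqs[allele2]
    return p * p if allele1 == allele2 else 2 * p * q


def random_match_probability(profile, freq_table):
    """Multiply genotype frequencies across loci (product rule; assumes independent loci)."""
    rmp = 1.0
    for locus, (a1, a2) in profile.items():
        rmp *= genotype_frequency(a1, a2, freq_table[locus])
    return rmp


# Hypothetical allele frequencies for two unlinked autosomal STR loci.
freq_table = {
    "STR_A": {"12": 0.10, "14": 0.20},
    "STR_B": {"8": 0.30},
}
profile = {"STR_A": ("12", "14"), "STR_B": ("8", "8")}
rmp = random_match_probability(profile, freq_table)  # 2 * 0.10 * 0.20 * 0.30 ** 2
print(f"random match probability: {rmp:.4f}")
print(f"likelihood ratio for a single-source match: {1 / rmp:.0f}")
```

The same building blocks, allele frequencies from a relevant population and an independence assumption, underlie most of the weight-of-evidence figures discussed in this thesis, which is why the validity of each assumption matters.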

The establishment of these figures is usually based on the genetic uniqueness of the information in the DNA profile, in the context of a relevant population. The main aim of the present thesis is to discuss issues that are important for relationship testing, but many of the aspects and parameters studied and discussed here are just as important for evaluating DNA evidence in criminal casework.

Two main areas must be studied in order to establish the evidential weight for a given DNA marker. First, population genetics, including allele frequencies, population substructure, and dependence within and between markers, among other things. Second, models for calculating and presenting the weight of evidence that take the former information properly into account.

Population genetics

Genetic polymorphisms

Three types of DNA marker, namely Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs) and DNA sequence data (Figure 1), represent the vast majority of polymorphisms used in forensic genetic applications. They all have characteristics that make them especially useful for solving criminal cases and for relationship testing.

An STR marker consists of a short DNA sequence (e.g. GATA) repeated a variable number of times. These markers are widespread throughout the genome and account for approximately 3% of the total human genome (Ellegren 2004). They have a relatively high mutation rate, which is the reason for their high degree of polymorphism. STRs are robust, easy to multiplex for PCR amplification and highly polymorphic among human populations (Butler 2006). In other words, they have good characteristics for use in forensic applications. More than 10 alleles exist for the commonly used STRs, which generally makes a multilocus STR DNA profile unique. In the 1990s, the FBI concentrated on 13 STR markers, called the CODIS loci (Budowle et al. 1999a). These and some additional markers were then adopted and commercialised by a few companies, making them the standard set of markers for routine practice. More recently, STRs with shorter amplicon sizes (i.e. miniSTRs) have been developed (Wiegand & Kleiber 2001). These have the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al. 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another feature is their low mutation rate, which is an advantage in relationship testing. The disadvantage, however, is that since the number of alleles per locus is limited, the information content per locus is low: the amount of information from one STR marker is about the same as from four SNPs (Sobrino & Carracedo 2005, Brenner (www.dna-view.com)). No commercial forensic SNP multiplex kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and for relationship testing (Borsting et al. 2009, Phillips et al. 2008).
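The "one STR is worth about four SNPs" rule of thumb can be checked roughly by comparing per-locus match probabilities, i.e. the probability that two unrelated people share a genotype. The sketch assumes Hardy-Weinberg equilibrium, an idealised STR with ten equally frequent alleles and an idealised SNP with two alleles at frequency 0.5; these frequencies are illustrative, not population data.

```python
import itertools
import math


def match_probability(freqs):
    """Sum of squared genotype frequencies at one locus (Hardy-Weinberg assumed)."""
    total = 0.0
    for i, j in itertools.combinations_with_replacement(range(len(freqs)), 2):
        g = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        total += g ** 2
    return total


str_pm = match_probability([0.1] * 10)  # idealised STR: 10 equally frequent alleles
snp_pm = match_probability([0.5, 0.5])  # idealised biallelic SNP
snps_per_str = math.log(str_pm) / math.log(snp_pm)
print(f"STR: {str_pm:.4f}  SNP: {snp_pm:.4f}  SNPs per STR: {snps_per_str:.1f}")
```

With these idealised frequencies, the ratio of log match probabilities comes out at roughly four; real markers with uneven allele frequencies give different ratios.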

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main use of sequence data in forensic work is the analysis of variation in mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysis of mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al. 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).


DNA inheritance

In addition to the markers described above, there are different “types” of DNA with different properties in terms of their inheritance pattern and other important population genetic properties. The types discussed here are markers on the autosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with testing complex pedigrees (Egeland & Sheehan 2008, Skare et al. 2009).

The X-chromosome has a different inheritance pattern from that of the autosomes. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from their mother, who passes on one of her two. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude paternity, since two sisters can inherit different alleles despite being full siblings. X-chromosome markers can, however, exclude paternity, since two sisters with the same father would share the same paternal allele. There are several other types of relationship where analysis of X-chromosomal markers is superior to that of autosomal markers (Szibor et al. 2003, Pinto et al. 2010).
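The sister example can be made concrete for a single X-STR in a simplified setting. The sketch assumes that each sister's paternally inherited allele is already known (for instance deduced from a typed mother), ignores mutation, and compares H1 "same father" with H2 "two unrelated fathers". Under H1 the paternal alleles must be identical; under H2 the second father carries the same allele with probability equal to its population frequency. The function name and frequencies are hypothetical, and results for several closely located X-STRs must not simply be multiplied, because of linkage and LD.

```python
def same_father_lr(paternal_allele_1, paternal_allele_2, freqs):
    """Single-locus LR, H1 same father vs H2 two unrelated fathers (hypothetical helper).

    Assumes both paternally inherited X alleles are known and ignores mutation.
    """
    if paternal_allele_1 != paternal_allele_2:
        return 0.0  # exclusion: a father passes his single X-chromosome to every daughter
    # P(match | H1) = 1, P(match | H2) = allele frequency
    return 1.0 / freqs[paternal_allele_1]


# Hypothetical allele frequencies at one X-STR.
freqs = {"14": 0.25, "15": 0.05}
print(same_father_lr("15", "15", freqs))  # shared rare paternal allele
print(same_father_lr("14", "15", freqs))  # different paternal alleles
```

A shared rare paternal allele supports the same-father hypothesis more strongly than a shared common one, whereas different paternal alleles exclude it (barring mutation).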

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al. 2008, Hundertmark et al. 2008, Gomes et al. 2007, Diegoli & Coble 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007, Szibor 2007) (Figure 2). The definitions of these two genetic properties and their impact on calculated likelihoods are discussed further below.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. Because of this, and because the effective population size of the Y-chromosome is one-fourth of that of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003, Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used to infer paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. It consists of a circular genome of ~16,600 nucleotides. Each cell has 100 to 1,000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. The mtDNA is inherited from mother to child (maternal inheritance) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).
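Because a haploid profile is inherited as one block, the rarity of an mtDNA (or Y-STR) haplotype is typically estimated by counting in a reference database rather than by combining loci. A minimal sketch, assuming a database of n unrelated haplotypes: the counting estimate x/n and, for a haplotype not observed at all, an upper confidence bound obtained by solving (1 - p)^n = α for p. The database size, α = 0.05 and the function names are illustrative choices, not values or methods taken from the thesis.

```python
def counting_estimate(count, n):
    """Counting estimate of a haplotype frequency: observations / database size."""
    return count / n


def upper_bound_unobserved(n, alpha=0.05):
    """Upper (1 - alpha) bound for a haplotype not seen among n haplotypes.

    Solves (1 - p) ** n = alpha: the largest frequency that still leaves a
    probability of at least alpha of observing zero copies.
    """
    return 1.0 - alpha ** (1.0 / n)


n = 1000  # hypothetical database size
print(counting_estimate(3, n))
print(round(upper_bound_unobserved(n), 5))
```

Enlarging the reference population, as Paper II suggests for Swedish and other European mtDNA data, tightens such bounds because the bound shrinks roughly in proportion to 1/n.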

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles of X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics
Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure
In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST-statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies among subpopulations: small FST-values correspond to small differences among populations, and vice versa. Variants of FST exist that additionally take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
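As a minimal sketch of the idea, assuming known allele frequencies and equally weighted subpopulations, FST for a single locus can be computed from the heterozygosity deficit (H_T - H_S)/H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T that of the pooled allele frequencies. This is the simple heterozygosity-based form; sample-size-corrected estimators (for example Weir and Cockerham's θ) are usually preferred for sample data, and the function name `fst_from_frequencies` is purely illustrative.

```python
def fst_from_frequencies(subpop_freqs: list[list[float]]) -> float:
    """Heterozygosity-based F_ST = (H_T - H_S) / H_T for a single locus.

    subpop_freqs[k][i] is the frequency of allele i in subpopulation k;
    subpopulations are weighted equally (an illustrative assumption).
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # H_S: mean expected heterozygosity within the subpopulations
    h_s = sum(1.0 - sum(p * p for p in freqs) for freqs in subpop_freqs) / k
    # H_T: expected heterozygosity computed from the pooled (mean) allele frequencies
    pooled = [sum(freqs[i] for freqs in subpop_freqs) / k for i in range(n_alleles)]
    h_t = 1.0 - sum(p * p for p in pooled)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


print(fst_from_frequencies([[0.6, 0.4], [0.6, 0.4]]))  # identical subpopulations -> 0.0
print(fst_from_frequencies([[0.8, 0.2], [0.2, 0.8]]))  # differentiated -> about 0.36
```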

Linkage and Linkage disequilibrium
Linkage and linkage disequilibrium (LD) deal with the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments by a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The distance between two loci can be measured as the recombination frequency θ, which is estimated from family data. The recombination frequency increases with the genetic distance between the loci, up to a maximum of 0.5 for unlinked loci (Ott 1999).
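The estimation of θ from family data, and its relation to genetic distance, can be illustrated with a short sketch. It assumes phase-known informative meioses, so that θ is simply the number of recombinants R divided by the number of informative meioses N, and it uses the Haldane map function, which assumes no crossover interference; other map functions (e.g. Kosambi's) give somewhat different conversions. The counts below are hypothetical.

```python
import math


def estimate_theta(recombinants: int, informative_meioses: int) -> float:
    """Recombination fraction estimate R / N (assumes phase-known meioses)."""
    return recombinants / informative_meioses


def haldane_theta(d_morgans: float) -> float:
    """Haldane map function: map distance (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(theta: float) -> float:
    """Inverse Haldane map function, valid for 0 <= theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


theta_hat = estimate_theta(2, 100)               # hypothetical: 2 recombinants in 100 meioses
print(theta_hat)                                 # 0.02
print(100 * haldane_distance(theta_hat), "cM")   # slightly more than 2 cM
print(haldane_theta(5.0))                        # loci far apart: theta approaches 0.5
```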

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because closely located loci are inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a)\cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the allele frequencies of alleles a and b, respectively. If we have linkage equilibrium, then $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
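A minimal sketch of the LD measure Δ, assuming two-locus haplotypes are observed directly (as they are, for example, for X-chromosomal loci in hemizygous males), compares the counted haplotype frequency f(ab) with the product f(a)·f(b) expected under LE. The sample and the function name `ld_delta` are hypothetical.

```python
from collections import Counter


def ld_delta(haplotypes: list[tuple[str, str]], a: str, b: str) -> tuple[float, float, float]:
    """Return (f(ab), f(a) * f(b), Delta) from directly observed two-locus haplotypes."""
    n = len(haplotypes)
    f_ab = Counter(haplotypes)[(a, b)] / n               # observed haplotype frequency
    f_a = sum(1 for x, _ in haplotypes if x == a) / n    # allele frequency at locus 1
    f_b = sum(1 for _, y in haplotypes if y == b) / n    # allele frequency at locus 2
    return f_ab, f_a * f_b, f_ab - f_a * f_b


# Hypothetical haplotypes, e.g. read directly from hemizygous male X-chromosomes
sample = [("a", "b")] * 3 + [("a", "c")] + [("d", "b")] + [("d", "c")] * 3
print(ld_delta(sample, "a", "b"))   # (0.375, 0.25, 0.125): a and b co-occur more than expected
```

The first returned value is the direct estimate recommended for markers in LD; the second is what an LE assumption would use instead.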

Validation of a frequency database
Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, tests must be conducted concerning the independence of alleles: Hardy-Weinberg equilibrium (HWE; Hardy 1908) tests and LD tests deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw firm conclusions from the outcome of either test (Buckleton et al. 2001).
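To illustrate the exact-test principle, the sketch below implements an exact HWE test for a single biallelic locus: it conditions on the observed allele counts, computes the probability of every possible number of heterozygotes under HWE, and sums the probabilities of all configurations no more likely than the observed one. This is a simplification, since forensic STRs are multiallelic and their exact tests are usually evaluated by extensive enumeration or Markov chain Monte Carlo; the function name is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact HWE test for one biallelic locus, conditional on the allele counts."""
    n = n_aa + n_ab + n_bb          # individuals
    n_a = 2 * n_aa + n_ab           # copies of allele a
    n_b = 2 * n - n_a               # copies of allele b

    def log_prob(het: int) -> float:
        # log Pr(het heterozygotes | n, n_a) under HWE
        hom_a, hom_b = (n_a - het) // 2, (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + het * log(2) + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Possible heterozygote counts have the same parity as n_a
    probs = [exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    p_obs = exp(log_prob(n_ab))
    # Sum over all configurations that are no more probable than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


print(hwe_exact_pvalue(25, 50, 25))   # close to 1: genotype counts fit HWE
print(hwe_exact_pvalue(50, 0, 50))    # tiny: no heterozygotes at all
```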

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of the specific markers in different forensic genetic situations and differ from case-specific values. They are most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the population.
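A minimal sketch of Nei's unbiased estimator, taking hypothetical allele counts from a sample of n gene copies (the function name is illustrative). Algebraically, the estimator equals the probability that two distinct gene copies drawn without replacement from the sample carry different alleles, which the second example reflects.

```python
def gene_diversity(allele_counts: list[int]) -> float:
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum_i p_i^2)."""
    n = sum(allele_counts)                               # number of gene copies sampled
    sum_p2 = sum((c / n) ** 2 for c in allele_counts)    # p_i = sample allele frequency
    return n / (n - 1) * (1.0 - sum_p2)


print(gene_diversity([5, 5]))      # about 0.556 (= 5/9)
print(gene_diversity([3, 2, 1]))   # about 0.733 (= 22/30)
```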

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the match probabilities for all individual genotypes. PM can also be computed from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals and is thus the complement of PM, discussed above:

$$PD = 1 - PM$$
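The sketch below computes PM and PD for one locus. It assumes HWE when deriving genotype frequencies from allele frequencies (p_i² for homozygotes, 2p_ip_j for heterozygotes); `match_probability` equally accepts observed genotype frequencies. Function names and inputs are illustrative.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(p: list[float]) -> list[float]:
    """Genotype frequencies under HWE: p_i^2 (homozygotes) and 2 p_i p_j (heterozygotes)."""
    return [p[i] ** 2 if i == j else 2 * p[i] * p[j]
            for i, j in combinations_with_replacement(range(len(p)), 2)]


def match_probability(genotype_freqs: list[float]) -> float:
    """PM = sum_i G_i^2; accepts observed or HWE-derived genotype frequencies."""
    return sum(g * g for g in genotype_freqs)


pm = match_probability(hwe_genotype_frequencies([0.5, 0.5]))
print(pm, 1 - pm)   # PM = 0.375, PD = 0.625
```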

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980, Guo & Elston 1999). There are two instances in which this cannot be deduced, namely when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
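A direct transcription of the PIC formula, as a sketch with a hypothetical function name. The two subtracted terms correspond to the two non-informative situations described above: a homozygous parent (Σp_i²) and parents and child sharing the same heterozygous genotype i/j (2p_i²p_j²).

```python
def pic(p: list[float]) -> float:
    """Polymorphism information content from allele frequencies p_1..p_n."""
    n = len(p)
    hom = sum(pi ** 2 for pi in p)            # transmitting parent homozygous
    same_het = sum(2 * p[i] ** 2 * p[j] ** 2  # both parents and child share genotype i/j
                   for i in range(n - 1) for j in range(i + 1, n))
    return 1.0 - hom - same_het


print(pic([0.5, 0.5]))    # 0.375, the maximum for a biallelic marker
print(pic([0.25] * 4))    # 0.703125 for four equally frequent alleles
```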

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i+p_j)(1-p_i-p_j)^2$$

Q is built from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$; and second, the expected population frequency of the genotypes of that mother/child combination. Here $p_i$ and $p_j$ are the frequencies of the possible paternal alleles. Q is then obtained by summing over all mother/child genotype combinations, as described above.

An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\,(1 - 2hH^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
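The sketch below evaluates the closed-form Q and cross-checks it by brute-force enumeration of all mother/child combinations under HWE, applying the exclusion probability (1 - Σ candidate paternal allele frequencies)² described above. It also evaluates PE from heterozygote and homozygote counts. The function names and inputs are hypothetical, and the enumeration assumes no mutations or silent alleles.

```python
from itertools import product


def exclusion_q(p: list[float]) -> float:
    """Closed-form probability of excluding paternity Q for autosomal trios."""
    n = len(p)
    hom = sum(pi * (1 - pi) ** 2 * (1 - pi + pi ** 2) for pi in p)
    het = sum(p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
              for i in range(n - 1) for j in range(i + 1, n))
    return hom + het


def exclusion_q_enumerated(p: list[float]) -> float:
    """Q by summing over mother/child combinations under HWE (no mutations)."""
    total = 0.0
    for m1, m2, k in product(range(len(p)), repeat=3):    # mother m1/m2, paternal allele k
        for mat in (m1, m2):                              # allele the child got from the mother
            prob = p[m1] * p[m2] * 0.5 * p[k]
            # Possible paternal alleles: child alleles whose partner allele the mother carries
            cands = {x for x, other in ((k, mat), (mat, k)) if other in (m1, m2)}
            # A random man is excluded if neither of his two alleles is a candidate
            total += prob * (1 - sum(p[x] for x in cands)) ** 2
    return total


def power_of_exclusion(n_het: int, n_hom: int) -> float:
    """PE = h^2 (1 - 2 h H^2) from heterozygote and homozygote counts."""
    h = n_het / (n_het + n_hom)
    big_h = n_hom / (n_het + n_hom)
    return h ** 2 * (1 - 2 * h * big_h ** 2)


freqs = [0.1, 0.2, 0.3, 0.4]                               # hypothetical allele frequencies
print(exclusion_q(freqs), exclusion_q_enumerated(freqs))  # the two values agree
print(power_of_exclusion(80, 20))                          # h = 0.8, H = 0.2 -> about 0.599
```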

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since the alleged father carries a single X-chromosome. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if such is considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
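As with Q, the two MEC formulas can be checked against a brute-force enumeration under HWE. In this sketch (hypothetical function names and frequencies), the only change from the autosomal case is that a man carries one X allele, so the exclusion probability is (1 - Σ candidate allele frequencies) rather than its square.

```python
from itertools import product


def mec_trio(p: list[float]) -> float:
    """MEC for X-chromosomal trios with a daughter."""
    s2 = sum(x ** 2 for x in p)
    s4 = sum(x ** 4 for x in p)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(p: list[float]) -> float:
    """MEC for X-chromosomal father/daughter duos."""
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return 1 - 2 * s2 + s3


def mec_trio_enumerated(p: list[float]) -> float:
    """Trio MEC by summing over mother/daughter combinations under HWE."""
    total = 0.0
    for m1, m2, k in product(range(len(p)), repeat=3):   # mother m1/m2, father's X allele k
        for mat in (m1, m2):
            prob = p[m1] * p[m2] * 0.5 * p[k]
            cands = {x for x, other in ((k, mat), (mat, k)) if other in (m1, m2)}
            # A man has one X allele: excluded if it is not a candidate paternal allele
            total += prob * (1 - sum(p[x] for x in cands))
    return total


def mec_duo_enumerated(p: list[float]) -> float:
    """Duo MEC: the man is excluded if his allele is absent from the daughter."""
    return sum(p[i] * p[j] * (1 - sum(p[x] for x in {i, j}))
               for i, j in product(range(len(p)), repeat=2))


freqs = [0.1, 0.2, 0.3, 0.4]   # hypothetical allele (or haplotype) frequencies
print(mec_trio(freqs), mec_trio_enumerated(freqs))
print(mec_duo(freqs), mec_duo_enumerated(freqs))
```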

The Swedish population and its genetic appearance
Immigration into Scandinavia did not start until around 12 000 years ago, owing to the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may have shaped the genetic composition of the modern Swedish population.

The Swedish population has been investigated for forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006, Lappalainen et al. 2009). The latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Owing to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics
In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight
When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, including a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$

H1 (or HP) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the hypothesis of the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I)/\Pr(E \mid H_0, I)$ is the LR, and it is within this quotient that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation
As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:
Hp: The alleged father is the father of the child.
Hd: A random man not related to the alleged father is the father of the child.
The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which is the probability of observing the child's ($G_C$), the mother's ($G_M$) and the alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


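Using the third law of probability (the multiplication rule, $\Pr(A, B \mid H) = \Pr(A \mid B, H)\Pr(B \mid H)$), we can simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can simplify further:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$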

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but that someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (AM) and paternal (AP) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (each of them can only transmit the one allele they carry). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If AM and AP are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele AP and represents the probability of the child receiving that allele from a random man in the population. If AM and AP are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of AM and AP.

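As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2}\cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2}\,p_c} = \frac{1}{2p_c}$$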

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
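The single-locus calculation above can be sketched as follows, summing over all ways the child's genotype can be formed so that ambiguous maternal/paternal assignments are handled automatically. This is an illustration under simplifying assumptions (no mutations, no silent alleles, mother consistent with the child); the function names and allele frequencies are hypothetical.

```python
def transmitted(genotype: tuple[str, str]) -> list[tuple[str, float]]:
    """Mendelian transmission: each parental allele is passed on with probability 1/2."""
    return [(genotype[0], 0.5), (genotype[1], 0.5)]


def forms(allele_1: str, allele_2: str, child: tuple[str, str]) -> bool:
    """True if the two transmitted alleles make up the child's genotype."""
    return sorted((allele_1, allele_2)) == sorted(child)


def paternity_index(child, mother, alleged_father, freqs: dict[str, float]) -> float:
    """Single-locus PI for a mother/child/AF trio (no mutations or silent alleles)."""
    # Numerator: Pr(G_C | G_M, G_AF, H_p), the paternal allele comes from the AF
    num = sum(pm * pf for m, pm in transmitted(mother)
              for f, pf in transmitted(alleged_father) if forms(m, f, child))
    # Denominator: Pr(G_C | G_M, G_AF, H_d), the paternal allele comes from a random man
    den = sum(pm * freqs[f] for m, pm in transmitted(mother)
              for f in freqs if forms(m, f, child))
    return num / den


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}   # hypothetical allele frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))   # 1 / (2 * 0.3), about 1.67
```

Across independent loci (unlinked and in LE), single-locus PIs are usually multiplied to give a combined PI; that independence assumption is exactly what linked X-chromosomal markers call into question.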

Decision
How should the PI-value be interpreted? Bayes's theorem is relevant here, since it yields posterior odds from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_P \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_P \mid E)}{1 - \Pr(H_P \mid E)} = PI$$

resulting in

$$\Pr(H_P \mid E) = \frac{PI}{PI + 1}$$
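The conversion from PI to a posterior probability is a one-liner. In the sketch below, the `prior_odds` argument is an illustrative generalisation; with its default of 1 it reproduces PI/(PI + 1) above.

```python
def posterior_probability(pi: float, prior_odds: float = 1.0) -> float:
    """Pr(H_p | E) via Bayes' theorem in odds form: posterior odds = PI * prior odds."""
    posterior_odds = pi * prior_odds
    return posterior_odds / (posterior_odds + 1.0)


print(posterior_probability(9999.0))          # 0.9999 with the traditional prior odds of 1
print(posterior_probability(9999.0, 0.25))    # a lower prior gives a lower posterior
```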

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, while too high a cut-off will increase the risk of failing to include a true father.

Mathematical model for automatic likelihood computation for relationship testing
While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

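$$L(Ped) = \sum_{G_1}\cdots\sum_{G_n} \prod_{i} \mathrm{Pen}(X_i \mid G_i) \prod_{\text{founder}} \Pr(G_{\text{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where $G_i$ is the genotype of individual i, $X_i$ is the observed data for individual i, and {o,f,m} runs over all offspring-father-mother triplets in the pedigree.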

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for an individual at the bottom is computed first, it can be attached as a factor in the calculation of the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. For non-trait (marker) loci, the penetrance (Pen) factor simply reduces to 1 or 0 according to whether a genotype is consistent with the observed DNA profile.
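To make the likelihood formula concrete, the sketch below evaluates it naively for one autosomal marker by summing over every genotype combination, with the marker "penetrance" acting as a 0/1 consistency indicator. This brute-force sum is exponential in the number of individuals; the peeling recursion described above computes the same quantity far more efficiently. The pedigree dictionary format and all inputs are hypothetical. Applied to the trio from the worked example, the ratio of the two pedigree likelihoods reproduces PI = 1/(2p_c).

```python
from itertools import combinations_with_replacement, product


def pedigree_likelihood(pedigree, observed, freqs):
    """Evaluate the pedigree likelihood sum for one autosomal marker by brute force.

    pedigree: name -> (father, mother), or None for a founder (hypothetical format).
    observed: name -> typed genotype; untyped individuals are summed over.
    """
    genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    names = list(pedigree)

    def founder_prob(g):                              # Pr(G_founder) under HWE
        a, b = g
        return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

    def transmission_prob(child, father, mother):     # Pr(G_o | G_f, G_m)
        return sum(0.25 for f in father for m in mother if tuple(sorted((f, m))) == child)

    total = 0.0
    for assignment in product(genotypes, repeat=len(names)):   # sum over G_1 ... G_n
        g = dict(zip(names, assignment))
        # Marker "penetrance": 1 if consistent with the typed genotype, otherwise 0
        if any(g[nm] != tuple(sorted(gt)) for nm, gt in observed.items()):
            continue
        term = 1.0
        for nm, parents in pedigree.items():
            if parents is None:
                term *= founder_prob(g[nm])
            else:
                term *= transmission_prob(g[nm], g[parents[0]], g[parents[1]])
        total += term
    return total


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}             # hypothetical
typed = {"M": ("a", "b"), "C": ("b", "c"), "AF": ("c", "d")}
h_p = {"M": None, "AF": None, "C": ("AF", "M")}              # AF is the father
h_d = {"M": None, "AF": None, "X": None, "C": ("X", "M")}    # untyped random man X
print(pedigree_likelihood(h_p, typed, freqs) / pedigree_likelihood(h_d, typed, freqs))  # about 1.67
```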

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of linked markers included. Owing to increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with computational effort that increases linearly with the number of markers (but exponentially with pedigree size). The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree, describing the alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed genotypes at each marker conditional on the inheritance vector; and finally 3) the joint probability of the inheritance vectors of all markers along the same chromosome (i.e. using the transition probabilities between adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
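The HMM step can be sketched as a forward algorithm over inheritance vectors. In this sketch the per-marker emission probabilities (step 2) are taken as given inputs, so each marker is treated under linkage equilibrium as in the Lander-Green setting, and recombination is assumed to occur independently in each meiosis, giving the transition probability θ^d (1 - θ)^(m - d) between two inheritance vectors that differ in d of m meioses. The emission values are hypothetical, and the quadratic loop over vectors is for clarity only; practical implementations use much faster transition schemes.

```python
def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors: a sketch of the Lander-Green HMM.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v), with
        v in 0 .. 2**n_meioses - 1 (bit i = which parental allele meiosis i transmitted).
    thetas[k]: recombination fraction between markers k and k + 1.
    """
    n_vec = 2 ** n_meioses
    # Mendelian prior: every inheritance vector is equally likely at the first marker
    alpha = [emissions[0][v] / n_vec for v in range(n_vec)]
    for k, theta in enumerate(thetas):
        new_alpha = []
        for w in range(n_vec):
            s = 0.0
            for v in range(n_vec):
                d = bin(v ^ w).count("1")   # meioses with a recombination between markers
                s += alpha[v] * theta ** d * (1 - theta) ** (n_meioses - d)
            new_alpha.append(s * emissions[k + 1][w])
        alpha = new_alpha
    return sum(alpha)


# Hypothetical emissions for 2 meioses (4 inheritance vectors) at 2 markers
e = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0, 0.0]]
print(lander_green_likelihood(e, [0.0], 2))   # complete linkage
print(lander_green_likelihood(e, [0.5], 2))   # unlinked: product of marker averages
```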

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002, Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for the proper consideration of such parameters when calculating the weight of evidence.

Specific aims
Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample concerning allele and haplotype frequencies and forensic efficiency parameters. Furthermore, to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect due to either an exclusion error or an inclusion error (i.e. falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods
A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypothesis model and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion
Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the interpretation is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates. Change (%) in the error rate compared with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five-hypothesis model for LR calculation: -24 (-8, -31)

A standard case was considered, with 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is particularly relevant to set up and study regional frequency databases because of an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods
Blood samples from 296 Swedish individuals from seven geographically distinct regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (including HVS-I, HVS-II and HVS-III) spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved estimation of forensic efficiency parameters as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion
Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST-values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). mtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
X-chromosomal markers are useful in deficiency relationship testing (Szibor et al. 2003). The X-chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods
A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion
Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the probability of a recombination within each linkage group was small (< 1%) and that there is also a tendency for linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI build 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that takes both linkage and linkage disequilibrium into account when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart algorithm (Elston & Stewart 1971, Abecasis et al 2002) could be used to compute the LR for questions about a relationship. However, this algorithm is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to handle groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for linkage and linkage disequilibrium. Using simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III, we also study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented that is based on the Lander-Green algorithm. It extends that algorithm by expanding the inheritance vectors to cover all loci of a haplotype.
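The core of the Lander-Green algorithm is a hidden Markov model that walks along the chromosome. The hidden state at each locus is an inheritance vector with one bit per meiosis, and between adjacent loci each bit flips with the recombination fraction θ. The sketch below shows only this standard forward recursion under linkage equilibrium. It does not implement the haplotype-level extension of Paper IV, and it assumes that the per-locus emission probabilities P(genotypes | inheritance vector) are computed elsewhere. For X-chromosomal data, only maternal meioses need bits, because a father passes his single X intact to his daughters and none to his sons.

```python
from itertools import product


def transition_probability(v, w, theta):
    """P(vector w at the next locus | vector v at this locus).

    Each meiosis recombines independently with probability theta, so the
    probability depends only on how many bits differ between v and w.
    """
    flips = sum(a != b for a, b in zip(v, w))
    return theta ** flips * (1 - theta) ** (len(v) - flips)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Pedigree likelihood via the forward algorithm over ordered loci.

    emissions: one dict per locus mapping an inheritance vector (tuple of 0/1
        bits, one per meiosis) to P(observed genotypes at the locus | vector).
    thetas: recombination fractions between consecutive loci (len(emissions) - 1).
    n_meioses: number of meioses represented in each inheritance vector.
    """
    states = list(product((0, 1), repeat=n_meioses))
    prior = 1.0 / len(states)  # all inheritance vectors equally likely a priori
    alpha = {v: prior * emissions[0][v] for v in states}
    for k in range(1, len(emissions)):
        theta = thetas[k - 1]
        alpha = {
            w: emissions[k][w]
            * sum(alpha[v] * transition_probability(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())


# Hypothetical emissions for one maternal meiosis observed at two loci.
e = [{(0,): 0.9, (1,): 0.1}, {(0,): 0.8, (1,): 0.2}]
print(lander_green_likelihood(e, [0.01], n_meioses=1))
```

An LR can then be formed as the ratio of two such likelihoods, one computed for each hypothesised pedigree. The state space doubles with every added meiosis, which is why this approach scales well in the number of markers but not in pedigree size.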

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (e.g. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters). The other three involved cases where DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
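The sketch below illustrates the simplest form of such a simulation: it computes a single-locus X-chromosomal LR for a mother-daughter-alleged father trio and simulates trios where the tested relationship is true. Its assumptions are ones the thesis explicitly argues against for the Argus X-8 loci. Loci are treated as independent (the product rule, ignoring linkage and LD), mutations are ignored, and the allele frequencies are hypothetical. The code is meant only to show how an LR distribution is obtained from simulated cases.

```python
import math
import random
import statistics


def x_trio_lr(mother, daughter, af_allele, freqs):
    """Single-locus LR: H1 'AF is the father' vs H2 'an unrelated man is the father'.

    mother, daughter: two-allele genotypes (females); af_allele: the alleged
    father's single X allele; freqs: allele -> population frequency.
    """
    target = sorted(daughter)
    p_h1 = 0.0  # P(daughter | mother, AF is the father)
    p_h2 = 0.0  # P(daughter | mother, random man is the father)
    for maternal in mother:  # each maternal allele is transmitted with prob. 1/2
        if sorted((maternal, af_allele)) == target:
            p_h1 += 0.5
        if maternal in target:
            paternal = target[1] if target[0] == maternal else target[0]
            p_h2 += 0.5 * freqs[paternal]
    if p_h2 == 0.0:
        raise ValueError("daughter is incompatible with the mother")
    return p_h1 / p_h2


def simulate_true_trio_log10_lr(loci_freqs, rng):
    """log10 LR for one simulated trio in which AF really is the father.

    Loci are treated as unlinked and in linkage equilibrium (product rule).
    """
    total = 0.0
    for freqs in loci_freqs:
        alleles, weights = list(freqs), list(freqs.values())
        mother = rng.choices(alleles, weights, k=2)
        af = rng.choices(alleles, weights)[0]
        daughter = (rng.choice(mother), af)
        total += math.log10(x_trio_lr(mother, daughter, af, freqs))
    return total


# Hypothetical frequencies for four unlinked X loci.
loci = [{10: 0.2, 11: 0.3, 12: 0.5}] * 4
rng = random.Random(2010)
sims = [simulate_true_trio_log10_lr(loci, rng) for _ in range(1000)]
print("median log10 LR:", statistics.median(sims))
```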

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6). This contrasts with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs of varying magnitude in favour of the questioned relationship were obtained when that relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
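Unseen haplotypes matter because a haplotype frequency often enters the LR as a divisor. Any estimator therefore has to assign unobserved haplotypes some non-zero value, and that choice scales the LR directly. The sketch below uses additive (Dirichlet pseudo-count) smoothing over an assumed number of possible haplotypes. Both the estimator and the database are hypothetical illustrations, not the approach used in Paper IV.

```python
from collections import Counter
from math import prod


def possible_haplotypes(alleles_per_locus):
    """Number of distinct haplotypes if every allele combination were possible."""
    return prod(alleles_per_locus)


def smoothed_haplotype_frequency(haplotype, database, alpha, n_possible):
    """Additive-smoothing estimate (count + alpha) / (N + alpha * K).

    alpha = 0 gives the plain counting estimate, which is 0 for unseen haplotypes.
    """
    count = Counter(database)[haplotype]
    return (count + alpha) / (len(database) + alpha * n_possible)


# Hypothetical two-locus database of 10 male haplotypes.
db = [(12, 30)] * 3 + [(13, 31)] * 7
k = possible_haplotypes([2, 2])
for a in (1.0, 0.1):
    f = smoothed_haplotype_frequency((12, 31), db, a, k)
    print(f"alpha={a}: unseen frequency {f:.4f}, 1/f = {1 / f:.1f}")
```

Changing the pseudo-count from 1 to 0.1 alters the estimated frequency of the unseen haplotype, and hence any LR that divides by it, by almost an order of magnitude. This mirrors the sensitivity described above.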

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers.

      | log10 LR                     | Difference LR(m_2)/LR(m_1)   | Difference LR(m_3)/LR(m_1)
      | Median [95% cred.] (min/max) | Median [95% cred.] (min/max) | Median [95% cred.] (min/max)
Rel   | 20 [-1 -62] (-1100)          | 12 [04-73] (0001418)         | 04 [0036-93] (00001438)
NoRel | -1 [-1-05] (-140)            | 10 [09-13] (000263)          | 02 [00036-91] (0026433)

Rel denotes simulations where the brothers have the same mother, and NoRel simulations where the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis. The aim of the thesis was to study relevant population genetic properties and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by several parameters. These were the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium. Together with the linkage observed within and between the linkage groups in the Swedish population, this highlights the need to consider these parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and we applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding linkage and LD can have a considerable impact on the computed likelihood ratio.
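The point about the product rule can be made concrete with a two-locus example. Under linkage equilibrium, the haplotype frequency equals the product of the allele frequencies, and the difference D = p_AB - p_A·p_B measures how far the data depart from this. The counts below are hypothetical and were chosen only to show a case where the product rule would misstate the rarity of a haplotype.

```python
from collections import Counter


def product_rule_check(haplotypes, target):
    """Compare an observed two-locus haplotype frequency with the product rule.

    Returns (haplotype frequency, product of allele frequencies, D).
    """
    n = len(haplotypes)
    p_ab = Counter(haplotypes)[target] / n
    p_a = sum(1 for a, _ in haplotypes if a == target[0]) / n
    p_b = sum(1 for _, b in haplotypes if b == target[1]) / n
    return p_ab, p_a * p_b, p_ab - p_a * p_b


# Hypothetical male haplotypes for two loci in the same linkage group.
haps = [(12, 30)] * 4 + [(13, 31)] * 4 + [(12, 31), (13, 30)]
observed, product_rule, d = product_rule_check(haps, (12, 30))
print(observed, product_rule, d)
```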


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses about the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help provided by ever faster computers.

Consider the time just before these discoveries from a client's point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Client demand to resolve these issues was therefore a major driver of the rapid development. Once the discoveries had been made, enormous amounts of money and research time were invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can today be solved with existing methods. In addition, forensic laboratories have already made major investments in instrumentation and in large national DNA-profile databases, which consist of profiles for a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8 000 000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. For DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

Turning to the use of DNA in relationship testing, the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. I expect future developments to focus more on statistical methods for handling data from larger numbers of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Nevertheless, some relationship questions cannot be resolved satisfactorily with existing methods, for example distinguishing full siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009, Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, statistical methods that accurately estimate the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account, become important.

For continued work on the linked X-STRs studied in this thesis, two main things remain to be investigated. The first is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The second, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, the number of possible haplotypes grows multiplicatively, and much larger population databases are required to estimate a haplotype's frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions were rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetics and whether it could therefore affect the assessment of the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100 000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate professor Gunilla Holmlund and Associate professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big and small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends. Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and daughter Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis: Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M and Tyler-Smith C (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Introduction

the advantage of increasing the probability of obtaining complete profiles from degraded DNA.

Another type of marker is the SNP, which consists of a single-base polymorphism. SNPs are often biallelic, although there is increasing interest in tri-allelic SNPs for forensic applications (Westen et al 2009). SNPs have the advantage that short amplicons can be used for PCR amplification, which is particularly important for degraded samples. Another advantage in relationship testing is their low mutation rate. The disadvantage, however, is that the limited number of alleles per locus makes the information content per locus low. The amount of information from one STR marker is the same as that from approximately four SNPs (Sobrino & Carracedo 2005, Brenner (www.dna-view.com)). No commercial forensic SNP multiplex kit is available, although efforts have been made to develop such multiplexes for use in criminal cases and relationship testing (Borsting et al 2009, Phillips et al 2008).
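One way to see where a figure such as "one STR ≈ four SNPs" can come from is to compare random match probabilities under Hardy-Weinberg equilibrium. The sketch assumes idealised markers: a SNP with allele frequencies 0.5/0.5 and an STR with eight equally frequent alleles. Real loci have unequal frequencies, so the ratio is illustrative only. Under these assumptions, the number of independent SNPs needed to match one STR is log(MP_STR)/log(MP_SNP).

```python
import math
from itertools import combinations_with_replacement


def match_probability(freqs):
    """Random match probability at one locus under Hardy-Weinberg equilibrium.

    Sums the squared genotype frequencies: p_i^2 for homozygotes and
    2 p_i p_j for heterozygotes.
    """
    total = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        genotype_freq = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        total += genotype_freq ** 2
    return total


# Idealised, hypothetical markers.
mp_snp = match_probability([0.5, 0.5])   # balanced biallelic SNP
mp_str = match_probability([1 / 8] * 8)  # STR with 8 equally frequent alleles
snps_per_str = math.log(mp_str) / math.log(mp_snp)
print(f"SNP MP {mp_snp:.3f}, STR MP {mp_str:.4f}, about {snps_per_str:.1f} SNPs per STR")
```

With these idealised markers, the ratio comes out at about 3.6, which is of the same order as the approximate figure quoted above.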

A third alternative is to use nucleotide sequence variation, i.e. information from a DNA sequence spanning a predefined region. The main use of sequence data in forensic situations involves the analysis of variation in mitochondrial DNA (mtDNA). No STRs are present on the mtDNA. However, analysis of mtDNA SNPs in addition to the sequence data has been shown to increase the total discrimination power (Coble et al 2004).

Figure 1. Illustration of alleles for an STR marker (top), a SNP marker (middle) and DNA sequence variation (bottom).

DNA inheritance

In addition to the markers described above, there are different "types" of DNA with different properties in terms of their inheritance pattern, as well as other important population genetic properties. The types discussed here are markers on the autosomal chromosomes, the sex chromosomes (X-chromosome and Y-chromosome) and the mitochondrial DNA (mtDNA).

For an autosomal locus, each individual has two alleles, one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing only provides information on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thus reducing the limitations associated with complex pedigree testing (Egeland et al. 2008; Skare et al. 2009).

The X-chromosome has a different inheritance pattern from autosomal markers. Females have two copies of the X-chromosome, while males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females and like haploid markers in males. Females inherit one X-chromosome from their mother and their father's only X-chromosome, while males inherit their only X-chromosome from one of the two belonging to their mother. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are only available for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since two sisters can inherit different alleles despite being full siblings. X-chromosome markers can, however, exclude a common father, since two sisters would share the same paternal allele if they have the same father. There are several other types of relationship where analysis of X-chromosomal markers is superior to analysis of autosomal markers (Szibor et al. 2003; Pinto et al. 2010).
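The sister example can be expressed as a simple consistency check. The sketch below assumes that each sister's genotype is given as a pair of alleles per X-STR locus and flags the loci at which the two sisters share no allele, which is incompatible with a common father. The locus names, allele values and function name are hypothetical, and mutation and silent alleles are ignored, so in practice a single inconsistent locus would still need a probabilistic evaluation.

```python
def loci_excluding_common_father(sister1, sister2):
    """Return the X-STR loci at which two sisters share no allele.

    Each argument maps a locus name to the sister's two X-chromosomal alleles.
    Sisters with the same father both carry his single X-chromosome, so at every
    locus they must share at least one allele (ignoring mutation and silent alleles).
    """
    return [locus for locus in sister1
            if not set(sister1[locus]) & set(sister2[locus])]


# Hypothetical genotypes (repeat numbers) at two made-up X-STR loci
sister_a = {"X1": (12, 14), "X2": (20, 22)}
sister_b = {"X1": (14, 15), "X2": (18, 19)}
print(loci_excluding_common_father(sister_a, sister_b))  # ['X2']
```

Note that sharing an allele at every locus does not prove a common father; it only means that the X-markers fail to exclude one.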

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007; Szibor 2007) (Figure 2). These two genetic properties are discussed further below in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son; thus all men in a paternal lineage share an identical Y-chromosome unless a mutation has occurred. Outside the recombining region (~5%), mutation is the only force that leads to new variation on the Y-chromosome. Because of this, and because the effective population size of the Y-chromosome is one-fourth of that of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al. 2008), which can, for instance, be used for inferring the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic issues, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is shared by all males in the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell has 100 to 1 000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles for X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.

Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations. Small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that additionally take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
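A minimal sketch of both ideas, assuming a single biallelic locus and equally weighted subpopulations. `fst_from_subpopulations` applies Wright's definition, FST = Var(p) / (p̄(1 - p̄)), to known subpopulation allele frequencies (on real samples a sample-size-corrected estimator would be used instead). `theta_corrected_match` applies the subpopulation-corrected match probability usually attributed to Balding and Nichols (1994), with θ playing the role of FST. Both function names and all numbers are illustrative.

```python
def fst_from_subpopulations(subpop_freqs):
    """Wright's FST for one allele: variance of its frequency among equally
    weighted subpopulations divided by p_bar * (1 - p_bar)."""
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k
    var_p = sum((p - p_bar) ** 2 for p in subpop_freqs) / k
    return var_p / (p_bar * (1 - p_bar))


def theta_corrected_match(p_a, p_b=None, theta=0.0):
    """Match probability for genotype a/b (a/a if p_b is None), corrected for
    substructure with coancestry coefficient theta.

    With theta = 0 this reduces to 2 * p_a * p_b (heterozygote) or p_a ** 2 (homozygote).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygote a/a
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


print(fst_from_subpopulations([0.2, 0.4]))          # about 0.048
print(theta_corrected_match(0.1, 0.2))              # about 0.04 (no correction)
print(theta_corrected_match(0.1, 0.2, theta=0.01))  # larger than the uncorrected value
```

A positive θ always increases the match probability for rare alleles, which makes the resulting evidence weight more conservative towards the suspect.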

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologues align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family data. The recombination frequency is related to the genetic distance between the loci (Ott 1999).
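The relationship between genetic (map) distance and recombination frequency is described by a map function. The text does not specify one. The sketch below uses Haldane's map function as one common choice, which assumes that crossovers occur independently (no interference). It is shown only to illustrate how θ grows with distance and levels off at 0.5 (free recombination). Other map functions, such as Kosambi's, give somewhat different values.

```python
import math


def haldane_theta(d_morgans):
    """Recombination fraction for a map distance d (in Morgans) under Haldane's
    map function, which assumes crossovers occur independently (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(theta):
    """Inverse map function: distance in Morgans for 0 <= theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


for d in (0.01, 0.1, 0.5, 2.0):
    print(f"d = {d:4.2f} M -> theta = {haldane_theta(d):.4f}")
```

For closely located loci θ is approximately equal to the map distance, whereas for loci far apart on the chromosome θ approaches 0.5 and the loci segregate practically independently.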

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

Consider two loci, and suppose we are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2. This frequency can be expressed as

$$f(ab) = f(a)\,f(b) + \Delta$$

where f(ab) is the frequency of haplotype a-b, and f(a) and f(b) are the frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
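The sketch below estimates Δ = f(ab) - f(a)f(b), together with the observed haplotype frequencies, from phase-known two-locus haplotypes. Male X-chromosome haplotypes are one such case, because the single X-chromosome gives the phase directly. The data and function names are made up for illustration, and no significance test is included.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Observed relative frequency of each (allele at locus 1, allele at locus 2) haplotype."""
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def ld_delta(haplotypes, a, b):
    """Delta = f(ab) - f(a) * f(b) for allele a at locus 1 and allele b at locus 2."""
    n = len(haplotypes)
    f_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    f_a = sum(1 for h in haplotypes if h[0] == a) / n
    f_b = sum(1 for h in haplotypes if h[1] == b) / n
    return f_ab - f_a * f_b


# Hypothetical sample of 100 male X-chromosome haplotypes at two biallelic loci
sample = [("A", "B")] * 40 + [("A", "b")] * 10 + [("a", "B")] * 10 + [("a", "b")] * 40
print(haplotype_frequencies(sample))
print(ld_delta(sample, "A", "B"))  # about 0.15, so the loci are in LD
```

Here f(A) = f(B) = 0.5, so linkage equilibrium would predict f(AB) = 0.25, whereas the observed haplotype frequency is 0.40. Using the product of allele frequencies would therefore underestimate this haplotype's frequency.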

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles: tests for Hardy-Weinberg equilibrium (HWE; Hardy 1908) and for LD deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferable method (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw firm conclusions from the outcome of either test (Buckleton et al. 2001).
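For a biallelic marker, the exact HWE test can be written compactly. Conditional on the allele counts, the number of heterozygotes has a known distribution under HWE, and the p-value sums the probabilities of all heterozygote counts that are no more likely than the observed count. The sketch below is a minimal version restricted to two alleles, with illustrative function names. Multiallelic STR loci require a generalisation of the test (usually evaluated by Monte Carlo or Markov chain sampling), which is not shown.

```python
from math import exp, lgamma, log


def _log_prob_hets(n_het, n_a, n):
    """log P(n_het heterozygotes | n individuals carrying n_a copies of allele A) under HWE."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    return (lgamma(n + 1) - lgamma(n_aa + 1) - lgamma(n_het + 1) - lgamma(n_bb + 1)
            + n_het * log(2)
            + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact test p-value for HWE at a biallelic locus from genotype counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    # Feasible heterozygote counts have the same parity as n_a and cannot exceed min(n_a, n_b)
    probs = {h: exp(_log_prob_hets(h, n_a, n))
             for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Small relative tolerance so that ties are not lost to rounding
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


print(hwe_exact_p(25, 50, 25))  # consistent with HWE (p-value close to 1)
print(hwe_exact_p(50, 0, 50))   # strong heterozygote deficit (tiny p-value)
```

The limited power mentioned above is easy to see with this function: for small samples, quite marked departures from expected genotype proportions still give large p-values.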

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of the specific markers in different forensic genetic situations and differ from case-specific values. They are most often based on the number of distinct alleles found in the population and their corresponding frequencies.

A selection of useful parameters is described below, together with their mathematical formulation.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.

The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and p_i is the frequency of the i-th allele in the population.
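A direct translation of Nei's estimator, assuming the sample is given as a list of gene copies. The function name is illustrative. Applied to a list of mtDNA or Y-chromosome haplotypes instead of alleles, the same function gives the haplotype diversity.

```python
from collections import Counter


def gene_diversity(sample):
    """Nei's unbiased gene diversity GD = n/(n-1) * (1 - sum_i p_i^2),
    where n is the number of gene copies (or haplotypes) in the sample."""
    n = len(sample)
    sum_p2 = sum((count / n) ** 2 for count in Counter(sample).values())
    return n / (n - 1) * (1 - sum_p2)


# Hypothetical sample of four gene copies
print(gene_diversity(["a", "a", "b", "b"]))  # 4/3 * (1 - 0.5), about 0.667
```

The factor n/(n - 1) corrects for sampling without replacement. In the example, 4 of the 6 possible pairs of distinct gene copies carry different alleles, and 4/6 is exactly the value returned.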

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where G_i is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the partial match probabilities for all genotypes. PM can also be computed from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus directly related to the PM discussed above:

$$PD = 1 - PM$$
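A minimal sketch computing PM from observed genotype frequencies, PD from PM, and a combined PD over several loci. The combined value assumes that the loci are independent (no linkage or LD), which is an assumption and not something stated above. The genotypes and function names are illustrative.

```python
from collections import Counter
from math import prod


def match_probability(genotypes):
    """PM = sum of squared observed genotype frequencies at one locus."""
    n = len(genotypes)
    # Order-independent genotypes: (12, 13) and (13, 12) are the same genotype
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return sum((c / n) ** 2 for c in counts.values())


def power_of_discrimination(pm):
    return 1 - pm


def combined_power_of_discrimination(pms):
    """Combined PD over loci assumed independent: 1 - product of per-locus PMs."""
    return 1 - prod(pms)


sample = [(12, 13), (13, 12), (14, 14), (12, 12)]  # hypothetical genotypes
pm = match_probability(sample)
print(pm, power_of_discrimination(pm))  # 0.375 0.625
print(combined_power_of_discrimination([0.1, 0.05, 0.2]))
```

Even loci with modest individual PD values combine quickly. Three loci with PM values of 0.1, 0.05 and 0.2 already give a combined PD of 0.999.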

Polymorphism Information Content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2\,p_i^2\,p_j^2$$

where p_i and p_j are allele frequencies and n is the number of alleles.
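The PIC formula can be evaluated directly from allele frequencies. The sketch below also computes the expected heterozygosity, 1 - Σp_i², for comparison. The function names and frequencies are illustrative.

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980) from allele frequencies."""
    sum_p2 = sum(p * p for p in freqs)
    pair_term = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                    for i in range(len(freqs))
                    for j in range(i + 1, len(freqs)))
    return 1 - sum_p2 - pair_term


def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum p_i^2, for comparison with PIC."""
    return 1 - sum(p * p for p in freqs)


for freqs in ([0.5, 0.5], [0.25] * 4):
    print(freqs, round(pic(freqs), 4), round(heterozygosity(freqs), 4))
```

Because the second sum is never negative, PIC is always below the expected heterozygosity. The difference corresponds to the matings in which both parents share the same heterozygous genotype.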

The probability of excluding paternity (Q) is calculated from (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i\,(1-p_i)^2\,(1 - p_i + p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i\,p_j\,(p_i + p_j)\,(1 - p_i - p_j)^2$$

Q is built from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either (1 - p_i)^2 or (1 - p_i - p_j)^2, and second, the expected population frequency of that mother/child genotype combination. Here p_i and p_j are the frequencies of the possible paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\,(1 - 2hH^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
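The two exclusion measures above can be evaluated as follows. Q is computed from allele frequencies, and PE is computed from a sample of observed genotypes. This is a minimal sketch with illustrative function names and data; mutations and silent alleles are not considered.

```python
def exclusion_probability_q(freqs):
    """Mean probability of excluding a random man in mother-child-AF trios, Q (Ohno et al. 1982)."""
    n = len(freqs)
    # Combinations where the paternal allele i is unambiguous: exclusion probability (1 - p_i)^2
    single = sum(p * (1 - p) ** 2 * (1 - p + p * p) for p in freqs)
    # Mother and child both heterozygous i/j: exclusion probability (1 - p_i - p_j)^2
    pairs = sum(freqs[i] * freqs[j] * (freqs[i] + freqs[j]) * (1 - freqs[i] - freqs[j]) ** 2
                for i in range(n) for j in range(i + 1, n))
    return single + pairs


def power_of_exclusion(genotypes):
    """PE = h^2 (1 - 2 h H^2) from observed genotypes (Brenner & Morris 1990)."""
    h = sum(1 for a, b in genotypes if a != b) / len(genotypes)  # heterozygote proportion
    big_h = 1 - h                                                # homozygote proportion
    return h ** 2 * (1 - 2 * h * big_h ** 2)


print(exclusion_probability_q([0.5, 0.5]))  # 0.1875
sample = [(12, 13)] * 8 + [(14, 14)] * 2    # hypothetical: h = 0.8, H = 0.2
print(power_of_exclusion(sample))           # about 0.599
```

The first sum collects all mother/child combinations in which the obligate paternal allele is determined, and the second handles the ambiguous case where mother and child share the same heterozygous genotype.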

The formulas given so far apply to autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either (1 - p_i) or (1 - p_i - p_j), because the alleged father carries only a single X-chromosome. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where p_i is the frequency of allele i; p_i can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, MEC_Duo (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
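Both X-chromosomal mean exclusion chances depend only on power sums of the allele (or haplotype) frequencies. This makes them straightforward to evaluate, as in the minimal sketch below. The function names and frequencies are illustrative.

```python
def mec_trio(freqs):
    """X-chromosomal mean exclusion chance for mother-daughter-alleged father trios."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """X-chromosomal mean exclusion chance for alleged father-daughter duos."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3


freqs = [0.2, 0.3, 0.5]  # hypothetical X-STR allele frequencies
print(mec_trio(freqs), mec_duo(freqs))  # the trio value exceeds the duo value
```

For the same frequencies, MEC_Trio is never smaller than the autosomal Q. The two measures share the mother/child weights, but a hemizygous man is excluded with probability (1 - p_i) rather than (1 - p_i)^2. For two equifrequent alleles, for example, MEC_Trio is 0.375, compared with a Q of 0.1875.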

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, because of the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with respect to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs. This comparison showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).

Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). The latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Because of continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The discussion mainly concerns two (or perhaps three) frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications on their use exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example Pr(E|H), the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) that connects the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the hypothesis for the defence. E represents the DNA profiles and I other relevant background evidence. The ratio Pr(E|H1, I)/Pr(E|H0, I) is the LR, and the strength of the DNA evidence is quantified by this ratio. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations in genetic investigations of paternity cases (Gjertson et al. 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:
Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.
The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

i.e. the probability of observing the child's (G_C), mother's (G_M) and alleged father's (G_AF) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.

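Using the third law of probability (the multiplication rule), we can rewrite this as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$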

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. This allows a further simplification,

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (A_M) and paternal (A_P) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them can transmit any other allele). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability will be 0.25 (0.5 × 0.5).

If A_M and A_P are unambiguous, the denominator Pr(G_C|G_M, G_AF, Hd) is either p_AP or 0.5·p_AP, depending on whether the mother is homozygous or heterozygous. Here p_AP is the population frequency of allele A_P and represents the probability of the child receiving that allele from a random man in the population. If A_M and A_P are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of A_M and A_P.

As a simple example, let G_M have the genotype [a,b], G_C the genotype [b,c] and G_AF the genotype [c,d].

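Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2p_c}$$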

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
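The single-locus reasoning above can be sketched as a small function. It sums over every way the mother and the alleged father (or a random man) could have transmitted alleles, so the ambiguous case is handled by the same sums. This is a minimal illustration assuming no mutation and no silent alleles. The function name and allele frequencies are hypothetical.

```python
def trio_paternity_index(mother, child, alleged_father, freq):
    """Single-locus PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Genotypes are pairs of allele labels; freq maps each allele to its population
    frequency. Mutations and silent alleles are ignored, and the child is assumed
    to share at least one allele with the mother (otherwise the denominator is 0).
    """
    target = sorted(child)
    # Numerator: mother and AF each pass on one of their two alleles with probability 1/2
    num = sum(0.25 for m in mother for f in alleged_father if sorted((m, f)) == target)
    # Denominator: the paternal allele x comes from a random man with probability freq[x]
    den = sum(0.5 * freq[x] for m in mother for x in freq if sorted((m, x)) == target)
    return num / den


freq = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
print(trio_paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freq))  # 1 / (2 * 0.1), about 5
```

For the ambiguous case where mother, child and AF are all [a,b], the same function gives 0.5 / (0.5·(p_a + p_b)), i.e. 2 with these frequencies. An AF lacking allele c gives a PI of 0 (an exclusion).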

Decision

How should the PI value be interpreted? Bayes's theorem is used to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$
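The sketch below turns per-locus PI values into a posterior probability of paternity. It assumes that the loci are independent, so that the combined PI is the product of the single-locus values, and it allows a prior other than the traditional 0.5. With prior 0.5 (prior odds 1) it reduces to PI/(PI + 1). The function name and PI values are illustrative.

```python
from math import prod


def probability_of_paternity(pis, prior=0.5):
    """Posterior probability of paternity from per-locus PIs.

    Loci are assumed independent, so the combined PI is the product of the
    per-locus values. prior is Pr(Hp) before the DNA evidence; prior = 0.5
    (prior odds 1) gives W = PI / (PI + 1).
    """
    combined_pi = prod(pis)
    posterior_odds = combined_pi * prior / (1 - prior)
    return posterior_odds / (1 + posterior_odds)


print(probability_of_paternity([5.0, 2.0]))             # 10/11, about 0.909
print(probability_of_paternity([5.0, 2.0], prior=0.1))  # 10/19, about 0.526
```

The second call shows how strongly the result depends on the prior. The same DNA evidence gives a posterior probability of about 0.91 with a prior of 0.5, but only about 0.53 with a prior of 0.1.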

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, and vice versa.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) is introduced, and when deficiency cases are treated (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

$$L(Ped) = \sum_{G_1}\cdots\sum_{G_n} \prod_{i} Pen(X_i \mid G_i) \prod_{founder} \Pr(G_{founder}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes G_1, ..., G_n of the n individuals in the pedigree, X_i is the observed phenotype of individual i, and the last product runs over all offspring-father-mother triplets {o, f, m}.

The Elston-Stewart algorithm uses a recursive approach. It starts at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the summation for his or her parents, and the individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when dealing with non-trait loci.
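To make the likelihood formula concrete, the sketch below evaluates it for a single non-trait locus (penetrance omitted) by brute-force summation over the genotypes of all untyped individuals. This is not the peeling algorithm itself. Peeling computes the same sum in a nested order so that each person is summed out only once, whereas this enumeration grows exponentially with the number of untyped people. Founders are assumed to be in Hardy-Weinberg equilibrium. The pedigree encoding and function names are hypothetical. The usage example recovers the trio PI from the previous section as a ratio of two pedigree likelihoods.

```python
from itertools import combinations_with_replacement, product


def genotypes(freq):
    """All unordered genotypes (as sorted tuples) over the alleles in freq."""
    return list(combinations_with_replacement(sorted(freq), 2))


def founder_prob(g, freq):
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]  # Hardy-Weinberg


def transmission_prob(child, father, mother):
    """Mendelian Pr(child's unordered genotype | parents' genotypes)."""
    return sum(0.25 for f in father for m in mother if tuple(sorted((f, m))) == child)


def pedigree_likelihood(pedigree, typed, freq):
    """Single-locus L(Ped): sum over genotypes of untyped members of
    prod Pr(founder genotypes) * prod Pr(offspring | father, mother).

    pedigree maps each person to (father, mother), or None for founders;
    typed maps typed persons to sorted genotype tuples. Penetrance is omitted.
    """
    untyped = [p for p in pedigree if p not in typed]
    total = 0.0
    for assignment in product(genotypes(freq), repeat=len(untyped)):
        g = {**typed, **dict(zip(untyped, assignment))}
        term = 1.0
        for person, parents in pedigree.items():
            if parents is None:
                term *= founder_prob(g[person], freq)
            else:
                term *= transmission_prob(g[person], g[parents[0]], g[parents[1]])
        total += term
    return total


freq = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
typed = {"M": ("a", "b"), "C": ("b", "c"), "AF": ("c", "d")}
h_p = {"AF": None, "M": None, "C": ("AF", "M")}            # AF is the father
h_d = {"AF": None, "M": None, "X": None, "C": ("X", "M")}  # untyped random man X is the father
print(pedigree_likelihood(h_p, typed, freq) / pedigree_likelihood(h_d, typed, freq))  # PI, about 5
```

Because the founder terms for the mother and the AF appear in both likelihoods, they cancel in the ratio, exactly as in the PI derivation above.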

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987). It permits simultaneous consideration of thousands of loci, and its computational effort increases only linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on each inheritance vector; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
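Step 3 can be illustrated with the forward algorithm of a hidden Markov model over inheritance vectors. In the minimal sketch below, the emission probabilities from step 2 (the probability of the observed genotypes at each marker given each inheritance vector) are assumed to be precomputed and are passed in as a hypothetical input. Each of the m meioses is assumed to switch grandparental origin independently with probability θ between adjacent markers, and the prior over the 2^m inheritance vectors is uniform. The cost is linear in the number of markers but grows as 4^m in the number of meioses in this naive form.

```python
def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors (hidden Markov model).

    emissions[k][v] = Pr(observed genotypes at marker k | inheritance vector v),
    for v = 0 .. 2**n_meioses - 1 (bit i = which grandparental allele meiosis i passes on).
    thetas[k] = recombination fraction between markers k and k + 1.
    """
    n_states = 2 ** n_meioses

    def transition(v, w, theta):
        # Each meiosis independently switches grandparental origin with probability theta
        switches = bin(v ^ w).count("1")
        return theta ** switches * (1 - theta) ** (n_meioses - switches)

    # Uniform prior over inheritance vectors at the first marker
    alpha = [emissions[0][v] / n_states for v in range(n_states)]
    for k in range(1, len(emissions)):
        alpha = [sum(alpha[v] * transition(v, w, thetas[k - 1]) for v in range(n_states))
                 * emissions[k][w]
                 for w in range(n_states)]
    return sum(alpha)


# One meiosis, two markers; the data are only compatible with vector 0 at both markers
print(lander_green_likelihood([[1, 0], [1, 0]], [0.0], 1))  # tight linkage: 0.5
print(lander_green_likelihood([[1, 0], [1, 0]], [0.5], 1))  # unlinked: 0.25
```

The two calls show how linkage enters the likelihood. With θ = 0, the two markers must share the same inheritance vector, whereas with θ = 0.5 they behave as independent markers and their probabilities simply multiply.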

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).

Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, owing to an exclusion or an inclusion error (falsely excluding the AF as TF or falsely including the AF as TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

The weight of evidence was expressed as posterior probabilities and calculated in a Bayesian framework, using both the standard two hypotheses model and the five hypotheses model for comparison. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
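The simulation idea can be illustrated with a much simpler decision rule than the posterior-probability rule used in Paper I. The sketch below simulates Mendelian inheritance at each locus for a family consisting of paternal grandparents, the true father, his full brother, the mother and a child, plus an unrelated man. It counts the loci at which each candidate AF is inconsistent with the child and "includes" a candidate whose number of inconsistencies is at most a chosen limit (compare the "limit of inconsistencies" rows in Table 1). Equifrequent alleles, the absence of mutation and the count-based rule are all simplifying assumptions and not the settings of Paper I. The function names are hypothetical.

```python
import random


def draw_founder(rng, alleles, weights):
    """Genotype of an unrelated person drawn under Hardy-Weinberg equilibrium."""
    return tuple(rng.choices(alleles, weights, k=2))


def transmit(rng, father, mother):
    """Child genotype: one random allele from each parent (no mutation)."""
    return (rng.choice(father), rng.choice(mother))


def inconsistent(mother, child, af):
    """True if no allele of the AF can pair with a maternal allele to form the child's genotype."""
    target = sorted(child)
    return not any(sorted((m, f)) == target for m in mother for f in af)


def simulate_inclusion_rates(loci, limit=0, n_cases=2000, seed=1):
    """Fraction of simulated cases in which an unrelated man, or a full brother of
    the true father, shows at most `limit` inconsistencies and would be included."""
    rng = random.Random(seed)
    included = {"unrelated": 0, "brother": 0}
    for _ in range(n_cases):
        counts = {"unrelated": 0, "brother": 0}
        for freq in loci:
            alleles, weights = list(freq), list(freq.values())
            gp1, gp2 = draw_founder(rng, alleles, weights), draw_founder(rng, alleles, weights)
            father, brother = transmit(rng, gp1, gp2), transmit(rng, gp1, gp2)
            mother = draw_founder(rng, alleles, weights)
            unrelated = draw_founder(rng, alleles, weights)
            child = transmit(rng, father, mother)
            counts["unrelated"] += inconsistent(mother, child, unrelated)
            counts["brother"] += inconsistent(mother, child, brother)
        for who, n_incons in counts.items():
            if n_incons <= limit:
                included[who] += 1
    return {who: k / n_cases for who, k in included.items()}


# 15 hypothetical STR loci, each with eight equifrequent alleles
loci = [{allele: 0.125 for allele in range(8)}] * 15
print(simulate_inclusion_rates(loci, limit=1, n_cases=1000))
```

In runs like this, the brother of the true father is typically included far more often than an unrelated man. This is the kind of inclusion error that the five hypotheses model is meant to reduce.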

Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

In cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.

The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluating the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates. Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
Mother and father simulated as first cousins: 3 (10, -1)

Additional information
DNA profiles from 20 markers: -68 (-29, -89)
DNA profiles from 25 markers: -83 (-56, -98)
2 children: -88 (-73, -96)

Mutation model
Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)

Inappropriate allele frequency
Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)

Prior information
Five hypotheses model for LR calculation: -24 (-8, -31)

The standard case used DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion errors for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is highly relevant to set up and study regional frequency databases, because of an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This hypervariable segment (including HVS-I, HVS-II and HVS-III) spans over 1 100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculating forensic efficiency parameters, as well as comparing the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
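The two summary figures above can be computed from haplotype counts as sketched below. The sketch assumes Nei's (1987) unbiased gene-diversity estimator, GD = n/(n - 1) (1 - Σ p_i²), and the plug-in match probability PM = Σ p_i². This section does not spell out the exact formulas used in Paper II, and the haplotype labels in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def haplotype_stats(haplotypes: Iterable[str]) -> tuple[float, float]:
    """Return (gene diversity, random match probability) for a haplotype sample.

    GD = n / (n - 1) * (1 - sum(p_i^2))   (Nei's unbiased estimator)
    PM = sum(p_i^2)                        (chance two random draws match)
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    gd = n / (n - 1) * (1.0 - sum_p2)
    return gd, sum_p2


if __name__ == "__main__":
    # Hypothetical sample: one haplotype seen twice, two seen once.
    gd, pm = haplotype_stats(["H1", "H1", "H2", "H3"])
    print(f"GD = {gd:.4f}, PM = {pm:.4f}")
```

In a sample where almost every individual carries a unique haplotype, as with 247 haplotypes among 296 Swedes, Σ p_i² approaches 1/n and GD approaches 1.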

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight Swedish samples (the seven regions and the Saami) were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
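The comparison of haplotype frequencies between subpopulations can be illustrated with a simple differentiation statistic and a permutation test. The sketch below is a swapped-in, deliberately simple approach (a Nei-style G_ST = (H_T - H_S)/H_T with plug-in diversities and a sample-size-weighted H_S, tested by shuffling individuals among populations); it is not necessarily the estimator or test used in Paper II, and the function names are hypothetical.

```python
import random
from collections import Counter


def diversity(haplotypes):
    """Plug-in haplotype diversity 1 - sum(p_i^2)."""
    n = len(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in Counter(haplotypes).values())


def gst(populations):
    """G_ST = (H_T - H_S) / H_T, with H_S weighted by sample size.

    Weighting H_S by sample size guarantees H_T >= H_S, so G_ST >= 0.
    """
    pooled = [h for pop in populations for h in pop]
    total = len(pooled)
    h_t = diversity(pooled)
    if h_t == 0.0:
        return 0.0
    h_s = sum(len(pop) / total * diversity(pop) for pop in populations)
    return (h_t - h_s) / h_t


def permutation_test(populations, n_perm=999, seed=1):
    """Return (observed G_ST, permutation p-value).

    Individuals are shuffled among populations (sizes kept fixed); the
    p-value is the share of permuted G_ST values at least as large.
    """
    rng = random.Random(seed)
    observed = gst(populations)
    pooled = [h for pop in populations for h in pop]
    sizes = [len(pop) for pop in populations]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if gst(shuffled) >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With eight small regional samples, a permutation test of this kind has limited power, which matches the caution about sample sizes above.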


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST is the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the mtDNA population database of the European DNA Profiling Group (EDNAP)) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
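Why a larger reference database gives a more accurate rarity estimate can be shown with one commonly used way of reporting haplotype frequencies: a one-sided upper confidence bound on the count estimate. The sketch below uses an exact (Clopper-Pearson) binomial bound found by bisection; it is an illustration under that assumption, not the procedure prescribed by EMPOP or used in Paper II, and the database sizes in the usage lines are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); intended for small k."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def upper_bound(count: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) Clopper-Pearson upper bound for a frequency.

    Solves P(X <= count | n, p) = alpha for p by bisection; the binomial
    CDF is decreasing in p, so the root is bracketed by [count / n, 1].
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    if count == n:
        return 1.0
    lo, hi = count / n, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if binom_cdf(count, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical database sizes; the haplotype is assumed unseen (count = 0).
    for size in (300, 3_000, 30_000):
        print(f"n = {size:>6}: 95% upper bound = {upper_bound(0, size):.6f}")
```

For an unseen haplotype the bound reduces to 1 - alpha^(1/n), roughly 3/n, so each tenfold increase in database size tightens the bound about tenfold.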


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

In total, 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.
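In the simplest situation, where the phase of every informative meiosis is known, a recombination fraction is estimated by counting recombinants, and the support for linkage is summarised by a LOD score. The sketch below shows only that textbook case (see e.g. Ott 1999); Paper III presented its own estimation model, which is not reproduced here, and the function name and counts are hypothetical.

```python
import math


def recombination_estimate(recombinants: int, informative: int) -> tuple[float, float]:
    """Return (theta_hat, LOD at theta_hat) for phase-known meioses.

    theta_hat = recombinants / informative, capped at 0.5.
    LOD(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ].
    """
    if informative <= 0 or not 0 <= recombinants <= informative:
        raise ValueError("need informative > 0 and 0 <= recombinants <= informative")
    k, n = recombinants, informative
    theta = min(k / n, 0.5)

    def log10_term(count: int, prob: float) -> float:
        # Convention 0 * log(0) = 0, so theta = 0 is handled.
        return 0.0 if count == 0 else count * math.log10(prob)

    lod = log10_term(k, theta) + log10_term(n - k, 1.0 - theta) - n * math.log10(0.5)
    return theta, lod


if __name__ == "__main__":
    # Hypothetical counts within the 84-116 informative meioses range.
    for k, n in ((0, 100), (1, 100), (20, 100)):
        theta, lod = recombination_estimate(k, n)
        print(f"{k}/{n} recombinants: theta = {theta:.3f}, LOD = {lod:.2f}")
```

With about 100 informative meioses and no observed recombinants, the point estimate is 0 but the true fraction could still be around 1-3%, which is why a dedicated estimation model and larger family sets matter.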

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
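Because males carry a single X chromosome, their X-STR haplotypes are observed directly, so LD between the two loci of a linkage group can be tested on a contingency table of allele pairs. The sketch below uses a likelihood-ratio (G) statistic with a permutation p-value obtained by shuffling the alleles at the second locus; this is an illustrative choice, not necessarily the LD test used in Paper III, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(pairs):
    """Likelihood-ratio (G) statistic for association between two loci.

    `pairs` holds one (allele_locus1, allele_locus2) tuple per male; his
    single X chromosome gives the haplotype phase directly.
    """
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    g = 0.0
    for (a, b), observed in joint.items():
        expected = left[a] * right[b] / n
        g += 2.0 * observed * math.log(observed / expected)
    return g


def ld_permutation_test(pairs, n_perm=999, seed=1):
    """Return (observed G, permutation p-value).

    Shuffling locus-2 alleles across males keeps allele frequencies but
    breaks any association, giving the null distribution of G.
    """
    rng = random.Random(seed)
    observed = g_statistic(pairs)
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(seconds)
        if g_statistic(list(zip(firsts, seconds))) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A permutation p-value avoids relying on the chi-square approximation, which is poor for highly polymorphic STRs where most haplotype cells are sparse.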

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. The results of this paper indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers, but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present a model that fully accounts for linkage and linkage disequilibrium, and we use simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III to study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
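The core idea of the Lander-Green approach, a hidden Markov chain over inheritance patterns along the chromosome, can be illustrated for the simplest X-chromosomal case studied here: two brothers who may share a mother. The sketch assumes linkage equilibrium between loci, no mutation and an untyped mother. It is therefore the LE baseline that the thesis model extends, not the thesis model itself. Only whether the brothers received the same maternal allele (IBD) matters; with recombination fraction r between adjacent loci, each brother's maternal indicator flips with probability r, so the IBD state switches with probability 2r(1 - r). The function name and the example frequencies are hypothetical.

```python
from typing import Sequence


def maternal_brothers_lr(
    alleles_1: Sequence[str],
    alleles_2: Sequence[str],
    freqs: Sequence[dict[str, float]],
    recomb: Sequence[float],
) -> float:
    """LR for 'same mother' vs 'different mothers' from two brothers' X-STRs.

    alleles_k[i]: allele of brother k at locus i (males are hemizygous).
    freqs[i]:     population allele frequencies at locus i.
    recomb[i]:    recombination fraction between locus i and locus i + 1.
    """
    n = len(alleles_1)
    if not (len(alleles_2) == len(freqs) == n and len(recomb) == n - 1):
        raise ValueError("inconsistent input lengths")

    def emission(i: int, ibd: bool) -> float:
        a, b = alleles_1[i], alleles_2[i]
        if ibd:  # both brothers carry the same maternal allele copy
            return freqs[i][a] if a == b else 0.0
        return freqs[i][a] * freqs[i][b]

    # Forward algorithm over hidden states [not IBD, IBD]; each starts at 1/2.
    forward = [0.5 * emission(0, False), 0.5 * emission(0, True)]
    for i in range(1, n):
        switch = 2.0 * recomb[i - 1] * (1.0 - recomb[i - 1])
        stay = 1.0 - switch
        prev_non, prev_ibd = forward
        forward = [
            (prev_non * stay + prev_ibd * switch) * emission(i, False),
            (prev_non * switch + prev_ibd * stay) * emission(i, True),
        ]
    likelihood_related = sum(forward)

    likelihood_unrelated = 1.0
    for i in range(n):
        likelihood_unrelated *= freqs[i][alleles_1[i]] * freqs[i][alleles_2[i]]
    return likelihood_related / likelihood_unrelated


if __name__ == "__main__":
    # Hypothetical two-locus example with tight linkage (r = 0.01).
    freqs = [{"12": 0.1, "13": 0.3}, {"20": 0.2, "21": 0.05}]
    print(maternal_brothers_lr(["12", "20"], ["12", "20"], freqs, [0.01]))
```

With r = 0.5 the chain forgets its state between loci and the LR factorises into single-locus LRs; with r close to 0 the shared state persists, which is why ignoring linkage misstates the LR. Accounting for LD additionally requires replacing the per-locus allele frequencies with haplotype frequencies, which is the extension described above.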

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters), and three involved cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LR calculated with our proposed model was compared with the LR calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6), compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs of various magnitudes favouring the questioned relationship (LR > 1) were obtained when the relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with the LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
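The sensitivity to the frequency of a previously unseen haplotype can be made concrete with a simple case: two brothers who share an entire X-STR haplotype block, assuming no recombination within the block, no mutation and an untyped mother. Then P(data | same mother) = 0.5f + 0.5f² and P(data | different mothers) = f², so LR = 0.5/f + 0.5, which grows without bound as the estimated frequency f shrinks. The three frequency estimates for an unseen haplotype in the usage lines are hypothetical choices for comparison, not the estimators used in Paper IV; n = 718 is the number of Swedish males typed in Paper III, used only to set the scale.

```python
def shared_block_lr(f: float) -> float:
    """LR (same mother vs different mothers) for brothers sharing one X block.

    Assumes no recombination inside the block, no mutation and an untyped
    mother: P(data | related) = 0.5 * f + 0.5 * f**2, P(data | unrelated) = f**2.
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("haplotype frequency must be in (0, 1]")
    return (0.5 * f + 0.5 * f * f) / (f * f)


if __name__ == "__main__":
    n = 718  # male haplotypes in the Paper III sample, used only for scale
    # Hypothetical frequency estimates for a haplotype never seen in the database.
    candidates = {
        "1 / (2n)": 1 / (2 * n),
        "1 / (n + 1)": 1 / (n + 1),
        "95% upper bound 1 - 0.05**(1/n)": 1 - 0.05 ** (1 / n),
    }
    for name, f in candidates.items():
        print(f"{name:<34} f = {f:.5f}  LR = {shared_block_lr(f):8.1f}")
```

Because the LR is roughly inversely proportional to f, the choice among such estimators moves the LR by the same factor as the estimates differ, independently of the quality of the linkage model.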

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Columns (each given as median [95% cred] (min/max)):
    (A) Log10 LR
    (B) Difference LR(m_2)/LR(m_1)
    (C) Difference LR(m_3)/LR(m_1)

Rel:    (A) 20 [-1 -62] (-1100)    (B) 12 [04-73] (0001418)    (C) 04 [0036-93] (00001438)
NoRel:  (A) -1 [-1-05] (-140)      (B) 10 [09-13] (000263)     (C) 02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that in other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for properly considering both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance from developments in computing power.

If we go back to the time just before these discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of development. Once the discoveries were made, an enormous amount of money and scientific effort was invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which holds over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly solved with sufficiently high probability and certainty; thus, the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example distinguishing full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what becomes important are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

Regarding continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and thus could have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always keeping your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team": Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a "bollplank" (sounding board) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and care, and thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help in getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the "nearly normal autopsy". J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Introduction

DNA inheritance

In addition to the markers described above, there are different “types” of DNA with different properties in terms of their inheritance pattern and other important population genetic characteristics. The types discussed here are markers on the autosomes, on the sex chromosomes (the X- and Y-chromosomes) and in the mitochondrial DNA (mtDNA).

At an autosomal locus, each individual has two alleles: one inherited from the mother and one from the father. The traditional use of autosomal markers in forensic relationship testing provides information only on relationships spanning one to a few generations (Nothnagel et al. 2010). However, technical improvements have made it possible to study hundreds of thousands of autosomal markers simultaneously, thereby reducing the limitations associated with testing complex pedigrees (Egeland et al. 2008; Skare et al. 2009).

Turning to the X-chromosome, its inheritance pattern differs from that of autosomal markers. Females have two copies of the X-chromosome, whereas males normally have only one. As a consequence, X-chromosomal markers behave like autosomal markers in their transmission to gametes in females, and like haploid markers in males. Females inherit one X-chromosome from their mother and their father’s only X-chromosome, whereas males inherit their single X-chromosome from their mother, who transmits one of her two. In relationship testing, X-chromosome analysis is particularly useful in deficiency cases. Consider, for example, a case where two sisters are tested to establish whether or not they have the same father, and where DNA profiles are available only for the sisters. In such instances, autosomal DNA markers cannot exclude a common father, since full sisters can inherit different paternal alleles. X-chromosomal markers can, however, exclude a common father, since two sisters who have the same father must share the same paternal X-chromosomal allele. There are several other types of relationship for which analysis of X-chromosomal markers is superior to analysis of autosomal markers (Szibor et al. 2003; Pinto et al. 2010).
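The sister example can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the genotypes are hypothetical dictionaries keyed by invented locus names, mutation and silent alleles are ignored, and the function name is our own. A shared father is excluded at any locus where the two sisters share no allele.

```python
def shared_father_check(sister1: dict, sister2: dict) -> tuple:
    """Return (compatible, loci_without_shared_allele) for two sisters' X-STR genotypes.

    If the sisters have the same father, both carry his single X-chromosome,
    so at every locus they must share at least one allele (mutation ignored).
    """
    mismatches = [
        locus
        for locus in sister1
        if locus in sister2 and not set(sister1[locus]) & set(sister2[locus])
    ]
    return (len(mismatches) == 0, mismatches)


# Hypothetical genotypes at three X-STR loci
s1 = {"locus1": (12, 14), "locus2": (8, 9), "locus3": (20, 21)}
s2 = {"locus1": (14, 15), "locus2": (9, 9), "locus3": (18, 19)}
print(shared_father_check(s1, s2))  # (False, ['locus3'])
```

Passing the check only means that a common father is not excluded; it does not by itself quantify the evidence, which requires a likelihood ratio.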

The use of the X-chromosome in forensic relationship testing usually involves STR markers. Detailed information on more than 50 X-STRs has been collected (www.chrx-str.org) and used in different PCR multiplexes (Becker et al. 2008; Hundertmark et al. 2008; Gomez et al. 2007; Diegoli et al. 2010). Linkage and linkage disequilibrium must typically be considered when a combination of closely located X-chromosomal markers is used in relationship testing (Krawczak 2007; Szibor 2007) (Figure 2). These two genetic properties are discussed further below, in terms of their definitions and their impact on calculated likelihoods.

The Y-chromosome normally exists in one copy in males and is absent in females. It is inherited from father to son, so all men in a paternal lineage share an identical Y-chromosome. Apart from the recombining region (~5%), mutation is the only force that generates new variation on the Y-chromosome. Because of this, and because the Y-chromosome has one-quarter of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al. 2003; Jobling et al. 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual’s haplogroup (Karafet et al. 2008), which can, for instance, be used to infer the paternal geographical origin (Jobling & Tyler-Smith 2003). For other forensic questions, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al. 1997). Nevertheless, it is crucial to bear in mind that the Y-chromosome haplotype is expected to be the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell contains 100 to 1 000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited from mother to child (maternally) and can therefore be used to address questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2 Illustration of the inheritance pattern of two X-chromosomal loci located at a distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles of X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.

Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al. 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations: small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
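As a minimal illustration of how FST relates to the variance in allele frequencies among populations, the Python sketch below computes Wright's FST for a single biallelic locus from subpopulation allele frequencies that are treated as known. This is a deliberate simplification and an assumption of the sketch: no sample-size correction is applied (the estimators reviewed by Holsinger & Weir 2009 do this), forensic STRs are multiallelic, and the function name is our own.

```python
from typing import Optional, Sequence


def wright_fst(p: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Wright's F_ST for one biallelic locus: Var(p) / (p_bar * (1 - p_bar)).

    p: frequency of one allele in each subpopulation (treated as known, no
    sample-size correction). weights: relative subpopulation sizes (default equal).
    """
    k = len(p)
    w = list(weights) if weights is not None else [1.0 / k] * k
    total = sum(w)
    w = [wi / total for wi in w]  # normalise weights to sum to 1
    p_bar = sum(wi * pi for wi, pi in zip(w, p))
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic across subpopulations")
    var_p = sum(wi * (pi - p_bar) ** 2 for wi, pi in zip(w, p))
    return var_p / (p_bar * (1.0 - p_bar))


print(wright_fst([0.5, 0.5]))  # identical subpopulations: no differentiation
print(wright_fst([0.4, 0.6]))  # small difference, small F_ST
print(wright_fst([0.0, 1.0]))  # fixed for different alleles: maximal F_ST
```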

Linkage and Linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologues align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from both parental chromosomes: the allele combination of the two markers (i.e. the haplotype) has changed compared with its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family data. The recombination frequency is related to the genetic (map) distance between the loci (Ott 1999).
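The relationship between map distance, recombination fraction and the segregation probabilities illustrated in Figure 2 can be sketched in Python as follows. Which map function is appropriate is an assumption here: the sketch uses the Haldane map function (no crossover interference), one of several discussed in the linkage literature (Ott 1999), and is not prescribed by this thesis. Function names are our own; allele labels follow Figure 2.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction from map distance in Morgans (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def maternal_haplotype_probs(hap1: tuple, hap2: tuple, theta: float) -> dict:
    """Transmission probabilities of two-locus haplotypes from a mother with known phase.

    hap1, hap2: (allele at locus 1, allele at locus 2) on each maternal X-chromosome.
    Non-recombinant haplotypes get (1 - theta) / 2 each, recombinants theta / 2 each.
    """
    candidates = [
        (hap1, (1.0 - theta) / 2),
        (hap2, (1.0 - theta) / 2),
        ((hap1[0], hap2[1]), theta / 2),
        ((hap2[0], hap1[1]), theta / 2),
    ]
    probs: dict = {}
    for hap, p in candidates:
        probs[hap] = probs.get(hap, 0.0) + p  # merges identical haplotypes (homozygous loci)
    return probs


theta = haldane_theta(0.10)  # hypothetical 10 cM between the two loci
print(round(theta, 4))  # 0.0906
print(maternal_haplotype_probs(("X1a", "X2a"), ("X1b", "X2b"), theta))
```

A father, by contrast, transmits his single X-chromosome haplotype intact to a daughter, with probability 1.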

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because loci are closely located and thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be written as

$$f(ab) = f(a)\cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium, $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are said to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
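A minimal Python sketch of estimating Δ from observed haplotypes is shown below. It assumes haplotypes of known phase, such as two X-chromosomal markers typed in males (who carry a single X); phase-unknown data would first require haplotype inference, which is not shown. The function name and the sample are hypothetical, and frequencies are simple sample proportions.

```python
from collections import Counter


def ld_delta(haplotypes: list) -> dict:
    """Delta = f(ab) - f(a) * f(b) for each observed two-locus haplotype (a, b).

    haplotypes: phase-known (allele at locus 1, allele at locus 2) pairs, e.g. from
    males typed for two X-chromosomal markers. Unobserved haplotypes are not listed;
    for those f(ab) = 0, so Delta = -f(a) * f(b).
    """
    n = len(haplotypes)
    f_hap = {h: c / n for h, c in Counter(haplotypes).items()}
    f_locus1 = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    f_locus2 = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}
    return {(a, b): f_ab - f_locus1[a] * f_locus2[b] for (a, b), f_ab in f_hap.items()}


# Hypothetical sample of 100 male X-haplotypes for two loci
sample = [("A", "B")] * 40 + [("A", "b")] * 10 + [("a", "B")] * 10 + [("a", "b")] * 40
print(ld_delta(sample))
```

In this sample, haplotype A-B is observed more often than f(A)·f(B) predicts, giving a positive Δ. With multiallelic STRs the number of Δ values grows quickly, which is why observed haplotype frequencies are preferred.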

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Tests of Hardy-Weinberg equilibrium (HWE; Hardy 1908) and of LD address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. Fisher’s exact test is the preferred method for performing the HWE and LD tests (Fisher 1951). However, it is important to note that the exact test has very limited power, which makes it difficult to draw firm conclusions from the outcome of either test (Buckleton et al. 2001).

Another aspect to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using specific markers in different forensic genetic situations, and they differ from case-specific values. The estimation of such parameters is most often based on the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.

The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the observed frequency of the ith allele in the population sample.

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the partial match probabilities over all genotypes. PM can also be calculated from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus directly related to PM, discussed above:

$$PD = 1 - PM$$
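The two routes to PM described above (from observed genotype frequencies, or from allele frequencies under HWE) and the derived PD can be sketched in Python as follows. Function names and frequencies are illustrative assumptions, and the allele-frequency version is valid only if the population is in HWE.

```python
from itertools import combinations


def match_probability_from_genotypes(genotype_freqs: list) -> float:
    """PM = sum of squared genotype frequencies G_i."""
    return sum(g * g for g in genotype_freqs)


def match_probability_hwe(allele_freqs: list) -> float:
    """PM from allele frequencies, assuming Hardy-Weinberg equilibrium."""
    homozygotes = sum(p ** 4 for p in allele_freqs)  # (p_i^2)^2 for each homozygote
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes


freqs = [0.25, 0.40, 0.20, 0.15]  # hypothetical allele frequencies
pm = match_probability_hwe(freqs)
pd = 1.0 - pm  # power of discrimination
print(round(pm, 4), round(pd, 4))
```

For a combined profile of independent loci, the per-locus PM values would be multiplied; that independence is an additional assumption.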

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al. 1980; Guo & Elston 1999). There are two instances in which this cannot be deduced, namely when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies.
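A direct transcription of the PIC formula, with the double sum over i < j taken as all unordered pairs of alleles, might look like the following Python sketch. The function name and the allele frequencies are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs: list) -> float:
    """Polymorphism information content (Botstein et al. 1980)."""
    sum_sq = sum(p * p for p in allele_freqs)  # probability that the parent is homozygous
    # both parents share the heterozygote i/j and the child is also i/j
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - sum_sq - pair_term


print(round(pic([0.25, 0.40, 0.20, 0.15]), 4))  # hypothetical four-allele locus
```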

The probability of excluding paternity (Q) is calculated as (Ohno et al. 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2\left(1-p_i+p_i^2\right) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i+p_j)(1-p_i-p_j)^2$$

Q combines two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, where $p_i$ and $p_j$ are the frequencies of the possible paternal alleles; and second, the expected population frequency of that mother/child genotype combination. Q is then obtained by summing the product of these two factors over all mother/child genotype combinations.

An alternative measure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\left(1 - 2hH^2\right)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
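The Python sketch below computes Q with a closed form of the sum described in the text, and cross-checks it by literally summing "frequency of the mother/child combination × probability that a random man is excluded" over all combinations. It also computes PE from an observed heterozygosity. Function names and frequencies are hypothetical; mutation and silent alleles are ignored, and H is taken as 1 - h.

```python
from collections import defaultdict
from itertools import combinations


def exclusion_probability_q(freqs: list) -> float:
    """Probability of excluding paternity, Q, for mother/child/alleged-father trios."""
    single = sum(p * (1 - p) ** 2 * (1 - p + p * p) for p in freqs)
    pairs = sum(p * q * (p + q) * (1 - p - q) ** 2 for p, q in combinations(freqs, 2))
    return single + pairs


def exclusion_probability_enumerated(freqs: list) -> float:
    """Same Q, by summing P(mother, child) * P(random man excluded) over all combinations."""
    k = len(freqs)
    p_mc = defaultdict(float)  # P(mother genotype, child genotype)
    for i in range(k):
        for j in range(i, k):
            p_mother = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
            for maternal in (i, j):  # each maternal allele is transmitted with probability 1/2
                for paternal in range(k):  # the true father's allele, drawn from the population
                    child = tuple(sorted((maternal, paternal)))
                    p_mc[((i, j), child)] += p_mother * 0.5 * freqs[paternal]
    q = 0.0
    for ((i, j), (a, b)), prob in p_mc.items():
        # alleles the child could have received from its father
        if (a, b) == (i, j):
            possible = {a, b}  # ambiguous case: child has the same genotype as the mother
        elif a in (i, j):
            possible = {b}
        else:
            possible = {a}
        q += prob * (1.0 - sum(freqs[x] for x in possible)) ** 2
    return q


def power_of_exclusion(h: float) -> float:
    """PE = h^2 (1 - 2 h H^2), with H = 1 - h the proportion of homozygotes."""
    big_h = 1.0 - h
    return h * h * (1.0 - 2.0 * h * big_h * big_h)


freqs = [0.25, 0.40, 0.20, 0.15]  # hypothetical allele frequencies
print(exclusion_probability_q(freqs), exclusion_probability_enumerated(freqs))
print(power_of_exclusion(0.8))
```

The enumeration is slower but makes the two factors of Q explicit; the closed form is what one would use in practice.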

The formulas given so far apply to autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios involving a daughter (Desmarais et al. 1998). This is analogous to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a man carries only one X-chromosomal allele. Thus, the mean exclusion chance when mother and child are tested is

$$\mathrm{MEC}_{\mathrm{Trio}} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $\mathrm{MEC}_{\mathrm{Duo}}$ (Desmarais et al. 1998), is

$$\mathrm{MEC}_{\mathrm{Duo}} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
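Both X-chromosomal mean exclusion chances reduce to power sums of the allele (or haplotype) frequencies, as in this Python sketch. Function names and frequencies are hypothetical, and mutation is ignored.

```python
def mec_trio(freqs: list) -> float:
    """Mean exclusion chance for X-chromosomal trios (mother, daughter, alleged father)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1.0 - s2 + s4 - s2 ** 2


def mec_duo(freqs: list) -> float:
    """Mean exclusion chance for X-chromosomal duos (alleged father and daughter only)."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1.0 - 2.0 * s2 + s3


freqs = [0.25, 0.40, 0.20, 0.15]  # hypothetical allele or haplotype frequencies
print(round(mec_trio(freqs), 4), round(mec_duo(freqs), 4))
```

As expected, testing the mother as well (trio) gives a higher exclusion chance than testing the daughter and the man alone (duo).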

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, owing to the ice that covered Northern Europe. Since then, immigration and population movements of varying extent, origin and direction have occurred within the present borders of Sweden. Many of the immigrating groups originated from Western Europe and have been suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg & Tydén 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases, and strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).

Regarding Y-chromosome variation, some studies have aimed at facilitating the set-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These latter studies confirm earlier findings of high similarity with other Western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Owing to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

To assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, namely a frequentist and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al. 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E\mid H)$, which is the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile, and H could be “the DNA comes from an individual not related to the suspect”. If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis more probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes’s theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints or information from eyewitnesses.

$$\frac{\Pr(H_1\mid E, I)}{\Pr(H_0\mid E, I)} = \frac{\Pr(E\mid H_1, I)}{\Pr(E\mid H_0, I)}\cdot\frac{\Pr(H_1\mid I)}{\Pr(H_0\mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio}\cdot\text{prior odds}$$

$H_1$ (or $H_p$) is commonly known as the prosecutor’s hypothesis and $H_0$ (or $H_d$) as the hypothesis of the defence. E represents the DNA profiles and I other relevant background evidence. The quotient $\Pr(E\mid H_1, I)/\Pr(E\mid H_0, I)$ is the LR, and it is within this ratio that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations in paternity cases (Gjertson et al. 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack guidance on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.

$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF}\mid H_p)}{\Pr(G_C, G_M, G_{AF}\mid H_d)}$$

which is the probability of observing the DNA profiles of the child ($G_C$), the mother ($G_M$) and the alleged father ($G_{AF}$) if the AF is the father, relative to the probability of observing the same DNA profiles if the AF is not the father.

Using the third law of probability (the multiplication rule), this can be rewritten as

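$$PI = \frac{\Pr(G_C, G_M, G_{AF}\mid H_p)}{\Pr(G_C, G_M, G_{AF}\mid H_d)} = \frac{\Pr(G_C\mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF}\mid H_p)}{\Pr(G_C\mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF}\mid H_d)}$$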

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can simplify further:

$$\Pr(G_M, G_{AF}\mid H_p) = \Pr(G_M, G_{AF}\mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C\mid G_M, G_{AF}, H_p)}{\Pr(G_C\mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child’s genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child’s genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1), assuming that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the child’s maternal ($A_M$) and paternal ($A_P$) alleles (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1 (neither parent has another allele to transmit). If exactly one of them is heterozygous, the probability is 0.5, since there is a 50:50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C\mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability that the child receives this allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d].

Then

$$\Pr(G_C\mid G_M, G_{AF}, H_p) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C\mid G_M, G_{AF}, H_d) = \frac{1}{2}\cdot p_c$$

thus

$$PI = \frac{\Pr(G_C\mid G_M, G_{AF}, H_p)}{\Pr(G_C\mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2}\cdot p_c} = \frac{1}{2p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
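The single-locus rules above, including the ambiguous case where the child has the same heterozygous genotype as the mother, can be written as a short Python sketch. This is a simplified illustration, not the software used in this work. It assumes no mutation and no silent alleles, the function names and allele frequencies are hypothetical, and combining loci by multiplying their PIs would additionally assume independent loci.

```python
def transmission(genotype: tuple, allele: str) -> float:
    """Probability that a parent with this genotype transmits the allele (0, 0.5 or 1)."""
    return genotype.count(allele) / 2


def child_probability(child: tuple, mother: tuple, father_transmit) -> float:
    """Pr(child genotype | mother, father), summing over which parent gave which allele."""
    a, b = child
    if a == b:
        return transmission(mother, a) * father_transmit(a)
    return (transmission(mother, a) * father_transmit(b)
            + transmission(mother, b) * father_transmit(a))


def paternity_index(child: tuple, mother: tuple, alleged_father: tuple, freqs: dict) -> float:
    numerator = child_probability(child, mother, lambda x: transmission(alleged_father, x))
    # under H_d a random unrelated man transmits allele x with its population frequency
    denominator = child_probability(child, mother, lambda x: freqs[x])
    if denominator == 0.0:
        raise ValueError("child's genotype is incompatible with the mother (no mutation model)")
    return numerator / denominator


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 1 / (2 * p_c)
print(paternity_index(("a", "b"), ("a", "b"), ("a", "b"), freqs))  # ambiguous: 1 / (p_a + p_b)
```

With p_c = 0.1, the first call reproduces the worked example, PI = 1/(2 × 0.1).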

Decision

How does one interpret the PI value? Bayes’s theorem is needed to obtain the posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, which gives the following expression for the posterior probability of paternity:

$$PI = \frac{\Pr(H_P\mid E)}{\Pr(H_d\mid E)}$$

hence

$$PI = \frac{\Pr(H_P\mid E)}{1 - \Pr(H_P\mid E)}$$

resulting in

$$\Pr(H_P\mid E) = \frac{PI}{PI + 1}$$
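The step from PI to a posterior probability can be sketched in Python as follows, with the prior left as a parameter. The default of 0.5 (prior odds of 1) reproduces the traditional expression PI/(PI + 1). The function name and the per-locus PI values are hypothetical, and multiplying per-locus PIs assumes independent loci (no linkage or LD).

```python
import math


def posterior_probability_of_paternity(pi_values: list, prior: float = 0.5) -> float:
    """Posterior probability of paternity from per-locus PIs.

    Loci are assumed independent, so the combined PI is their product.
    prior=0.5 corresponds to prior odds of 1.
    """
    combined_pi = math.prod(pi_values)
    prior_odds = prior / (1.0 - prior)
    posterior_odds = combined_pi * prior_odds  # Bayes's theorem in odds form
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical per-locus PIs for a trio typed at five loci
pis = [5.0, 2.1, 0.9, 3.4, 7.8]
w = posterior_probability_of_paternity(pis)
print(f"combined PI = {math.prod(pis):.1f}, posterior probability = {w:.5f}")
```

Whether the resulting value leads to inclusion depends on the cut-off chosen by the laboratory.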

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, whereas too high a cut-off increases the risk of falsely excluding the true father.

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) is introduced, and when deficiency cases are treated (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

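$$L(\mathrm{Ped}) = \sum_{G_1}\cdots\sum_{G_n}\prod_{i=1}^{n}\mathrm{Pen}(X_i\mid G_i)\prod_{\mathrm{founder}}\Pr(G_{\mathrm{founder}})\prod_{\{o,f,m\}}\Pr(G_o\mid G_f, G_m)$$

where $G_i$ and $X_i$ are the genotype and the observed phenotype of individual i, and the last product runs over all offspring (o) with father (f) and mother (m) in the pedigree.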

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child’s genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and that individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when dealing with non-trait loci.
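To make the structure of the sum concrete, the Python sketch below evaluates it by brute-force enumeration over all joint genotype assignments at a single autosomal locus. It is deliberately not the Elston-Stewart peeling recursion, which reorganises the same sum so that not every combination has to be enumerated. Assumptions of the sketch: penetrance is 1 for a genotype matching the typing result and 0 otherwise, untyped individuals are unrestricted, founders are in HWE, mutation is ignored, and all function and argument names are hypothetical. Applied to the two hypotheses of the earlier paternity example, it returns the same PI.

```python
from itertools import combinations_with_replacement, product


def genotype_space(freqs: dict) -> list:
    """All unordered genotypes (sorted allele pairs) over the given alleles."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g: tuple, freqs: dict) -> float:
    """Hardy-Weinberg genotype probability for a founder."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(g: tuple, allele: str) -> float:
    return g.count(allele) / 2


def offspring_prob(child: tuple, father: tuple, mother: tuple) -> float:
    """Mendelian Pr(G_o | G_f, G_m)."""
    a, b = child
    if a == b:
        return transmission(father, a) * transmission(mother, a)
    return (transmission(father, a) * transmission(mother, b)
            + transmission(father, b) * transmission(mother, a))


def pedigree_likelihood(members: list, parents: dict, typed: dict, freqs: dict) -> float:
    """Sum over all genotype assignments of prod(Pen) * prod(founder) * prod(offspring).

    parents: child -> (father, mother); members without an entry are founders.
    typed: member -> observed genotype; Pen is 1 if the assigned genotype matches, else 0.
    """
    space = genotype_space(freqs)
    total = 0.0
    for assignment in product(space, repeat=len(members)):
        g = dict(zip(members, assignment))
        if any(g[m] != tuple(sorted(obs)) for m, obs in typed.items()):
            continue  # penetrance factor is 0
        term = 1.0
        for m in members:
            if m in parents:
                father, mother = parents[m]
                term *= offspring_prob(g[m], g[father], g[mother])
            else:
                term *= founder_prob(g[m], freqs)
        total += term
    return total


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
typed = {"M": ("a", "b"), "C": ("b", "c"), "AF": ("c", "d")}
# H_p: AF is the father. H_d: an untyped, unrelated man RM is the father.
l_hp = pedigree_likelihood(["M", "AF", "C"], {"C": ("AF", "M")}, typed, freqs)
l_hd = pedigree_likelihood(["M", "AF", "RM", "C"], {"C": ("RM", "M")}, typed, freqs)
print(l_hp / l_hd)  # equals 1 / (2 * p_c) for this example
```

The number of assignments grows exponentially with the number of pedigree members, which is exactly the cost that peeling avoids.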

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci and whose computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree, describing the alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed marker genotypes conditional on each inheritance vector; and finally 3) the joint probability of the inheritance vectors of all markers along the same chromosome (i.e. the transition probabilities between inheritance vectors at adjacent loci). By using a hidden Markov model (HMM) for the final step, an efficient computational model is obtained (see Kruglyak et al. 1996 for a more detailed description).
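Step 3 can be illustrated with a minimal forward-algorithm sketch in Python. It assumes that step 2 has already produced, for each locus, the probability of the observed genotypes given each inheritance vector (the emission values below are hypothetical placeholders), that all inheritance vectors are equally likely a priori, and that each meiosis recombines independently with the same θ between adjacent loci. Founder-phase symmetries and other optimisations used in real implementations are omitted, and the names are our own.

```python
from itertools import product


def transition_prob(v: tuple, w: tuple, theta: float) -> float:
    """Pr(inheritance vector w at next locus | v at this locus); meioses recombine independently."""
    d = sum(x != y for x, y in zip(v, w))  # number of meioses with a recombination
    return theta ** d * (1.0 - theta) ** (len(v) - d)


def lander_green_likelihood(emissions: list, thetas: list, n_meioses: int) -> float:
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions[l][v]: Pr(observed genotypes at locus l | inheritance vector v), from step 2.
    thetas[l]: recombination fraction between locus l and locus l + 1.
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    prior = 1.0 / len(vectors)  # all inheritance vectors equally likely a priori
    alpha = {v: prior * emissions[0][v] for v in vectors}
    for locus in range(1, len(emissions)):
        theta = thetas[locus - 1]
        alpha = {
            w: sum(alpha[v] * transition_prob(v, w, theta) for v in vectors) * emissions[locus][w]
            for w in vectors
        }
    return sum(alpha.values())


# Hypothetical emission probabilities for 3 loci in a pedigree with 2 meioses
vecs = list(product((0, 1), repeat=2))
em = [dict(zip(vecs, row)) for row in ([0.1, 0.4, 0.2, 0.3],
                                       [0.5, 0.1, 0.1, 0.3],
                                       [0.2, 0.2, 0.6, 0.1])]
print(lander_green_likelihood(em, [0.05, 0.10], n_meioses=2))
```

The cost is linear in the number of loci but grows with the 2^m inheritance vectors for m meioses, so the approach suits many markers in moderately sized pedigrees. With θ = 0.5 between all loci, the result reduces to the product of per-locus averages, i.e. unlinked markers.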

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium when estimating population frequencies (Abecasis et al. 2002; Skare et al. 2009).

Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for properly taking such parameters into account when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed and a decision made as to whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error (falsely excluding the AF as the TF) or as an inclusion error (falsely including the AF as the TF). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. Such cases can involve uncertainty about appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals’ DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with both the standard two-hypotheses model and the five-hypotheses model for comparison. Error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.

Figure 3 The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is a recognised risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).
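To make the contrast concrete, the sketch below computes a single-locus paternity likelihood ratio for a mother-child-AF trio under a deliberately simple mutation model: the paternal allele mutates with probability `mu` to any of the other alleles with equal probability, and maternal mutations are ignored. This "equal probability" model, the allele frequencies and the function names `transmit` and `trio_lr` are assumptions for illustration; they are not the mutation model used in Paper I or the one recommended by Gjertson et al (2007). With `mu = 0` an inconsistent locus gives LR = 0 (a hard exclusion), whereas with `mu > 0` it gives a small but positive LR that can be multiplied with the LRs of the other loci.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def transmit(parent: Genotype, allele: str, freqs: Dict[str, float], mu: float) -> float:
    """P(parent passes on `allele`) under an assumed equal-probability mutation model.

    With probability 1 - mu the transmitted allele is unchanged; with probability mu
    it mutates to one of the other K - 1 alleles, each equally likely.
    """
    k = len(freqs)
    prob = 0.0
    for source in parent:  # each of the two parental alleles is chosen with prob. 1/2
        prob += 0.5 * ((1.0 - mu) if source == allele else mu / (k - 1))
    return prob


def trio_lr(child: Genotype, mother: Genotype, af: Genotype,
            freqs: Dict[str, float], mu: float = 0.0) -> float:
    """Single-locus LR: P(child | mother, AF is father) / P(child | mother, unrelated man)."""
    # Ordered (maternal, paternal) allele assignments giving the child's genotype;
    # the set removes the duplicate assignment when the child is homozygous.
    assignments = {(child[0], child[1]), (child[1], child[0])}
    numerator = denominator = 0.0
    for maternal, paternal in assignments:
        from_mother = transmit(mother, maternal, freqs, mu=0.0)  # maternal mutation ignored
        numerator += from_mother * transmit(af, paternal, freqs, mu)
        denominator += from_mother * freqs[paternal]
    if denominator == 0.0:
        raise ValueError("child is inconsistent with the mother; model not applicable")
    return numerator / denominator


# Hypothetical allele frequencies for one STR locus.
freqs = {"12": 0.10, "13": 0.25, "14": 0.30, "15": 0.20, "16": 0.15}
child, mother = ("12", "14"), ("14", "15")
print(trio_lr(child, mother, ("12", "13"), freqs))            # consistent AF: 1/(2 * p_12)
print(trio_lr(child, mother, ("13", "16"), freqs))            # inconsistency, no mutation: 0
print(trio_lr(child, mother, ("13", "16"), freqs, mu=0.002))  # inconsistency, small LR > 0
```

Under a rule that only counts inconsistencies, the third AF above would simply add one to the count, and the decision would depend on whether the count exceeds the limit, regardless of how unlikely the observed difference is. The probabilistic treatment instead lets that locus contribute a small factor to the total LR.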

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that utilisation of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered, with 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: the AF is the father of the child; H1: the AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers it is particularly relevant to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans more than 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved the calculation of forensic efficiency parameters, as well as a comparison of the genetic variation within and among the Swedish regions and between the Swedish, other European and non-European populations.
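Two of the forensic efficiency parameters mentioned above can be computed directly from haplotype counts. The sketch below assumes the common definitions: haplotype (gene) diversity as Nei's unbiased estimator n/(n-1)(1 - Σp_i²), and the random match probability as Σp_i², where p_i is the sample frequency of haplotype i. Whether Paper II used exactly these estimators is an assumption here, and the function name `diversity_and_match_probability` and the haplotype labels in the example are invented.

```python
from collections import Counter
from typing import Iterable, Tuple


def diversity_and_match_probability(haplotypes: Iterable[str]) -> Tuple[float, float]:
    """Return (haplotype diversity, random match probability) for a sample."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    sum_p2 = sum((c / n) ** 2 for c in counts.values())  # random match probability
    diversity = n / (n - 1) * (1.0 - sum_p2)             # Nei's unbiased diversity
    return diversity, sum_p2


# Invented toy sample: each string stands for one control-region haplotype.
sample = ["H1"] * 3 + ["H2"] * 2 + ["U5a", "J1c", "T2b"]
gd, pm = diversity_and_match_probability(sample)
print(f"GD = {gd:.3f}, PM = {pm:.3f}")
```

In a sample like the Swedish one, where most haplotypes are observed only once, Σp_i² is dominated by the few shared haplotypes and the diversity approaches 1.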

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
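As a rough illustration of how haplotype frequency differences among regions can be quantified, the sketch below computes Nei's G_ST = (H_T - H_S)/H_T from haplotype counts per region. It is a simpler stand-in, not the FST and ΦST estimators reported in Figure 4: it ignores sample-size corrections and sequence differences between haplotypes, and the function names and regional samples in the example are invented. Significance would normally be assessed separately, for example by permuting individuals among regions.

```python
from collections import Counter
from typing import Dict, List


def gene_diversity(counts: Counter) -> float:
    """Within-sample diversity 1 - sum(p_i^2), without small-sample correction."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def nei_gst(regions: Dict[str, List[str]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T, weighting each region equally."""
    per_region = [Counter(h) for h in regions.values()]
    h_s = sum(gene_diversity(c) for c in per_region) / len(per_region)
    # Pooled frequencies: unweighted mean of the regional haplotype frequencies.
    pooled: Dict[str, float] = {}
    for counts in per_region:
        n = sum(counts.values())
        for hap, c in counts.items():
            pooled[hap] = pooled.get(hap, 0.0) + (c / n) / len(per_region)
    h_t = 1.0 - sum(p ** 2 for p in pooled.values())
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Invented regional samples of haplotype labels.
same = {"A": ["H1", "H1", "U5", "J1"], "B": ["H1", "H1", "U5", "J1"]}
split = {"A": ["H1", "H1", "U5", "U5"], "B": ["Z1", "Z1", "D4", "D4"]}
print(nei_gst(same), nei_gst(split))
```

Identical regional distributions give G_ST = 0, while regions sharing no haplotypes give a clearly positive value; note that G_ST stays below 1 when each region is itself diverse.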


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST is the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
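The dependence on database size can be illustrated with one frequently used convention for a haplotype that has not been observed among N database samples: report the upper bound of a one-sided 95% confidence interval, 1 - 0.05^(1/N). This is the largest frequency p for which observing zero copies still has probability (1 - p)^N of at least 0.05. Whether this convention, rather than another estimator, is applied in Swedish casework is an assumption, as is the function name `unseen_haplotype_upper_bound`; the database sizes other than 296 are chosen only for illustration.

```python
def unseen_haplotype_upper_bound(n_database: int, alpha: float = 0.05) -> float:
    """Upper (1 - alpha) confidence bound on the frequency of an unobserved haplotype.

    Solves (1 - p) ** n_database = alpha for p, i.e. p = 1 - alpha ** (1 / n_database).
    """
    if n_database <= 0:
        raise ValueError("database must contain at least one haplotype")
    return 1.0 - alpha ** (1.0 / n_database)


for n in (296, 1000, 5000):
    print(n, round(unseen_haplotype_upper_bound(n), 5))
```

Under this convention the bound shrinks roughly as 3/N, which is one way to see why pooling compatible data into a larger database such as EMPOP gives less conservative rarity estimates.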


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.
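The basic estimation problem can be sketched with a simple binomial view: if R recombinants are observed among N informative meioses, the maximum-likelihood estimate of the recombination fraction is R/N, and an exact (Clopper-Pearson) confidence interval shows how little around a hundred meioses can pin down. This is an illustrative simplification that assumes every meiosis is fully informative and phase-known; it is not the estimation model presented in Paper III, and the function names and counts in the example are hypothetical.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); only terms up to k are computed."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], lo: float, hi: float, iterations: int = 100) -> float:
    """Root of a monotone function f on [lo, hi] by bisection."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):  # root lies in the upper half
            lo, f_lo = mid, f_mid
        else:                           # root lies in the lower half
            hi = mid
    return (lo + hi) / 2


def recombination_estimate(r: int, n: int, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (theta_hat, lower, upper): ML estimate and exact two-sided CI."""
    theta_hat = r / n
    # Lower bound: P(X >= r) = alpha/2; upper bound: P(X <= r) = alpha/2.
    lower = 0.0 if r == 0 else _bisect(
        lambda p: (1 - binom_cdf(r - 1, n, p)) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if r == n else _bisect(
        lambda p: binom_cdf(r, n, p) - alpha / 2, 0.0, 1.0)
    return theta_hat, lower, upper


for r, n in [(0, 100), (1, 116)]:
    est, lo, hi = recombination_estimate(r, n)
    print(f"{r}/{n}: theta = {est:.3f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

Even with no observed recombinants in 100 meioses, the upper 95% bound is about 3.6%, which is why a dedicated model and pooled family data are needed before linkage between closely spaced X-STRs can be quantified.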

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that disregarding LD, as opposed to taking it into account, changed the calculated LR only slightly on average, although in some individual cases the difference was considerable.
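Because males carry a single X chromosome, their two-locus haplotypes within a linkage group are observed directly, which makes a simple association test possible. The sketch below uses a Pearson chi-square statistic on the allele-by-allele table and obtains a p-value by permuting the alleles of one locus among individuals, which removes any association while keeping the allele frequencies fixed. It is a generic permutation test offered for illustration, assuming phase-known male data; it is not necessarily the LD test used in Paper III, and the function names and toy data are invented.

```python
import random
from collections import Counter
from typing import List, Sequence


def chi_square(locus_a: Sequence[str], locus_b: Sequence[str]) -> float:
    """Pearson chi-square for association between alleles at two loci."""
    n = len(locus_a)
    count_a, count_b = Counter(locus_a), Counter(locus_b)
    observed = Counter(zip(locus_a, locus_b))  # directly observed male haplotypes
    stat = 0.0
    for a, na in count_a.items():
        for b, nb in count_b.items():
            expected = na * nb / n  # haplotype count expected under no association
            stat += (observed.get((a, b), 0) - expected) ** 2 / expected
    return stat


def ld_permutation_test(locus_a: Sequence[str], locus_b: Sequence[str],
                        n_perm: int = 2000, seed: int = 1) -> float:
    """Permutation p-value for allelic association (LD) in male X-haplotypes."""
    rng = random.Random(seed)
    observed_stat = chi_square(locus_a, locus_b)
    shuffled: List[str] = list(locus_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between the two loci
        if chi_square(locus_a, shuffled) >= observed_stat - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Invented male haplotypes at two hypothetical loci of one linkage group.
a = ["12", "12", "13", "13"] * 10
b = ["20", "20", "21", "21"] * 10  # perfectly associated with locus a
c = ["20", "21", "20", "21"] * 10  # no association with locus a
print(ld_permutation_test(a, b), ld_permutation_test(a, c))
```

When association is present, the product of single-locus allele frequencies no longer approximates the haplotype frequency, which is the practical reason for treating each linkage group as a haplotype.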

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III led us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart algorithm (Elston & Stewart 1971; Abecasis et al 2002) could be used for computing the LR in questions about a relationship. However, this algorithm is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model for the complete consideration of linkage and linkage disequilibrium, and use simulations on typical pedigrees, with X-chromosomal data for the markers studied in Paper III, to study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
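The hidden Markov structure that the Lander-Green algorithm exploits can be illustrated on the simplest of the pedigrees above, maternity for two brothers with the mother untyped. Each brother carries one maternal X chromosome, and the hidden state at each locus is whether the brothers' alleles are identical by descent (IBD) from the mother; the state changes between adjacent loci when exactly one of the two maternal meioses recombines, with probability 2θ(1 - θ). The sketch runs the forward algorithm over the loci under the alternative "different, unrelated mothers" as the denominator. It assumes linkage equilibrium and no mutation, so it corresponds to a model with linkage but without LD, not to the haplotype-based extension of Paper IV; the function name `brothers_maternity_lr`, the frequencies and the recombination fractions in the example are invented.

```python
from typing import Dict, List, Sequence, Tuple


def brothers_maternity_lr(
    alleles: Sequence[Tuple[str, str]],
    freqs: Sequence[Dict[str, float]],
    theta: Sequence[float],
) -> float:
    """LR for 'same mother' vs 'different (unrelated) mothers' for two brothers.

    alleles[j]: (brother 1 allele, brother 2 allele) at locus j (one X each).
    freqs[j]:   allele frequencies at locus j (linkage equilibrium assumed).
    theta[j]:   recombination fraction between locus j and locus j + 1.
    """
    def emission(j: int, ibd: int) -> float:
        a1, a2 = alleles[j]
        p = freqs[j]
        if ibd:  # both brothers carry a copy of the same maternal allele
            return p[a1] if a1 == a2 else 0.0
        return p[a1] * p[a2]  # two independent maternal alleles

    # Forward algorithm; state 0 = not IBD, state 1 = IBD from the mother.
    forward: List[float] = [0.5 * emission(0, 0), 0.5 * emission(0, 1)]
    for j in range(1, len(alleles)):
        switch = 2 * theta[j - 1] * (1 - theta[j - 1])  # exactly one meiosis recombines
        prev0, prev1 = forward
        forward = [
            (prev0 * (1 - switch) + prev1 * switch) * emission(j, 0),
            (prev1 * (1 - switch) + prev0 * switch) * emission(j, 1),
        ]
    likelihood_same_mother = sum(forward)

    likelihood_unrelated = 1.0
    for j, (a1, a2) in enumerate(alleles):
        likelihood_unrelated *= freqs[j][a1] * freqs[j][a2]
    return likelihood_same_mother / likelihood_unrelated


# Invented two-locus example within one linkage group.
freqs = [{"10": 0.2, "11": 0.8}, {"7": 0.1, "8": 0.9}]
obs = [("10", "10"), ("7", "7")]
print(brothers_maternity_lr(obs, freqs, theta=[0.01]))  # tightly linked loci
print(brothers_maternity_lr(obs, freqs, theta=[0.5]))   # unlinked: product of locus LRs
```

With θ = 0.5 the LR equals the product of the single-locus LRs (3 × 5.5 in this example), while tight linkage raises it here because the IBD states at the two loci are correlated.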

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs greater than 1, of various magnitudes, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Columns, each given as median [95% cred.] (min/max): (1) log10 LR; (2) difference LR(m_2)/LR(m_1); (3) difference LR(m_3)/LR(m_1)

Rel: (1) 20 [-1 -62] (-1100); (2) 12 [04-73] (0001418); (3) 04 [0036-93] (00001438)
NoRel: (1) -1 [-1-05] (-140); (2) 10 [09-13] (000263); (3) 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule for combining the eight X-STRs is not valid.


• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance from developments in computing power.

If we go back to the time just before the above-mentioned findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid development. Once the discoveries were made, an enormous amount of money and scientific effort was invested in forensic genetics in order to meet these demands.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved with existing methods today, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA-profile databases, which consist of a given number of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases which, like CODIS, consist of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al 2009), resolving mixtures (Homer et al 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with a sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved in a satisfactory manner with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009; Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued work on the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with the so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and your wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and caring; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerström T, Götherström A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetical structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York


Introduction

chromosome. Due to this, and the fact that the Y-chromosome has one-fourth of the effective population size of autosomal loci, Y-chromosomal variation has been found to be fairly population specific (Hammer et al 2003, Jobling et al 2004a). As a result, regional population databases must be collected and studied.

Both SNPs and STRs are used as markers on the Y-chromosome. Y-SNPs can provide information about an individual's haplogroup (Karafet et al 2008), which can, for instance, be used to infer the paternal geographic origin (Jobling & Tyler-Smith 2003). For other forensic purposes, analysis of Y-STRs (resulting in a haplotype) is more useful (Jobling et al 1997). Nevertheless, it is crucial to bear in mind that, barring mutation, the Y-chromosome haplotype is the same for all males who share the same paternal lineage.

DNA from the mitochondrion can also be used in forensic investigations. The mitochondrial genome is circular and consists of ~16 600 nucleotides. Each cell has 100 to 1000 copies of its mtDNA, which makes it especially useful in forensic analyses where the amount of DNA is often very low. mtDNA is inherited maternally (from mother to child) and can therefore be used to resolve questions involving a potential maternal relationship. From a population point of view, mtDNA has many similarities with other haploid genomes (e.g. the Y-chromosome). Because of its haploid status, mtDNA profiles are also relatively population specific, which must be accounted for when conclusions are drawn (Holland & Parsons 1999).

Figure 2. Illustration of the inheritance pattern of two X-chromosomal loci located at a recombination distance θ from each other, in a family consisting of a mother, a father and a female child. X1a-c and X2a-c are alleles of X-chromosomal markers 1 and 2, respectively. The value in parentheses is the segregation probability for the inheritance of the given haplotype from the parents.


Population Genetics

Population genetics is the study of heritable variation and its change over time and space, and includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information about parameters such as population structure, growth, size and age can be retrieved (Jobling et al 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations: small FST values correspond to small differences within/among populations, and vice versa. Variants of FST exist that also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes, it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when calculating the strength of the DNA profile evidence (Balding & Nichols 1994).
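To make the variance interpretation concrete, Wright's definition for a single biallelic locus can be written as FST = var(p) / (p̄(1 − p̄)), where var(p) is the variance of one allele's frequency among subpopulations and p̄ its mean. The sketch below assumes equally sized subpopulations with known (not sampled) frequencies, so it ignores the sampling corrections used by practical estimators; the function name is hypothetical.

```python
def wright_fst(subpop_freqs):
    """Wright's FST for one biallelic locus.

    subpop_freqs: frequency of the same allele in each subpopulation,
    assumed equally sized and known without sampling error.
    """
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k
    if p_bar in (0.0, 1.0):
        return 0.0  # allele fixed or absent everywhere: nothing to partition
    var_p = sum((p - p_bar) ** 2 for p in subpop_freqs) / k
    return var_p / (p_bar * (1 - p_bar))


print(wright_fst([0.2, 0.8]))  # strongly differentiated subpopulations
print(wright_fst([0.5, 0.5]))  # identical subpopulations -> 0.0
```

Large allele-frequency differences between subpopulations give a large FST, which is the situation where a single pooled frequency database would misstate the evidential weight.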

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments through a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete differs from its parental chromosomes: the allele combination of the two markers (i.e. the haplotype) is changed compared with its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family studies. The recombination frequency is correlated with the genetic distance between the loci (Ott 1999).

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci and can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are thus inherited together more often than expected by chance. However, it can also be due to population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be written as

$$f(ab) = f(a)\cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the frequencies of alleles a and b, respectively. Under linkage equilibrium $\Delta = 0$, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then $\Delta \neq 0$ and the loci are considered to be in LD.

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.
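As a small illustration, Δ can be obtained from a sample of phase-known haplotypes by comparing the observed haplotype frequency with the product of the two allele frequencies. The sketch assumes phase is known (for X-chromosomal markers this holds for hemizygous males); the function name and the sample are hypothetical.

```python
def ld_delta(haplotypes, a, b):
    """Return (f(ab), f(a), f(b), delta) with delta = f(ab) - f(a) * f(b).

    haplotypes: list of phase-known (locus1_allele, locus2_allele) pairs.
    """
    n = len(haplotypes)
    f_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    f_a = sum(1 for h in haplotypes if h[0] == a) / n
    f_b = sum(1 for h in haplotypes if h[1] == b) / n
    return f_ab, f_a, f_b, f_ab - f_a * f_b


# Hypothetical sample of 10 phase-known haplotypes
sample = [("A", "B")] * 6 + [("A", "b")] * 2 + [("a", "B")] + [("a", "b")]
f_ab, f_a, f_b, delta = ld_delta(sample, "A", "B")
print(f_ab, f_a * f_b, delta)  # observed vs expected under LE, and their difference
```

Here the observed frequency of haplotype A-B (0.6) exceeds the LE expectation (0.8 × 0.7 = 0.56), giving Δ ≈ 0.04. With multiallelic loci the same comparison must be made for every allele pair, which is why reading haplotype frequencies directly from the data is often simpler.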

Validation of a frequency database

Prior to introducing new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and investigate potential substructure. Furthermore, tests must be conducted concerning the independent segregation of alleles. Tests of Hardy-Weinberg equilibrium (HWE; Hardy 1908) and LD tests deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw firm conclusions from either test; in particular, a non-significant result does not by itself demonstrate equilibrium (Buckleton et al 2001).

Another feature to consider is the forensic efficiency of the DNA markers in criminal casework and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations and differ from case-specific values. They are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are given below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population are different.


The unbiased estimator is given by (Nei 1987):

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and $p_i$ is the frequency of the ith allele in the sample.
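A minimal Python sketch of Nei's estimator, assuming the input is simply the list of sampled gene copies (allele labels) at one locus; the function name and sample are illustrative.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's (1987) unbiased gene diversity from a list of sampled gene copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Hypothetical STR sample: 10 gene copies of repeat alleles 12, 13 and 14
sample = [12] * 5 + [13] * 3 + [14] * 2
print(gene_diversity(sample))
```

The factor n/(n − 1) corrects the downward bias of the plug-in value 1 − Σ p_i² (here 0.62), which matters most for small samples.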

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951):

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the partial match probabilities over all genotypes. PM can also be computed from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals, and is thus the complement of PM:

$$PD = 1 - PM$$
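The sketch below computes PM both ways, from observed genotype counts and from allele frequencies under the HWE assumption, and then PD. Genotypes are assumed to be unordered allele pairs; all names are illustrative.

```python
from collections import Counter


def match_probability(genotypes):
    """PM = sum of squared observed genotype frequencies at one locus."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)  # (13, 12) == (12, 13)
    return sum((c / n) ** 2 for c in counts.values())


def hwe_match_probability(allele_freqs):
    """PM from allele frequencies, assuming Hardy-Weinberg equilibrium."""
    p = list(allele_freqs)
    pm = sum(pi ** 4 for pi in p)  # homozygotes: (p_i^2)^2
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            pm += (2 * p[i] * p[j]) ** 2  # heterozygotes: (2 p_i p_j)^2
    return pm


def power_of_discrimination(pm):
    return 1 - pm


observed = [(12, 13), (13, 12), (12, 12), (13, 13)]
print(match_probability(observed), hwe_match_probability([0.5, 0.5]))
print(power_of_discrimination(match_probability(observed)))
```

For several unlinked loci in LE, the per-locus PM values are usually multiplied, so the combined PD is 1 − ∏ PM.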

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, i.e. the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980, Guo & Elston 1999). There are two instances when this cannot be deduced, namely when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is the number of alleles.
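A direct translation of the PIC formula into Python, taking a list of allele frequencies for one locus (the function name is illustrative):

```python
def pic(allele_freqs):
    """Polymorphism information content (Botstein et al 1980) for one locus."""
    p = list(allele_freqs)
    value = 1 - sum(pi ** 2 for pi in p)  # remove: informative parent is homozygous
    for i in range(len(p) - 1):
        for j in range(i + 1, len(p)):
            value -= 2 * p[i] ** 2 * p[j] ** 2  # remove: both parents and child same heterozygote
    return value


print(pic([0.5, 0.5]))   # biallelic marker with equal frequencies
print(pic([0.25] * 4))   # four equally frequent alleles
```

Because of the second subtraction, PIC is always somewhat lower than the expected heterozygosity 1 − Σ p_i² for the same frequencies.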

The probability of excluding paternity (Q) is calculated as (Ohno et al 1982):

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2\left(1-p_i+p_i^2\right) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_ip_j(p_i+p_j)(1-p_i-p_j)^2$$
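To make the two-factor construction of Q concrete, the sketch below evaluates the closed form and, separately, sums (frequency of the mother/child combination) × (exclusion probability) over every mother/child genotype combination under Hardy-Weinberg equilibrium. The two should agree. Allele frequencies are passed as a plain list indexed by allele; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def q_ohno(p):
    """Probability of excluding paternity, closed form (Ohno et al 1982)."""
    n = len(p)
    q = sum(pi * (1 - pi) ** 2 * (1 - pi + pi ** 2) for pi in p)
    for i in range(n - 1):
        for j in range(i + 1, n):
            q += p[i] * p[j] * (p[i] + p[j]) * (1 - p[i] - p[j]) ** 2
    return q


def q_by_enumeration(p):
    """Same quantity as an explicit sum over mother/child genotype combinations."""
    n = len(p)
    q = 0.0
    for mother in combinations_with_replacement(range(n), 2):
        i, j = mother
        p_mother = p[i] ** 2 if i == j else 2 * p[i] * p[j]
        # Child: one maternal allele (prob 1/2 each) plus paternal allele k (prob p[k])
        child_probs = {}
        for m_allele in mother:
            for k in range(n):
                child = tuple(sorted((m_allele, k)))
                child_probs[child] = child_probs.get(child, 0.0) + 0.5 * p[k]
        for (x, y), p_child in child_probs.items():
            # Alleles the father could have transmitted, given the mother
            paternal = set()
            if x in mother:
                paternal.add(y)
            if y in mother:
                paternal.add(x)
            # A random man is excluded if neither of his two alleles is among them
            q += p_mother * p_child * (1 - sum(p[a] for a in paternal)) ** 2
    return q


freqs = [0.1, 0.2, 0.3, 0.4]  # hypothetical allele frequencies
print(q_ohno(freqs), q_by_enumeration(freqs))
```

The enumeration is only practical for small allele counts but shows where the squared terms come from; the closed form is what one would use for a frequency database.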

Q is built from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$ (a random man is excluded only if neither of his two alleles is a possible paternal allele), and second, the expected population frequency of that mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the possible paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above.

An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990):

$$PE = h^2\left(1 - 2hH^2\right)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
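Because PE needs only the observed proportion of heterozygotes, it can be computed directly from a genotype sample without allele frequencies. A minimal sketch, assuming genotypes are given as allele pairs (input format and function name are illustrative):

```python
def power_of_exclusion(genotypes):
    """PE = h^2 (1 - 2 h H^2) from observed genotypes (Brenner & Morris 1990)."""
    n = len(genotypes)
    h = sum(1 for a, b in genotypes if a != b) / n  # proportion of heterozygotes
    H = 1 - h                                         # proportion of homozygotes
    return h ** 2 * (1 - 2 * h * H ** 2)


# Hypothetical sample: 8 heterozygous and 2 homozygous individuals
sample = [(12, 14)] * 8 + [(13, 13)] * 2
print(power_of_exclusion(sample))
```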

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios involving a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a man carries only one X-chromosome. Thus, the mean exclusion chance when mother and daughter are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
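The two MEC expressions can be evaluated as below. As a consistency check, the sketch also sums over mother/daughter (or daughter-only) combinations with per-combination exclusion probability (1 − p_i) or (1 − p_i − p_j), which should agree with the closed forms. Frequencies are a plain list; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def mec_trio(p):
    s2 = sum(x ** 2 for x in p)
    s4 = sum(x ** 4 for x in p)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(p):
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return 1 - 2 * s2 + s3


def mec_trio_by_enumeration(p):
    n = len(p)
    total = 0.0
    for mother in combinations_with_replacement(range(n), 2):
        i, j = mother
        p_mother = p[i] ** 2 if i == j else 2 * p[i] * p[j]
        daughters = {}
        for m_allele in mother:      # each maternal X transmitted with prob 1/2
            for k in range(n):       # the father's single X carries allele k
                d = tuple(sorted((m_allele, k)))
                daughters[d] = daughters.get(d, 0.0) + 0.5 * p[k]
        for (x, y), p_daughter in daughters.items():
            paternal = ({y} if x in mother else set()) | ({x} if y in mother else set())
            # a random man (one X) is excluded if his allele is not a possible paternal one
            total += p_mother * p_daughter * (1 - sum(p[a] for a in paternal))
    return total


def mec_duo_by_enumeration(p):
    total = 0.0
    for x, y in combinations_with_replacement(range(len(p)), 2):
        p_daughter = p[x] ** 2 if x == y else 2 * p[x] * p[y]
        total += p_daughter * (1 - sum(p[a] for a in {x, y}))
    return total


freqs = [0.1, 0.2, 0.3, 0.4]  # hypothetical allele (or haplotype) frequencies
print(mec_trio(freqs), mec_trio_by_enumeration(freqs))
print(mec_duo(freqs), mec_duo_by_enumeration(freqs))
```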

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg & Tydén 2005), may have shaped the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their usage exist (see for example Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes' theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the hypothesis of the defence. E represents the DNA profiles and I is other relevant background information. The quotient $\Pr(E \mid H_1, I)/\Pr(E \mid H_0, I)$ is the LR, and it is within this term that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack guidance on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

i.e. the probability of observing the DNA profiles of the child ($G_C$), the mother ($G_M$) and the alleged father ($G_{AF}$) when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


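We can use the third law of probability (the multiplication rule) to simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can simplify further,

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$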

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither can transmit any other allele). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving this allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

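As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2}\,p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{p_c/2} = \frac{1}{2p_c}$$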

In other words, the rarer allele c is in the population, the higher the evidential weight in favour of the AF being the biological father of the child.
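The single-locus PI can be computed by summing over all ways the child's genotype can arise, which covers both the unambiguous and the ambiguous cases described above. A minimal sketch, assuming no mutation, no silent alleles and Hardy-Weinberg equilibrium; the function names and allele frequencies are hypothetical.

```python
def p_child_given_parents(child, mother, father):
    """Mendelian Pr(G_C | G_M, G_AF, H_p) at one autosomal locus, no mutation."""
    target = sorted(child)
    return sum(0.25 for m in mother for f in father if sorted((m, f)) == target)


def p_child_given_random_man(child, mother, freqs):
    """Pr(G_C | G_M, H_d): the paternal allele is drawn from the population."""
    target = sorted(child)
    return sum(0.5 * p for m in mother for allele, p in freqs.items()
               if sorted((m, allele)) == target)


def paternity_index(child, mother, af, freqs):
    den = p_child_given_random_man(child, mother, freqs)
    if den == 0:
        raise ValueError("child is incompatible with the mother")
    return p_child_given_parents(child, mother, af) / den


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 1/(2*p_c) = 5
print(paternity_index(("a", "b"), ("a", "b"), ("a", "c"), freqs))  # ambiguous case
```

For several unlinked loci in HWE and LE, the combined PI is the product of the per-locus values; a locus where the AF cannot have transmitted the paternal allele gives PI = 0 under this no-mutation model.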

Decision

How does one interpret the PI value? Bayes' theorem is needed in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following expression for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel suggested verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, while too high a cut-off increases the risk of failing to include a true father.
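The step from PI to a posterior probability, and a cut-off decision on top of it, can be sketched as follows. The sketch generalises the traditional equal prior (prior odds 1) to an arbitrary prior probability; the function names, and the 99.99% cut-off used as a default, are illustrative choices rather than a recommendation.

```python
def posterior_probability(lr, prior=0.5):
    """Pr(H_p | E) from a likelihood ratio and a prior probability of H_p."""
    prior_odds = prior / (1 - prior)
    posterior_odds = lr * prior_odds  # Bayes' theorem in odds form
    return posterior_odds / (1 + posterior_odds)


def is_included(lr, cutoff=0.9999, prior=0.5):
    """Hypothetical decision rule: include if the posterior reaches the cut-off."""
    return posterior_probability(lr, prior) >= cutoff


print(posterior_probability(5.0))          # PI/(PI + 1) with equal priors
print(is_included(20000), is_included(5000))
```

With equal priors the cut-off of 0.9999 corresponds to a PI threshold of 9999; changing the prior shifts that threshold, which is one reason the choice of prior needs to be stated explicitly.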

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

$$L(Ped) = \sum_{G_1}\cdots\sum_{G_n}\prod_{i=1}^{n}\mathrm{Pen}(X_i \mid G_i)\prod_{\text{founders}}\Pr(G_{\text{founder}})\prod_{\{o,f,m\}}\Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes $G_1, \dots, G_n$ of all n individuals in the pedigree, $X_i$ is the observed phenotype (typing result) of individual i, and the last product runs over all offspring o with father f and mother m.

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that, once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and the individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci (it is then 1 for genotypes consistent with the typing result and 0 otherwise).
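For a single nuclear family the peeling idea reduces to one step: for each candidate genotype of an untyped parent, the children's transmission probabilities are multiplied in, and the result is summed with the parent's founder prior. The sketch below does this for one autosomal locus, assuming HWE founder priors, no mutation, and penetrance as a 0/1 indicator (typed individuals are fixed to their observed genotypes). It is not a general pedigree implementation; all names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_prior(g, freqs):
    """Founder genotype probability under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(child, mother, father):
    """Mendelian Pr(G_o | G_f, G_m) without mutation."""
    target = sorted(child)
    return sum(0.25 for m in mother for f in father if sorted((m, f)) == target)


def nuclear_family_likelihood(mother, children, freqs, father=None):
    """L(Ped) for founders mother and father with typed children at one locus.

    If father is None he is untyped and his genotype is summed out: the
    children are 'peeled' onto each candidate paternal genotype first.
    """
    if father is None:
        candidates = list(combinations_with_replacement(sorted(freqs), 2))
    else:
        candidates = [father]
    total = 0.0
    for g_f in candidates:
        term = genotype_prior(mother, freqs) * genotype_prior(g_f, freqs)
        for child in children:
            term *= transmission(child, mother, g_f)
        total += term
    return total


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
mother, child, af = ("a", "b"), ("b", "c"), ("c", "d")
l_hp = nuclear_family_likelihood(mother, [child], freqs, father=af)
# Under H_d the AF is an unrelated founder outside the family: multiply by his prior
l_hd = genotype_prior(af, freqs) * nuclear_family_likelihood(mother, [child], freqs)
print(l_hp / l_hd)  # reproduces the single-locus PI, 1/(2*p_c)
```

Passing several children to the same call gives a sibship in which the untyped father is shared, which is the kind of deficiency case where summing over unobserved genotypes becomes necessary.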

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort grows exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with computational effort increasing linearly with the number of markers (at the cost of effort growing exponentially with pedigree size). The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on each inheritance vector; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (via transmission probabilities between adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model is obtained (see Kruglyak et al 1996 for a more detailed description).

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al 2002, Skare et al 2009).
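Step 3 can be sketched with the HMM forward algorithm below. It assumes the marker-specific probabilities from step 2 are already available as a table `emissions[m][v]` (in practice computed from founder allele frequencies, which is where the linkage-equilibrium assumption enters), encodes each inheritance vector as an integer bit vector over the pedigree's k meioses, and uses a uniform prior over vectors at the first marker. Between adjacent markers each meiosis recombines independently with probability θ, so moving between vectors that differ in d meioses has probability θ^d (1 − θ)^(k − d). The function name and the toy numbers are hypothetical.

```python
def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors (Lander-Green style HMM).

    emissions[m][v]: Pr(observed genotypes at marker m | inheritance vector v),
        v an integer bit vector over n_meioses meioses (assumed precomputed).
    thetas[m]: recombination fraction between marker m and marker m + 1.
    """
    n_states = 2 ** n_meioses

    def transition(u, v, theta):
        d = bin(u ^ v).count("1")  # number of meioses that switch between markers
        return theta ** d * (1 - theta) ** (n_meioses - d)

    # Uniform prior over inheritance vectors at the first marker
    alpha = [emissions[0][v] / n_states for v in range(n_states)]
    for m in range(1, len(emissions)):
        theta = thetas[m - 1]
        alpha = [
            emissions[m][v] * sum(alpha[u] * transition(u, v, theta) for u in range(n_states))
            for v in range(n_states)
        ]
    return sum(alpha)


# Toy example: one meiosis, two markers, hypothetical emission probabilities
emissions = [[0.2, 0.4], [0.1, 0.3]]
print(lander_green_likelihood(emissions, [0.5], 1))  # unlinked: product of marker averages
print(lander_green_likelihood(emissions, [0.0], 1))  # completely linked markers
```

The cost is linear in the number of markers but, in this naive form, proportional to the square of the number of inheritance vectors, i.e. exponential in the number of meioses; practical implementations use faster transition updates than this double loop.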


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision made as to whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainty about appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five-hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship using a decision rule for inclusion and exclusion.
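The simulate-then-decide approach can be sketched as below. This is not the model used in the paper: it assumes hypothetical equal allele frequencies (8 alleles at each of 15 unlinked loci), no mutation, the standard two-hypotheses PI with equal priors, and a 99.99% posterior cut-off. It only estimates how often the AF is included when he is the true father, an unrelated man, or a full brother of the true father; all names are hypothetical.

```python
import random

N_LOCI = 15
ALLELES = list(range(8))
FREQ = 1 / len(ALLELES)  # hypothetical: all alleles equally frequent


def random_genotype(rng):
    return (rng.choice(ALLELES), rng.choice(ALLELES))


def offspring(rng, parent1, parent2):
    return (rng.choice(parent1), rng.choice(parent2))


def locus_pi(child, mother, af):
    """Single-locus PI (no mutation); 0 if the AF is excluded at this locus."""
    target = sorted(child)
    num = sum(0.25 for m in mother for f in af if sorted((m, f)) == target)
    den = sum(0.5 * FREQ for m in mother for a in ALLELES if sorted((m, a)) == target)
    return num / den


def simulate_pi(rng, scenario):
    """Combined PI over unlinked loci for one simulated case."""
    pi = 1.0
    for _ in range(N_LOCI):
        mother = random_genotype(rng)
        if scenario == "brother":
            grandmother, grandfather = random_genotype(rng), random_genotype(rng)
            tf = offspring(rng, grandmother, grandfather)
            af = offspring(rng, grandmother, grandfather)
        else:
            tf = random_genotype(rng)
            af = tf if scenario == "true_father" else random_genotype(rng)
        child = offspring(rng, mother, tf)
        pi *= locus_pi(child, mother, af)
    return pi


def inclusion_rate(scenario, n_cases=2000, cutoff=0.9999, seed=1):
    rng = random.Random(seed)
    included = 0
    for _ in range(n_cases):
        pi = simulate_pi(rng, scenario)
        if pi / (pi + 1) >= cutoff:  # posterior probability with prior odds 1
            included += 1
    return included / n_cases


for scenario in ("true_father", "unrelated", "brother"):
    print(scenario, inclusion_rate(scenario))
```

Under these assumptions every true father is included (each locus contributes PI of at least 2), unrelated men are essentially never included, and any inclusion errors come from the brother scenario, which is the situation the five-hypotheses model was designed to address.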


Figure 3. The different alternative hypotheses used for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some labs still apply a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the model is not entirely correct, than not to employ one at all (Table 1).
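The contrast between a probabilistic mutation model and a fixed limit on the number of inconsistencies can be illustrated with a small Python sketch. It rests on several assumptions:

- The child's paternal (obligate) allele is known at each locus.
- Mutation follows a deliberately simple, hypothetical model in which a mutated allele is drawn from the population frequencies.
- The mutation rate of 0.002 is illustrative, not a value taken from the paper.

Under this model an inconsistent locus contributes a paternity index of exactly mu. A single inconsistency therefore lowers the combined LR without automatically excluding the AF.

```python
"""Per-locus paternity indices with a simple mutation model (sketch)."""
import math


def locus_pi(paternal_allele, af_alleles, freqs, mu=0.002):
    """Paternity index at one locus under a hypothetical mutation model.

    Assumes the child's paternal allele is known. If the AF is the father,
    he transmits one of his two alleles unchanged with probability 1 - mu,
    or, with probability mu, a mutated allele drawn from the population
    frequencies. If he is not the father, the allele is a random draw.
    """
    p = freqs[paternal_allele]
    copies = af_alleles.count(paternal_allele)
    p_if_father = (1 - mu) * copies / 2 + mu * p
    return p_if_father / p  # equals mu when the AF lacks the allele


def combined_pi(loci, mu=0.002):
    """Multiply per-locus indices, assuming independent (unlinked) loci."""
    return math.prod(locus_pi(c, af, f, mu) for c, af, f in loci)


def count_inconsistencies(loci):
    """Number of loci where the AF lacks the child's paternal allele."""
    return sum(1 for c, af, _ in loci if c not in af)


if __name__ == "__main__":
    freqs = {"a": 0.1, "b": 0.2, "c": 0.1}
    # 14 consistent loci and one inconsistent locus (illustrative data).
    loci = [("a", ["a", "b"], freqs)] * 14 + [("c", ["a", "b"], freqs)]
    print("inconsistencies:", count_inconsistencies(loci))
    print("combined PI: %.3g" % combined_pi(loci))
```

In the example, 14 consistent loci and one inconsistency still give a large combined index. A rule allowing zero inconsistencies would instead exclude the AF outright. Real casework would use a more realistic mutation model, for example a stepwise one.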

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that utilisation of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.
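Why is a close relative of the TF hard to distinguish from the TF himself? The probability of the child's paternal allele can be written in terms of phi, the probability that the TF's transmitted allele is identical by descent with one of the AF's alleles. The Python sketch below works under three assumptions: the paternal allele is known, loci are unlinked, and mutation is ignored. The relationship labels attached to H1a-H1d are hypothetical examples and may not match the hypotheses in Figure 3.

```python
"""Likelihood of the child's paternal allele when the AF may be a relative
of the true father (sketch; ignores mutation, loci assumed unlinked)."""
import math

# phi: probability that the TF's transmitted allele is identical by descent
# with one (randomly chosen) allele of the AF. The H1a-H1d labels are
# hypothetical examples, not necessarily the hypotheses of Figure 3.
PHI = {
    "H0: AF is the TF": 1.0,
    "H1a: AF unrelated to the TF": 0.0,
    "H1b: AF is a full brother of the TF": 0.5,
    "H1c: AF is the father of the TF": 0.5,
    "H1d: AF is a half-brother of the TF": 0.25,
}


def allele_likelihood(c, af_alleles, freqs, phi):
    """P(paternal allele c | AF genotype) = phi * copies/2 + (1 - phi) * p_c."""
    copies = af_alleles.count(c)
    return phi * copies / 2 + (1 - phi) * freqs[c]


def profile_likelihood(loci, phi):
    """Product of per-locus likelihoods over (allele, AF alleles, freqs) tuples."""
    return math.prod(allele_likelihood(c, af, f, phi) for c, af, f in loci)


if __name__ == "__main__":
    freqs = {"a": 0.1}
    loci = [("a", ["a", "b"], freqs)] * 15  # AF carries the paternal allele
    l0 = profile_likelihood(loci, PHI["H0: AF is the TF"])
    for label, phi in PHI.items():
        ratio = l0 / profile_likelihood(loci, phi)
        print(f"{label}: LR(H0 vs this) = {ratio:.3g}")
```

Consider 15 loci at which a heterozygous AF carries the paternal allele, with allele frequency 0.1. The likelihood ratio of H0 against an unrelated man is 5^15, about 3 x 10^10, but against a full brother of the TF it is only (5/3)^15, about 2 x 10^3. Such per-hypothesis likelihoods can then be combined with prior probabilities through Bayes' theorem.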



The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, which makes evaluation of the statistical methods used important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for LR calculation: -24 (-8, -31)

The standard case used 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).



Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods
Blood samples from 296 Swedish individuals from seven geographically distinct regions, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved enumeration of forensic efficiency parameters, as well as comparison of the genetic variation within the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion
Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
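The two summary statistics quoted above can be computed from a list of observed haplotypes, as sketched below. This Python example assumes the common definitions: h = n/(n-1) (1 - Σ p_i²) for haplotype diversity (Nei's unbiased estimator) and Σ p_i² for the random match probability. The estimators used in Paper II may differ in detail, and the haplotype labels are placeholders.

```python
"""Haplotype diversity and random match probability (sketch)."""
from collections import Counter


def haplotype_stats(haplotypes):
    """Summary statistics for a list of haplotype labels (one per person).

    diversity: Nei's unbiased estimator n/(n-1) * (1 - sum p_i^2)
    match_probability: sum p_i^2 (chance that two random people match)
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return {
        "n": n,
        "distinct": len(counts),
        "diversity": n / (n - 1) * (1 - sum_p2),
        "match_probability": sum_p2,
    }


if __name__ == "__main__":
    # Placeholder labels; real input would be control-region sequences.
    sample = ["hap01", "hap01", "hap02", "hap03", "hap04", "hap01", "hap05", "hap02"]
    print(haplotype_stats(sample))
```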

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
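Comparing haplotype frequencies among subpopulations can be illustrated with Nei's G_ST, a simple frequency-based analogue of F_ST. This Python sketch is a stand-in, not the method of Paper II. ΦST values also take sequence differences between haplotypes into account, and significance is usually assessed by permutation. The sketch weights subpopulations equally and uses made-up haplotype labels.

```python
"""Nei's G_ST among subpopulations from haplotype frequencies (sketch)."""
from collections import Counter


def frequencies(haplotypes):
    """Relative frequency of each haplotype label in one subpopulation."""
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def gene_diversity(freqs):
    """1 - sum p_i^2 (no small-sample correction)."""
    return 1 - sum(p * p for p in freqs.values())


def nei_gst(subpops):
    """subpops: dict of name -> list of haplotype labels; equal weights."""
    sub_freqs = [frequencies(h) for h in subpops.values()]
    k = len(sub_freqs)
    h_s = sum(gene_diversity(f) for f in sub_freqs) / k  # mean within
    pooled = Counter()
    for f in sub_freqs:
        for hap, p in f.items():
            pooled[hap] += p / k
    h_t = gene_diversity(pooled)  # total, from averaged frequencies
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    regions = {
        "region_1": ["A", "A", "B", "C", "D"],
        "region_2": ["A", "B", "B", "C", "E"],
        "isolate": ["F", "F", "F", "G", "A"],
    }
    print(round(nei_gst(regions), 3))
```

A value near 0 indicates that the subpopulations share essentially the same haplotype frequencies. A value near 1 indicates that they share almost none.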



Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (The European DNA profiling group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
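A haplotype may not have been observed at all in a database of N haplotypes. One common way to express its rarity is the upper 1 - alpha confidence bound 1 - alpha^(1/N), obtained by solving (1 - p)^N = alpha. The Python sketch below assumes this approach, not any specific method used by EMPOP. It shows how the bound shrinks as the database grows, which is one reason why pooling comparable populations helps. For alpha = 0.05 the bound is approximately 3/N for large N.

```python
"""Upper confidence bound for a haplotype not seen in a database (sketch)."""


def upper_bound_unseen(n, alpha=0.05):
    """Largest frequency p still consistent (at level alpha) with observing
    the haplotype 0 times among n haplotypes: solves (1 - p) ** n = alpha."""
    return 1 - alpha ** (1 / n)


if __name__ == "__main__":
    # Illustrative database sizes.
    for n in (296, 1000, 5000, 20000):
        print(n, f"{upper_bound_unseen(n):.5f}")
```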



Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods
A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used for establishing haplotype frequencies, performing the LD test and estimating forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were considered in order to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion
Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in calculated LR was small, although in some individual cases it was considerable.
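Males carry a single X chromosome, so two-locus X-STR haplotypes can be read directly from male samples. Allelic association can then be tested without phase estimation. The Python sketch below uses a Pearson chi-square statistic with a permutation p-value. It is a generic illustration, not necessarily the LD test used in Paper III, and the allele values in the example are hypothetical.

```python
"""Permutation test for allelic association between two X-STR loci (sketch)."""
import random
from collections import Counter


def chi2_stat(pairs):
    """Pearson chi-square statistic for (allele_locus1, allele_locus2) pairs."""
    n = len(pairs)
    count1 = Counter(x for x, _ in pairs)
    count2 = Counter(y for _, y in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for x, nx in count1.items():
        for y, ny in count2.items():
            expected = nx * ny / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_perm=2000, seed=1):
    """Shuffle locus-2 alleles across men to break any association and
    return the observed statistic and a permutation p-value."""
    rng = random.Random(seed)
    observed = chi2_stat(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        # small tolerance guards against float rounding in ties
        if chi2_stat(list(zip(xs, ys))) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical male haplotypes (repeat numbers) at two linked loci.
    men = [(18, 13)] * 12 + [(19, 14)] * 10 + [(18, 14)] * 3 + [(19, 13)] * 2
    print(ld_permutation_test(men))
```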

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency for linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
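The simplest recombination estimate is the number of recombinants divided by the number of informative meioses. It can be sketched for sons of mothers whose X-chromosomal phase is known. This Python example is not the estimation model presented in Paper III. It assumes phase-known mothers (for example, with phase inferred from the maternal grandfather) and considers only two loci at a time. A meiosis is treated as uninformative when the mother is homozygous at either locus or when the son's allele matches neither maternal haplotype.

```python
"""Counting estimate of the recombination fraction from phase-known
maternal meioses transmitted to sons (sketch)."""


def classify_son(mother_phase, son):
    """mother_phase: ((a1, b1), (a2, b2)), the mother's two X haplotypes at
    loci A and B with known phase; son: (a, b), his single X haplotype.
    Returns "R" (recombinant), "NR" (non-recombinant) or None."""
    (a1, b1), (a2, b2) = mother_phase
    if a1 == a2 or b1 == b2:
        return None  # mother homozygous at a locus: origin cannot be traced
    origin_a = {a1: 0, a2: 1}.get(son[0])
    origin_b = {b1: 0, b2: 1}.get(son[1])
    if origin_a is None or origin_b is None:
        return None  # allele matches neither haplotype (mutation or error)
    return "R" if origin_a != origin_b else "NR"


def estimate_theta(families):
    """families: list of (mother_phase, [son haplotypes]).
    Returns (recombinants, informative meioses, estimate)."""
    calls = [classify_son(m, s) for m, sons in families for s in sons]
    informative = [c for c in calls if c is not None]
    r = informative.count("R")
    n = len(informative)
    return r, n, (r / n if n else float("nan"))


if __name__ == "__main__":
    families = [
        (((12, 30), (13, 31)), [(12, 30), (13, 31), (12, 30)]),
        (((15, 22), (16, 24)), [(16, 24), (15, 24), (16, 24)]),
        (((14, 27), (14, 28)), [(14, 27)]),  # uninformative mother
    ]
    print(estimate_theta(families))
```

With only around 100 informative meioses, an estimate of this kind is very uncertain when recombinants are rare.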



Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.



Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account
The findings in Paper III prompted us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers, but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model for the complete consideration of linkage and linkage disequilibrium. By means of simulations on typical pedigrees, using X-chromosomal data for the markers studied in Paper III, we study the impact of taking or not taking linkage and LD into account.

Materials and methods
A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype’s loci.
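The Lander-Green idea of walking along the chromosome with a hidden inheritance state can be sketched for the simplest X-chromosomal case in Table 2: two brothers with an untyped mother. At each locus the hidden state records whether the brothers received the same maternal allele (identical by descent) or not. Between adjacent loci the state switches when exactly one of the two maternal meioses recombines, which happens with probability 2θ(1 - θ). This Python sketch assumes linkage equilibrium, no mutation and known recombination fractions, so it corresponds roughly to taking linkage but not LD into account. The model in Paper IV additionally expands the inheritance vectors to handle LD within haplotypes.

```python
"""Lander-Green-style forward pass for two brothers' X-chromosomal alleles
(sketch: linkage equilibrium, no mutation, untyped mother)."""


def emission(ibd, a, b, freqs):
    """P(brother alleles a, b | IBD state) at one locus."""
    if ibd:
        return freqs[a] if a == b else 0.0
    return freqs[a] * freqs[b]


def brothers_lr(loci, thetas):
    """LR for 'same mother' vs 'unrelated' (different mothers).

    loci: list of (allele_brother1, allele_brother2, freqs) in map order.
    thetas: recombination fractions between consecutive loci.
    """
    # Forward probabilities for state 0 (different maternal alleles) and
    # state 1 (same maternal allele, IBD); each state has prior 1/2.
    a, b, f = loci[0]
    fwd = [0.5 * emission(s, a, b, f) for s in (0, 1)]
    for (a, b, f), theta in zip(loci[1:], thetas):
        flip = 2 * theta * (1 - theta)  # exactly one meiosis recombines
        fwd = [
            (fwd[s] * (1 - flip) + fwd[1 - s] * flip) * emission(s, a, b, f)
            for s in (0, 1)
        ]
    unrelated = 1.0
    for a, b, f in loci:
        unrelated *= f[a] * f[b]
    return sum(fwd) / unrelated


if __name__ == "__main__":
    f = {"12": 0.1, "13": 0.3}
    loci = [("12", "12", f), ("13", "13", f), ("12", "13", f)]
    print(brothers_lr(loci, thetas=[0.01, 0.3]))
```

With unlinked loci (θ = 0.5) the LR reduces to a product of single-locus terms. With tight linkage, shared alleles at neighbouring loci are no longer independent pieces of evidence.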

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. In three of these, DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters). In the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated using our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion
The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6). By comparison, the median LR for the cases where genotype data for the founders were not available was considerably lower (~10^2-10^3). Furthermore, LRs greater than 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
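The sensitivity of the LR to the assumed haplotype frequency can be seen in a stripped-down version of the brothers' case. Treat a linkage group as a single locus without recombination, and suppose two brothers share a haplotype of frequency h. The LR for a common mother against different mothers is then (h/2 + h²/2)/h² = 1/(2h) + 1/2. The Python sketch below uses made-up frequencies to compare an observed haplotype frequency with the product of allele frequencies that would be used if LD were ignored.

```python
"""LR sensitivity to the haplotype frequency for two brothers (sketch)."""


def shared_haplotype_lr(h):
    """LR (same mother vs different mothers) when two brothers share a
    haplotype of frequency h; assumes one linkage group, no recombination
    and no mutation."""
    return 0.5 / h + 0.5


# Hypothetical allele frequencies at two tightly linked loci.
p1, p2 = 0.2, 0.1
h_product = p1 * p2  # haplotype frequency implied by ignoring LD
h_observed = 0.08    # hypothetical observed haplotype frequency (positive LD)

for label, h in [("LD ignored", h_product), ("LD accounted for", h_observed)]:
    print(f"{label}: h = {h:.3f}, LR = {shared_haplotype_lr(h):.2f}")
```

In this made-up example, ignoring positive LD makes the shared haplotype look rarer than it is and inflates the LR almost fourfold.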

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Values are given as median [95% credible interval] (min/max) for log10 LR, for the difference LR(m_2)/LR(m_1) and for the difference LR(m_3)/LR(m_1).

Rel
  log10 LR: 2.0 [-1, 6.2] (-1/10.0)
  LR(m_2)/LR(m_1): 1.2 [0.4, 7.3] (0001418)
  LR(m_3)/LR(m_1): 0.4 [0.036, 9.3] (00001438)
NoRel
  log10 LR: -1 [-1, 0.5] (-1/4.0)
  LR(m_2)/LR(m_1): 1.0 [0.9, 1.3] (000263)
  LR(m_3)/LR(m_1): 0.2 [0.0036, 9.1] (0026433)

Rel means simulations in which the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.



Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, utilisation of relevant population allele frequencies, use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located within each of the four linkage groups were in linkage disequilibrium. Together with the linkage within and between the linkage groups in the Swedish population, this highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.



• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. Looking back at this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by ever faster computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client’s point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods. Another is that major investments have already been made in laboratory instrumentation and in large national DNA profile databases, which are built on a fixed set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual’s physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved in a satisfactory manner with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what matters are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As for continued work on the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetic data, and whether it could thus have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to
All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a “bollplank” (sounding board) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.



As a scientist it is valuable to be able to think about things other than work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and our daughter Signe. Ida, thank you for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, thank you for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis. Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.



Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.



Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl Ser 2:680-681.



Hammer MF Blackmer F Garrigan D Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome se-quence variation Genetics 1641495-1509

Hardy GH (1908) Mendelian proportion in a mixed population Science 2849-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analy-sismdashValidation and Use for Forensic Casework Forensic Science Review 1122-50

Holmlund G Nilsson H Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden Forensic Sci Int 16066-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured popu-lations defining estimating and interpreting F(ST) Nat Rev Genet 10639-650

Homer N Szelinger S Redman M Duggan D Tembe W Muehling J Pear-son JV Stephan DA Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays PLoS Genet 4e1000167

Hummel K Gerchow J and Essen-Megraveoller E (1981) Biomathematical eviden-ce of paternity festschrift for Erik Essen-Megraveoller Springer-Verlag Berlin New York

Hundertmark T Hering S Edelmann J Augustin C Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22 Int J Legal Med 122489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden: a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden: a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies: the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP: a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more: length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York


Introduction

Population genetics

Population genetics is the study of heritable variation and its change over time and space. It includes the processes of mutation, selection, migration and genetic drift. By quantifying different DNA alleles and their occurrence within and between populations, information can be retrieved about parameters such as population structure, growth, size and age (Jobling et al 2004a).

Substructure

In addition to estimating allele frequencies, it is also important to check for possible genetic substructure within a population and to study genetic variation among populations. The most common way of studying these differences is by means of FST statistics (Wright 1951; see also Holsinger & Weir 2009 for a review). FST is directly related to the variance in allele frequencies within/among populations: small FST values correspond to small differences within/among populations, and vice versa. Some variants of FST also take the evolutionary distance between alleles into account (e.g. ΦST and RST). For forensic purposes it is highly important to study possible substructure in the population of interest. If substructure exists, it has to be accounted for when assessing the strength of the DNA profile evidence (Balding & Nichols 1994).
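The sketch below illustrates how substructure enters the calculations. It computes Wright's variance-based FST for a single biallelic locus, together with a θ-corrected (FST-corrected) match probability in the form usually attributed to Balding & Nichols (1994). It is a minimal sketch under stated assumptions:

- subpopulations are weighted equally;
- sampling error is ignored, so this is not a sample-size-corrected estimator;
- the function names are hypothetical.

```python
from statistics import fmean, pvariance


def fst_biallelic(subpop_freqs):
    """Variance-based FST for one biallelic locus.

    subpop_freqs: frequency of the same allele in each subpopulation.
    Assumes equally weighted subpopulations and no sampling error.
    """
    p_bar = fmean(subpop_freqs)
    return pvariance(subpop_freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


def theta_corrected_match(p_a, p_b=None, theta=0.0):
    """Single-locus match probability corrected for substructure.

    Pass only p_a for a homozygous genotype aa, or both p_a and p_b for
    a heterozygous genotype ab. theta = 0 gives the plain HWE products.
    """
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if p_b is None:  # homozygote
        return ((2 * theta + (1 - theta) * p_a)
                * (3 * theta + (1 - theta) * p_a)) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


# Hypothetical subpopulation frequencies 0.2 and 0.4: FST of about 0.048.
print(fst_biallelic([0.2, 0.4]))
# A positive theta raises the match probability (less evidential weight).
print(theta_corrected_match(0.1, 0.2), theta_corrected_match(0.1, 0.2, theta=0.03))
```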

Linkage and linkage disequilibrium

Linkage and linkage disequilibrium (LD) both concern the dependence that can exist between different loci and between alleles at different loci.

Linkage can be defined as the co-segregation of closely located markers within a family (Figure 2). During meiosis, the maternal and paternal chromosome homologs align and exchange segments in a process known as crossing over, or recombination. Consider, for example, two markers located on the same chromosome. If recombination occurs between the two markers, the resulting chromosome in the gamete carries a different combination of alleles from either parental chromosome. The allele combination of the two markers (i.e. the haplotype) has thus changed compared with its parental constitution. The distance between two loci can be expressed as the recombination frequency θ, which is estimated from family data. The recombination frequency is related to the genetic (map) distance between the loci (Ott 1999).
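The relation between recombination frequency and genetic distance can be made concrete with a short sketch, which rests on three assumptions:

- meioses are phase-known, so recombinants can be counted directly;
- distances use Haldane's map function, which assumes no crossover interference (other map functions, such as Kosambi's, give different distances);
- the function names are hypothetical.

```python
import math


def estimate_theta(recombinants, meioses):
    """Recombination frequency from phase-known meioses.

    The estimate is the observed proportion of recombinant gametes,
    capped at 0.5 (the value for unlinked loci).
    """
    return min(recombinants / meioses, 0.5)


def haldane_distance(theta):
    """Map distance in Morgans from theta (Haldane: no interference)."""
    if theta >= 0.5:
        return math.inf  # unlinked loci
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(distance):
    """Inverse Haldane map function: theta from a distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance))


theta = estimate_theta(recombinants=5, meioses=100)   # hypothetical counts
print(theta, haldane_distance(theta))                 # 0.05 and about 0.053 M (5.3 cM)
```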

Linkage disequilibrium, on the other hand, concerns dependencies between alleles at different loci. It can be defined as the non-random association of alleles in haplotypes. LD can arise because the loci are closely located and are therefore inherited together more often than expected by chance. However, it can also result from population genetic events such as selection, founder effects and admixture (Ott 1999). LD can be studied by comparing observed haplotype frequencies with the haplotype frequencies expected under linkage equilibrium (LE).

Consider two loci and the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2. The frequency can be written as

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where f(ab) is the frequency of haplotype a-b, and f(a) and f(b) are the frequencies of alleles a and b, respectively. Under linkage equilibrium Δ = 0, i.e. no association exists between a and b. If instead there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.
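The definition of Δ translates directly into a calculation on a sample of observed haplotypes. The sketch below assumes that haplotypes have already been determined, for example from X-chromosomal markers in males, where phase is known. It also reports r² = Δ²/(f(a)(1−f(a))f(b)(1−f(b))). This is a common standardised LD measure; it is added here for illustration and is not part of the text above. The function name is hypothetical.

```python
def ld_delta(haplotypes, a, b):
    """Return (delta, r2) for allele a at locus 1 and allele b at locus 2.

    haplotypes: list of observed (allele_locus1, allele_locus2) tuples.
    delta = f(ab) - f(a) * f(b); r2 rescales delta to the range 0-1.
    r2 is undefined if either allele is fixed (frequency 0 or 1).
    """
    n = len(haplotypes)
    f_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    f_a = sum(1 for h in haplotypes if h[0] == a) / n
    f_b = sum(1 for h in haplotypes if h[1] == b) / n
    delta = f_ab - f_a * f_b
    r2 = delta ** 2 / (f_a * (1 - f_a) * f_b * (1 - f_b))
    return delta, r2


# 100 hypothetical haplotypes in which A tends to occur together with B.
sample = ([("A", "B")] * 40 + [("A", "b")] * 10
          + [("a", "B")] * 10 + [("a", "b")] * 40)
print(ld_delta(sample, "A", "B"))  # delta about 0.15, r2 about 0.36
```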

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from observed haplotype frequencies in the population, rather than by estimating Δ for each allele combination. This applies especially when dealing with multiallelic loci.

Validation of a frequency database

Before new DNA markers are introduced into forensic casework, studies should be performed on the relevant population to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Hardy-Weinberg equilibrium (HWE) tests (Hardy 1908) and LD tests address the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or LE, this has to be accounted for when calculating the statistics in casework. Fisher's exact test is the preferred method for both the HWE and LD tests (Fisher 1951). However, the exact test has very limited power, which makes it difficult to draw firm conclusions from either test. In particular, a non-significant result does not by itself demonstrate that the population is in equilibrium (Buckleton et al 2001).
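For a single biallelic locus, such as a SNP, the exact HWE test can be computed by enumeration. Every heterozygote count compatible with the observed allele counts is assigned its exact conditional probability, and the p-value sums those no more probable than the observed count. The sketch below follows that principle, but it is only an illustration and not the procedure used in this thesis. Multiallelic STR loci are normally tested with Markov chain approximations of the same exact test, which this sketch does not implement. The function name is hypothetical.

```python
import math


def hwe_exact_biallelic(n_aa, n_ab, n_bb):
    """Exact test for Hardy-Weinberg equilibrium at a biallelic locus.

    Conditional on the allele counts, each possible heterozygote count
    gets its exact probability; the p-value is the sum of the
    probabilities that are no larger than that of the observed count.
    """
    n = n_aa + n_ab + n_bb          # individuals
    n_a = 2 * n_aa + n_ab           # copies of allele a
    n_b = 2 * n_bb + n_ab           # copies of allele b

    def log_prob(het):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1) - math.lgamma(hom_b + 1)
                + het * math.log(2) + math.lgamma(n_a + 1)
                + math.lgamma(n_b + 1) - math.lgamma(2 * n + 1))

    # The heterozygote count always has the same parity as n_a.
    probs = {h: math.exp(log_prob(h))
             for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(p_value, 1.0)


print(hwe_exact_biallelic(25, 50, 25))  # counts as expected under HWE
print(hwe_exact_biallelic(50, 0, 50))   # no heterozygotes: strong deviation
```

With small samples, the same procedure will often return large p-values even for genuine deviations, which illustrates the limited power noted above.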

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of the specific markers in different forensic genetic situations, and they differ from case-specific values. They are most often based on the number of distinct alleles found in the population and their corresponding frequencies.

A selection of useful parameters is described below, together with their mathematical formulations.

There are different definitions of gene diversity (GD). Here, GD describes the probability that two alleles drawn at random from the population are different.

The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and p_i is the frequency of the ith allele in the population.
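The estimator can be computed from the allele counts of a population sample. In this minimal sketch, p_i is taken as the sample frequency of allele i, and the function name is hypothetical.

```python
def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity from the count of each allele.

    n is the number of gene copies sampled (twice the number of
    individuals for an autosomal locus).
    """
    n = sum(allele_counts)
    sum_p2 = sum((count / n) ** 2 for count in allele_counts)
    return n / (n - 1) * (1.0 - sum_p2)


# Hypothetical counts of four alleles among 100 gene copies (50 individuals).
print(gene_diversity([40, 30, 20, 10]))  # 100/99 * 0.70, about 0.707
```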

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where G_i is the frequency of genotype i at a given locus in the population. PM is thus the sum of the partial match probabilities over all genotypes. PM can also be calculated from allele frequencies, provided that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is therefore directly related to PM:

$$PD = 1 - PM$$
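Both parameters can be computed from observed genotype frequencies. Alternatively, assuming HWE, they can be computed from genotype frequencies derived from allele frequencies. The sketch below takes the second route; observed genotype frequencies can be passed to `match_probability` directly. The function names are hypothetical. Multiplying PM across loci to obtain a combined value additionally assumes independent loci.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under HWE: p_i^2 and 2 p_i p_j."""
    return [p * q if i == j else 2 * p * q
            for (i, p), (j, q)
            in combinations_with_replacement(enumerate(allele_freqs), 2)]


def match_probability(genotype_freqs):
    """PM: the sum of squared genotype frequencies at one locus."""
    return sum(g * g for g in genotype_freqs)


def power_of_discrimination(genotype_freqs):
    """PD = 1 - PM."""
    return 1.0 - match_probability(genotype_freqs)


genotypes = hwe_genotype_freqs([0.4, 0.3, 0.2, 0.1])  # hypothetical allele frequencies
print(match_probability(genotypes), power_of_discrimination(genotypes))
```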

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible. Equivalently, it is the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980; Guo & Elston 1999). There are two instances when this cannot be deduced: when the parent in question is homozygous, or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$

where p_i and p_j are allele frequencies and n is the number of alleles.
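The formula maps term by term onto a short function. The first sum is the probability that the parent is homozygous. The double sum is the probability that both parents and the child share the same heterozygous genotype. A minimal sketch, with a hypothetical function name:

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content (Botstein et al 1980)."""
    homozygous = sum(p ** 2 for p in allele_freqs)
    same_heterozygote = sum(2 * p ** 2 * q ** 2
                            for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygous - same_heterozygote


print(pic([0.5, 0.5]))            # 0.375 for two equally frequent alleles
print(pic([0.4, 0.3, 0.2, 0.1]))  # hypothetical four-allele locus
```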

The probability of excluding paternity (Q) is calculated from (Ohno et al 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i+p_j)(1-p_i-p_j)^2$$

Q is derived from two factors:

1. the exclusion probability for a given mother/child genotype combination, which is either (1-p_i)^2 or (1-p_i-p_j)^2;
2. the expected population frequency of the genotypes of that mother/child combination.

Here p_i and p_j are the frequencies of the paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations of these two factors multiplied together. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2 \cdot (1 - 2 \cdot h \cdot H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
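Both exclusion measures can be sketched in a few lines, with hypothetical function names:

- `exclusion_q` follows the Ohno et al (1982) formula term by term from allele frequencies, assuming HWE;
- `power_of_exclusion` uses only the observed heterozygote proportion h and takes H = 1 − h.

The two need not agree on real data, because PE depends on observed heterozygosity rather than on the allele frequencies.

```python
from itertools import combinations


def exclusion_q(allele_freqs):
    """Ohno et al (1982) probability of excluding a random non-father."""
    single = sum(p * (1 - p) ** 2 * (1 - p + p ** 2) for p in allele_freqs)
    pairs = sum(p * q * (p + q) * (1 - p - q) ** 2
                for p, q in combinations(allele_freqs, 2))
    return single + pairs


def power_of_exclusion(h):
    """Brenner & Morris (1990) PE from the observed heterozygote proportion h."""
    hom = 1.0 - h  # H, the homozygote proportion
    return h ** 2 * (1 - 2 * h * hom ** 2)


print(exclusion_q([0.5, 0.5]), power_of_exclusion(0.5))  # both 0.1875 here
```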

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, except that the exclusion probability for a given mother/child genotype combination is either (1-p_i) or (1-p_i-p_j), because a man carries only one X chromosome. Thus the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where p_i is the frequency of allele i. Alternatively, p_i can represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, MEC_Duo (Desmarais et al 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
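Both expressions depend only on power sums of the allele (or haplotype) frequencies, so each reduces to a few lines of code. This minimal sketch assumes the frequencies sum to one; the function names are hypothetical.

```python
def _power_sum(freqs, k):
    return sum(p ** k for p in freqs)


def mec_trio(freqs):
    """MEC for X-chromosomal trios with a daughter (Desmarais et al 1998)."""
    s2 = _power_sum(freqs, 2)
    return 1 - s2 + _power_sum(freqs, 4) - s2 ** 2


def mec_duo(freqs):
    """MEC for father-daughter duos (Desmarais et al 1998)."""
    return 1 - 2 * _power_sum(freqs, 2) + _power_sum(freqs, 3)


freqs = [0.4, 0.3, 0.2, 0.1]  # hypothetical allele (or haplotype) frequencies
print(mec_trio(freqs), mec_duo(freqs))
```

Testing the mother raises the exclusion chance, so the trio value exceeds the duo value for the same frequencies.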

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, because of the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, origins and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with respect to forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. They also recorded strong similarities with other European populations. A sample of the Swedish population was recently compared with other European populations using data from over 300 000 SNPs. The comparison showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the set-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006; Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other Western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Because of continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

To assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial. It makes the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks. These are a frequentist approach and a Bayesian approach, or alternatively a logical approach that can be extended to a full Bayesian approach. The frameworks have different properties, pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis. An example is Pr(E|H), the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile, and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected in favour of an alternative hypothesis. The argument for this approach is that it is intuitive and relatively easy to understand. However, it has been criticised, mainly for its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) that connects the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see the formula below). The advantage of this approach is that the LR can be combined with other evidence, such as fingerprints or information from eyewitnesses.

$$\frac{\Pr(H_1|E,I)}{\Pr(H_0|E,I)} = \frac{\Pr(E|H_1,I)}{\Pr(E|H_0,I)} \cdot \frac{\Pr(H_1|I)}{\Pr(H_0|I)}$$

posterior odds = likelihood ratio · prior odds

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) as the defence hypothesis. E represents the DNA profiles, and I is other relevant background evidence. The quotient Pr(E|H1,I)/Pr(E|H0,I) is the LR, and it is within this term that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.
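The update from prior to posterior odds is a single multiplication, and further independent evidence simply contributes further factors. In this minimal sketch, the function names and the numbers in the example are hypothetical and not taken from any case.

```python
import math


def posterior_odds(prior_odds, *likelihood_ratios):
    """Posterior odds = prior odds x product of the likelihood ratios.

    Multiplying several LRs assumes the items of evidence are
    independent given each hypothesis.
    """
    return prior_odds * math.prod(likelihood_ratios)


def odds_to_probability(odds):
    return odds / (1.0 + odds)


# Hypothetical numbers: prior odds of 1:1000 and a DNA LR of 10 000.
odds = posterior_odds(1 / 1000, 10_000)
print(odds, odds_to_probability(odds))  # odds about 10, probability about 0.91
```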

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations in paternity cases (Gjertson et al 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:

Hp: The alleged father is the father of the child.
Hd: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF}|H_p)}{\Pr(G_C, G_M, G_{AF}|H_d)}$$

The numerator is the probability of observing the child's (G_C), the mother's (G_M) and the alleged father's (G_AF) DNA profiles when the AF is the father. The denominator is the probability of observing the same profiles when the AF is not the father.

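We can use the third law of probability (the multiplication rule) and simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF}|H_p)}{\Pr(G_C, G_M, G_{AF}|H_d)} = \frac{\Pr(G_C|G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF}|H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF}|H_d)}$$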

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can make a further simplification:

$$\Pr(G_M, G_{AF}|H_p) = \Pr(G_M, G_{AF}|H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C|G_M, G_{AF}, H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities:

1. the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator);
2. the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of the numerator and assume that we have data from a single locus. This probability is based on Mendelian inheritance. Suppose it is possible to determine the maternal (A_M) and paternal (A_P) alleles of the child, assuming that the mother is the true mother. The numerator is then 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both are homozygous, the numerator is 1, since neither of them can transmit any other allele. If only one of them is heterozygous, the probability is 0.5, since a heterozygous parent transmits a given allele with a 50/50 chance. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If A_M and A_P are unambiguous, the denominator Pr(G_C|G_M, G_AF, H_d) is either p_AP or 0.5·p_AP, depending on whether the mother is homozygous or heterozygous. Here p_AP is the population frequency of allele A_P, which represents the probability of the child receiving that allele from a random man in the population. If A_M and A_P are ambiguous, the numerator and the denominator are each obtained by summing over all possible assignments of A_M and A_P.

As a simple example, let G_M have the genotype [a,b], G_C have [b,c] and G_AF have [c,d]. Then

$$\Pr(G_C|G_M, G_{AF}, H_p) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C|G_M, G_{AF}, H_d) = \frac{1}{2}\cdot p_c$$

thus

$$PI = \frac{\Pr(G_C|G_M, G_{AF}, H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2}\cdot p_c} = \frac{1}{2 p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
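The reasoning above, which sums over the possible maternal and paternal alleles, can be written as a short single-locus function. This is a sketch only, with hypothetical function names, and it assumes:

- no mutations and no silent alleles;
- the mother is the true mother (otherwise the denominator is zero).

```python
def transmit(genotype, allele):
    """Probability that a parent with this genotype transmits the allele."""
    return 0.5 * (genotype[0] == allele) + 0.5 * (genotype[1] == allele)


def child_prob(child, mother, paternal):
    """Pr(child genotype | mother, paternal allele distribution).

    paternal(allele) is the probability that the father transmits it.
    Summing over both maternal alleles handles ambiguous cases
    (e.g. mother and child both [a,b]) automatically.
    """
    c1, c2 = child
    prob = transmit(mother, c1) * paternal(c2)
    if c1 != c2:
        prob += transmit(mother, c2) * paternal(c1)
    return prob


def paternity_index(child, mother, alleged_father, freqs):
    def af_allele(allele):      # Hp: the AF is the father
        return transmit(alleged_father, allele)

    def random_allele(allele):  # Hd: an unrelated random man
        return freqs[allele]

    return (child_prob(child, mother, af_allele)
            / child_prob(child, mother, random_allele))


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 1/(2 p_c)
```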

Decision

How should the PI value be interpreted? Bayes's theorem is needed to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, so that the posterior odds of paternity equal the PI:

$$\frac{\Pr(H_P|E)}{\Pr(H_d|E)} = PI$$

hence

$$\frac{\Pr(H_P|E)}{1 - \Pr(H_P|E)} = PI$$

resulting in the posterior probability of paternity

$$\Pr(H_P|E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit or cut-off for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father. Conversely, too high a cut-off increases the risk of falsely excluding the true father.
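In practice, the PI is first combined over loci and then converted to a posterior probability, which is compared with the laboratory's cut-off. The sketch below has the following features:

- it multiplies per-locus PIs, which assumes independent (unlinked, LE) loci;
- it uses the traditional prior of 0.5;
- the cut-off, the PI values and the function names are hypothetical.

With a prior of 0.5, a 99.99% cut-off corresponds to a combined PI of at least 9 999.

```python
import math


def combined_pi(locus_pis):
    """Combined PI over loci, assuming unlinked loci in linkage equilibrium."""
    return math.prod(locus_pis)


def probability_of_paternity(pi, prior=0.5):
    """Posterior probability of paternity; prior = 0.5 gives PI / (PI + 1)."""
    return pi * prior / (pi * prior + (1.0 - prior))


CUT_OFF = 0.9999  # hypothetical laboratory threshold for inclusion

pi = combined_pi([1.7, 3.3, 12.0, 8.5, 25.0])  # hypothetical per-locus PIs
w = probability_of_paternity(pi)
print(pi, w, "inclusion" if w >= CUT_OFF else "no inclusion")
```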

Mathematical model for automatic likelihood computation for relationship testing

The calculation of the PI for trios and single markers is fairly simple. However, it rapidly becomes more complicated once the model has to allow for mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007) or population substructure (Ayres 2000), or has to treat deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

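$$L(Ped) = \sum_{G_1}\cdots\sum_{G_n} \prod_{i} Pen(X_i|G_i) \prod_{founder} \Pr(G_{founder}) \prod_{\{m,f,o\}} \Pr(G_o|G_m, G_f)$$

The terms are defined as follows:

- the sums run over the genotypes G_1 to G_n of all n individuals in the pedigree;
- X_i is the observed data for individual i;
- the last product runs over all mother-father-offspring triplets {m,f,o}.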

The Elston-Stewart algorithm uses a recursive approach. It starts at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents. That individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
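For a single nuclear family, the peeling idea can be shown in a few lines. For each pair of parental genotypes, each child's genotype is summed out and the result is attached as a factor. After that, only the parents remain to be summed over. This is a minimal sketch, not a general Elston-Stewart implementation, and all names are hypothetical. It assumes:

- one locus and one nuclear family;
- HWE for the founders and no mutation;
- Pen is an indicator of agreement with the typed genotype, with None marking an untyped person.

```python
from itertools import combinations_with_replacement


def all_genotypes(freqs):
    """All unordered genotypes, as sorted allele tuples."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g, freqs):
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmit(g, allele):
    return 0.5 * (g[0] == allele) + 0.5 * (g[1] == allele)


def offspring_prob(gc, gm, gf):
    """Mendelian Pr(G_o | G_m, G_f)."""
    a, b = gc
    p = transmit(gm, a) * transmit(gf, b)
    if a != b:
        p += transmit(gm, b) * transmit(gf, a)
    return p


def pen(observed, g):
    """Non-trait marker: 1 if g agrees with the typed genotype (or untyped)."""
    return 1.0 if observed is None or tuple(sorted(observed)) == g else 0.0


def nuclear_family_likelihood(mother, father, children, freqs):
    gs = all_genotypes(freqs)
    total = 0.0
    for gm in gs:
        for gf in gs:
            weight = pen(mother, gm) * pen(father, gf)
            if weight == 0.0:
                continue
            weight *= founder_prob(gm, freqs) * founder_prob(gf, freqs)
            # Peel each child: sum over its genotypes once, keep the factor.
            for child in children:
                weight *= sum(pen(child, gc) * offspring_prob(gc, gm, gf)
                              for gc in gs)
            total += weight
    return total


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical frequencies
print(nuclear_family_likelihood(("a", "b"), ("c", "d"), [("b", "c")], freqs))
print(nuclear_family_likelihood(("a", "b"), None, [("b", "c")], freqs))  # untyped father
```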

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort grows exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987). It permits simultaneous consideration of thousands of loci, and its computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps:

1. collecting all possible inheritance vectors in a pedigree, which describe the alleles transmitted from founders to offspring;
2. iterating over all inheritance vectors and calculating the probability of the observed genotypes at each marker, conditional on the inheritance vector;
3. computing the joint probability of the inheritance vectors of all markers along the same chromosome, using the transition probabilities between inheritance vectors at adjacent markers (which depend on the recombination fractions).

Using a hidden Markov model (HMM) for the final step gives an efficient computational model (see Kruglyak et al 1996 for a more detailed description).
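Step 3 can be illustrated with the forward algorithm of an HMM whose hidden states are the inheritance vectors. The sketch below rests on these assumptions:

- each inheritance vector is a tuple with one bit per meiosis, and all vectors are equally likely at the first marker;
- for vectors that differ in k of the m meioses, the transition probability between adjacent markers is θ^k(1−θ)^(m−k);
- the per-marker emission probabilities Pr(observed genotypes | v) from step 2 are given as inputs, since computing them needs founder-allele bookkeeping that is omitted here.

The names are hypothetical. The naive double loop over states is quadratic in the number of states; practical implementations use faster algorithms.

```python
from itertools import product


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Likelihood of the marker data along one chromosome.

    emissions: one dict per marker, mapping each inheritance vector
        (tuple of 0/1, one entry per meiosis) to Pr(genotypes | vector).
    thetas: recombination fractions between adjacent markers.
    """
    states = list(product((0, 1), repeat=n_meioses))
    alpha = {v: emissions[0][v] / len(states) for v in states}  # uniform start
    for theta, emission in zip(thetas, emissions[1:]):
        new_alpha = {}
        for w in states:
            total = 0.0
            for v, a in alpha.items():
                k = sum(x != y for x, y in zip(v, w))  # meioses with a recombination
                total += a * theta ** k * (1 - theta) ** (n_meioses - k)
            new_alpha[w] = total * emission[w]
        alpha = new_alpha
    return sum(alpha.values())


# Two meioses and two markers with hypothetical emission probabilities.
e = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 0.25}
print(lander_green_likelihood([e, e], [0.5], 2))  # unlinked markers
print(lander_green_likelihood([e, e], [0.0], 2))  # completely linked markers
```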

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers. However, they assume linkage equilibrium when estimating population frequencies (Abecasis et al 2002; Skare et al 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis. A further aim was to study models that properly take such parameters into account when the weight of evidence is calculated.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect. An exclusion error falsely excludes the AF as the TF, and an inclusion error falsely includes the AF as the TF. In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainty about the appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on the error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated. In the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

The weight of evidence was expressed as posterior probabilities. To calculate it, we used a Bayesian framework with the standard two hypotheses and, for comparison, with the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
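The posterior probabilities used for the decision follow from Bayes's theorem with more than two hypotheses. Each hypothesis's likelihood is weighted by its prior and normalised over all hypotheses. The sketch below uses equal priors by default. The likelihood values, the thresholds, the three-way decision and the function names are hypothetical illustrations, not the exact decision rule of Paper I.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Pr(H_i | E) = L_i * prior_i / sum_j L_j * prior_j (equal priors by default)."""
    if priors is None:
        priors = [1.0 / len(likelihoods)] * len(likelihoods)
    weighted = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


def decide(posterior_paternity, cut_off=0.9999):
    """Hypothetical three-way decision rule on the posterior of paternity."""
    if posterior_paternity >= cut_off:
        return "inclusion"
    if posterior_paternity <= 1.0 - cut_off:
        return "exclusion"
    return "inconclusive"


# Hypothetical likelihoods: paternity first, then four alternative hypotheses.
posts = posterior_probabilities([1.0, 2e-5, 1e-6, 1e-7, 1e-9])
print(posts[0], decide(posts[0]))
```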


Figure 3 The different alternative hypotheses used for simulation and for calculating the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it results from using an equal prior probability for each of the alternative hypotheses. That is, the same number of cases was simulated for hypothesis H1a as for each of H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but it had a considerable impact on individual LRs.

In cases where the AF may be a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a few inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007). Some laboratories nevertheless still use a maximum number of inconsistencies as a limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not entirely correct, than not to use one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


DNA analysis is increasingly used to clarify relationships for the purpose of family reunification, which makes it important to evaluate the statistical methods used. In this paper we demonstrated that improvements are still necessary to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in error rate (%) relative to the standard case, given as total error (inclusion error, exclusion error):

Consanguinity
Mother and father simulated as first cousins: 3 (10, -1)

Additional information
20-marker DNA profiles: -68 (-29, -89)
25-marker DNA profiles: -83 (-56, -98)
2 children: -88 (-73, -96)

Mutation model
Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)

Inappropriate allele frequencies
Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)

Prior information
Five-hypothesis model for LR calculation: -24 (-8, -31)

The standard case used 15-marker DNA profiles, a mutation model for handling inconsistencies, and an unweighted average of the inclusion error over H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: the AF is the father of the child; H1: the AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This segment, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same order of magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
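Haplotype (gene) diversity and match probability can both be computed from haplotype counts. The minimal sketch below uses Nei's unbiased estimator, GD = n/(n - 1) · (1 - Σp_i²), and takes the match probability as Σp_i², the probability that two randomly drawn haplotypes are identical. It assumes that haplotypes have already been aligned and encoded as comparable strings, for example as lists of differences from the revised Cambridge Reference Sequence. The toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_statistics(haplotypes):
    """Return (gene diversity, match probability) for a list of haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    gene_diversity = n / (n - 1) * (1 - sum_sq)  # Nei's unbiased estimator
    match_probability = sum_sq  # P(two random haplotypes are identical)
    return gene_diversity, match_probability


# Hypothetical haplotypes written as differences from the rCRS
sample = ["16069T 16126C", "16069T 16126C", "16224C 16311C", "263G 315.1C"]
gd, pm = haplotype_statistics(sample)
print(f"GD = {gd:.3f}, PM = {pm:.3f}")
```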

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
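One simple way to quantify haplotype-frequency differences between regions is Nei's G_ST = (H_T - H_S)/H_T. Here H_S is the mean within-region diversity and H_T is the diversity of the pooled frequencies. The sketch is illustrative only: it weights regions equally, ignores sample-size corrections and sequence differences between haplotypes, and is not the F_ST or Φ_ST estimator used in Paper II. The region names and haplotype labels are hypothetical.

```python
from collections import Counter


def frequencies(haplotypes):
    """Relative haplotype frequencies in one sample."""
    n = len(haplotypes)
    return {hap: count / n for hap, count in Counter(haplotypes).items()}


def diversity(freqs):
    """1 - sum of squared frequencies (expected heterozygosity)."""
    return 1 - sum(p ** 2 for p in freqs.values())


def nei_gst(samples_by_region):
    """Nei's G_ST from haplotype samples, weighting each region equally."""
    region_freqs = [frequencies(sample) for sample in samples_by_region.values()]
    k = len(region_freqs)
    h_s = sum(diversity(freqs) for freqs in region_freqs) / k
    all_haplotypes = set().union(*region_freqs)  # union of the dict keys
    pooled = {
        hap: sum(freqs.get(hap, 0.0) for freqs in region_freqs) / k
        for hap in all_haplotypes
    }
    h_t = diversity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


regions = {
    "Region A": ["H1", "H1", "H2", "U5"],
    "Region B": ["H1", "H2", "U5", "U5"],
}
print(round(nei_gst(regions), 3))
```

Values close to 0 indicate that the regional frequency distributions are similar. A formal test would compare the observed value against values obtained after randomly permuting individuals among regions.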


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). Values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability, and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded rather than taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
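Because males carry a single X chromosome, male samples give phase-known two-locus haplotypes, so allelic association between two linked loci can be tested directly. The sketch below uses a Monte Carlo permutation test with a Pearson chi-square statistic. This is an assumed, generic approach and not necessarily the LD test applied in Paper III. The allele data are hypothetical.

```python
import random
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for association between two loci."""
    n = len(pairs)
    counts_a = Counter(a for a, _ in pairs)
    counts_b = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for allele_a, n_a in counts_a.items():
        for allele_b, n_b in counts_b.items():
            expected = n_a * n_b / n
            stat += (observed[(allele_a, allele_b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_perm=1000, seed=1):
    """Monte Carlo p-value: shuffling locus-B alleles breaks any association."""
    rng = random.Random(seed)
    observed_stat = chi_square(pairs)
    alleles_a = [a for a, _ in pairs]
    alleles_b = [b for _, b in pairs]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(alleles_b)
        # Small tolerance so ties are not lost to floating-point rounding.
        if chi_square(list(zip(alleles_a, alleles_b))) >= observed_stat - 1e-12:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical male haplotypes (repeat numbers) for a linked locus pair
males = [(20, 11)] * 15 + [(24, 12)] * 15 + [(20, 12)] * 5 + [(24, 11)] * 5
print(ld_permutation_test(males))
```

Permuting one locus keeps each locus's allele frequencies fixed while destroying any association between them, so the p-value measures how unusual the observed association is under linkage equilibrium.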

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
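In the simplest setting, each informative meiosis can be scored unambiguously as recombinant or non-recombinant. The recombination fraction is then estimated as the proportion of recombinant meioses. The sketch below adds a Wilson score interval to show how wide the uncertainty remains with roughly 100 informative meioses. It is a simplification and not the estimation model presented in Paper III, which must also handle phase-unknown family data. The counts in the usage line are hypothetical.

```python
from math import sqrt


def recombination_fraction(recombinants, meioses, z=1.96):
    """Point estimate and approximate 95% Wilson score interval for theta."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("invalid counts")
    p = recombinants / meioses
    denom = 1 + z ** 2 / meioses
    centre = (p + z ** 2 / (2 * meioses)) / denom
    half_width = z * sqrt(p * (1 - p) / meioses + z ** 2 / (4 * meioses ** 2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical: no recombinants among 100 informative meioses within a linkage group
theta, low, high = recombination_fraction(0, 100)
print(f"theta = {theta:.3f}, 95% interval {low:.3f}-{high:.3f}")
```

Even with no observed recombinants, the upper bound stays near 4%. This is one reason that small family data sets give only rough estimates of tight linkage.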


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. Values below the line give the location in Mb (NCBI build 36). Values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set and, separately, for the Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used to compute the LR for questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) handles thousands of linked markers well, but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present a model for complete consideration of linkage and linkage disequilibrium. Using simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III, we study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all loci of a haplotype.
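The core of the Lander-Green algorithm is a hidden Markov model that walks along the chromosome. The hidden state at each locus is an inheritance vector with one bit per meiosis. Between adjacent loci the state changes with probability θ^d (1 - θ)^(m - d), where m is the number of meioses and d the number of bits that switch. The minimal sketch below shows only this forward recursion. It assumes a uniform prior over inheritance vectors, and it assumes that the per-locus probabilities of the observed genotypes given each vector (the emission probabilities) have already been computed. It does not implement the haplotype-based extension described in Paper IV, nor X-specific details such as fathers transmitting their single X chromosome without recombination.

```python
from itertools import product


def transition(v, w, theta):
    """P(inheritance vector w at the next locus | vector v at this locus)."""
    switches = sum(a != b for a, b in zip(v, w))
    return theta ** switches * (1 - theta) ** (len(v) - switches)


def pedigree_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over ordered loci.

    emissions: one dict per locus, mapping each inheritance vector (tuple of 0/1)
        to P(observed genotypes at that locus | vector); assumed precomputed.
    thetas: recombination fractions between consecutive loci.
    """
    if len(thetas) != len(emissions) - 1:
        raise ValueError("need one recombination fraction per adjacent locus pair")
    vectors = list(product((0, 1), repeat=n_meioses))
    prior = 1 / len(vectors)  # all inheritance vectors equally likely a priori
    alpha = {v: prior * emissions[0][v] for v in vectors}
    for emission, theta in zip(emissions[1:], thetas):
        # Sum over every possible previous vector, then weight by the new emission.
        alpha = {
            w: emission[w] * sum(alpha[v] * transition(v, w, theta) for v in vectors)
            for w in vectors
        }
    return sum(alpha.values())
```

With θ = 0.5 the loci behave as unlinked and the likelihood factorises into a product over loci. With θ = 0 the inheritance vector is locked along the chromosome. The cost grows linearly in the number of loci but exponentially in the number of meioses, which is why the algorithm suits many markers in small pedigrees.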

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. In three of these, DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters). In the other three, DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated using our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6), in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1 were obtained to varying degrees when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
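The influence of haplotype-frequency estimation, and of ignoring LD, can be illustrated with the simplest X-chromosomal case: a father-daughter trio in which the mother's genotype makes the daughter's paternal haplotype unambiguous. A father transmits his single X chromosome intact, so if the alleged father carries that haplotype, the LR is 1/f (mutation ignored), where f is the haplotype's population frequency. The sketch assumes a frequency estimator of the form (x + 1)/(N + 1), which is only one of several conventions for rare or unseen haplotypes and not necessarily the estimator used in Paper IV. The database and haplotypes are hypothetical.

```python
from collections import Counter
from math import prod


def haplotype_frequency(haplotype, database):
    """(x + 1) / (N + 1): the queried haplotype is counted once more (assumed convention)."""
    return (Counter(database)[haplotype] + 1) / (len(database) + 1)


def allele_frequency(allele, locus, database):
    """The same convention applied to a single allele at one locus."""
    count = sum(1 for haplotype in database if haplotype[locus] == allele)
    return (count + 1) / (len(database) + 1)


def lr_haplotype(haplotype, database):
    """Father-daughter LR when the paternal haplotype is known: 1 / f(haplotype)."""
    return 1 / haplotype_frequency(haplotype, database)


def lr_product_rule(haplotype, database):
    """The same LR, but treating the loci as independent (LD ignored)."""
    return 1 / prod(allele_frequency(a, i, database) for i, a in enumerate(haplotype))


# Hypothetical database of two-locus male haplotypes in strong LD
database = [(20, 11)] * 40 + [(24, 12)] * 40 + [(20, 12)] * 10 + [(24, 11)] * 10
for hap in [(20, 11), (20, 12), (22, 13)]:
    print(hap, round(lr_haplotype(hap, database), 1), round(lr_product_rule(hap, database), 1))
```

For common haplotypes in strong LD, the product rule overstates the LR. For the last haplotype, whose alleles are absent from the database, the result is driven almost entirely by the chosen frequency convention.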

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Columns: (1) log10 LR, median [95% cred.] (min/max); (2) difference LR(m_2)/LR(m_1), median [95% cred.] (min/max); (3) difference LR(m_3)/LR(m_1), median [95% cred.] (min/max)

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers were simulated to have different mothers. For each situation, 10,000 simulations were performed. Model m_1 considers both linkage and LD, m_2 considers linkage but not LD, and m_3 takes neither linkage nor LD into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located within each of the four linkage groups were in linkage disequilibrium. The linkage within and between the linkage groups in the Swedish population further highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for calculating likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years. Much of this is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by rapidly increasing computing power.

If we go back to the time just before the above-mentioned findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods. Major investments have also already been made in forensic laboratories, both in instrumentation and in the large national DNA profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly solved with sufficiently high probability and certainty, and the community will therefore not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetics and could thus have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (“bollplank”) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about things other than work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and my grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care, and for your patience when I worked outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks also to everyone not mentioned here who has been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis - validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M and Tyler-Smith C (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Introduction

lotype frequencies and haplotype frequencies expected under linkage equilibrium (LE).

If we have two loci and are interested in the population frequency of haplotype a-b, where a is the allele at locus 1 and b is the allele at locus 2, the frequency can be estimated from

$$f(ab) = f(a) \cdot f(b) + \Delta$$

where $f(ab)$ is the frequency of haplotype a-b, and $f(a)$ and $f(b)$ are the allele frequencies of alleles a and b, respectively. Under linkage equilibrium, Δ = 0, i.e. no association exists between a and b. However, if there is a dependency between the alleles at locus 1 and locus 2, then Δ ≠ 0 and the loci are considered to be in LD.
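To make the definition of Δ concrete, the following Python sketch estimates Δ = f(ab) - f(a)·f(b) for every allele pair from a list of phase-known two-locus haplotypes (for example, X-chromosomal haplotypes typed in males, where phase is known). The function name `ld_delta` and the input format are assumptions made for this illustration; it uses plain relative frequencies and performs no significance test.

```python
from collections import Counter


def ld_delta(haplotypes):
    """Estimate Delta = f(ab) - f(a) * f(b) for every allele pair.

    `haplotypes` is a list of phase-known (allele_at_locus1, allele_at_locus2)
    tuples; all frequencies are relative frequencies in this sample.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    locus1 = Counter(a for a, _ in haplotypes)
    locus2 = Counter(b for _, b in haplotypes)
    return {
        (a, b): hap_counts[(a, b)] / n - (locus1[a] / n) * (locus2[b] / n)
        for a in locus1
        for b in locus2
    }


# Complete association (only A-B and a-b observed): Delta(A, B) = 0.6 - 0.6 * 0.6
sample = [("A", "B")] * 6 + [("a", "b")] * 4
print(round(ld_delta(sample)[("A", "B")], 2))  # 0.24
```

With multiallelic loci the dictionary grows with the product of the allele counts, which illustrates why the text recommends working with observed haplotype frequencies directly.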

If haplotype frequencies are to be estimated for markers in LD, they are best inferred directly from the observed haplotype frequencies in the population, rather than by estimating Δ for each allele combination, especially when dealing with multiallelic loci.

Validation of a frequency database

Prior to the introduction of new DNA markers into forensic casework, studies should be performed on the relevant population in order to establish allele (or haplotype) frequencies and to investigate potential substructure. Furthermore, certain tests must be conducted concerning the independent segregation of alleles. Hardy-Weinberg equilibrium (HWE) tests (Hardy 1908) and LD tests deal with the independence of alleles within a locus and between loci, respectively. If the population is not in HWE or in LE, this has to be accounted for when calculating the statistics in casework. When performing the HWE and LD tests, Fisher's exact test is the preferred method (Fisher 1951). However, it is important to note that the exact test has very limited power, making it difficult to draw any highly significant conclusions from either test (Buckleton et al 2001).
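As a minimal sketch of what an exact HWE test computes, the Python function below handles the simplest case, a single biallelic locus (such as a SNP). It conditions on the observed allele counts, enumerates every possible number of heterozygotes, and sums the probabilities of all configurations that are no more likely than the observed one. Forensic STRs are multiallelic, so this is an illustration of the principle rather than the procedure used in the thesis; the function name is hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding problems.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_pvalue(n_AA, n_Aa, n_aa):
    """Exact HWE p-value for one biallelic locus, conditional on allele counts."""
    n = n_AA + n_Aa + n_aa          # number of individuals
    n_A = 2 * n_AA + n_Aa           # copies of allele A
    n_a = 2 * n - n_A               # copies of allele a

    def prob(het):
        # Pr(het heterozygotes | n individuals, n_A copies of A) under HWE
        hom_A = (n_A - het) // 2
        hom_a = (n_a - het) // 2
        num = factorial(n) * 2**het * factorial(n_A) * factorial(n_a)
        den = factorial(hom_A) * factorial(het) * factorial(hom_a) * factorial(2 * n)
        return Fraction(num, den)

    # Possible heterozygote counts share the parity of n_A and cannot exceed min(n_A, n_a)
    probs = [prob(h) for h in range(n_A % 2, min(n_A, n_a) + 1, 2)]
    p_obs = prob(n_Aa)
    return sum((p for p in probs if p <= p_obs), Fraction(0))


print(float(hwe_exact_pvalue(30, 40, 30)))  # heterozygote deficit relative to HWE
```

A sample exactly at the most probable configuration, such as (25, 50, 25), gives a p-value of 1, whereas a complete absence of heterozygotes gives a p-value close to 0.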

Another feature to consider is the forensic efficiency of the DNA markers in casework involving criminal cases and relationship testing. Such estimates describe the theoretical value of using the specific markers in different forensic genetic situations and differ from case-specific values. These parameters are most often estimated from the number of distinct alleles found in the population and their corresponding frequencies.

The description and mathematical formulation of a selection of useful parameters are provided below.

There are different definitions of gene diversity (GD). This parameter describes the probability that two alleles drawn at random from the population will be different.


The unbiased estimator is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$$

where n is the number of gene copies sampled and pi is the frequency of the ith allele in the population
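A small Python sketch of Nei's unbiased estimator follows. The input is simply a list of sampled gene copies (allele names), which is an assumption of this example; the same function applies unchanged to haplotypes, for instance mtDNA haplotype diversity.

```python
from collections import Counter


def gene_diversity(gene_copies):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum_i p_i**2)."""
    n = len(gene_copies)
    if n < 2:
        raise ValueError("at least two gene copies are needed")
    counts = Counter(gene_copies)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Four copies, two of each allele: 4/3 * (1 - 0.5) = 2/3, which equals the share
# of the 6 pairs of sampled copies that carry different alleles (4 of 6).
print(gene_diversity(["a", "a", "b", "b"]))
```

The factor n/(n - 1) corrects for sampling: it turns the "with replacement" quantity 1 - Σp² into the proportion of distinct pairs of sampled copies that differ.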

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_i G_i^2$$

where $G_i$ is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the genotype-specific match probabilities over all genotypes. PM can also be calculated from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus the complement of PM discussed above:

$$PD = 1 - PM$$
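The two definitions can be sketched in Python as follows. `match_probability` takes observed (or expected) genotype frequencies at one locus; the helper `hwe_genotype_freqs` shows the HWE route from allele frequencies mentioned above. All function names are illustrative assumptions.

```python
from itertools import combinations_with_replacement


def match_probability(genotype_freqs):
    """PM = sum of squared genotype frequencies at one locus."""
    return sum(g ** 2 for g in genotype_freqs)


def power_of_discrimination(genotype_freqs):
    """PD = 1 - PM."""
    return 1 - match_probability(genotype_freqs)


def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under HWE: p_i**2 or 2 * p_i * p_j."""
    p = list(allele_freqs)
    return [p[i] ** 2 if i == j else 2 * p[i] * p[j]
            for i, j in combinations_with_replacement(range(len(p)), 2)]


# Two equally common alleles: genotype frequencies 0.25, 0.5, 0.25
g = hwe_genotype_freqs([0.5, 0.5])
print(match_probability(g), power_of_discrimination(g))  # 0.375 0.625
```

For several loci that are in HWE and LE, the combined PM is commonly taken as the product of the per-locus values.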

Polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980, Guo & Elston 1999). There are two instances when this cannot be deduced, namely when the parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$$

where $p_i$ and $p_j$ are allele frequencies and n is here the number of alleles at the locus.
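The PIC formula translates directly into Python. The sketch below (function name assumed) keeps the two subtracted terms separate so that they map onto the two uninformative situations described in the text: a homozygous parent, and both parents and the child sharing the same heterozygous genotype, which occurs with probability (2p_ip_j)² · 1/2 = 2p_i²p_j².

```python
from itertools import combinations


def polymorphism_information_content(allele_freqs):
    """PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    p = list(allele_freqs)
    homozygous_parent = sum(x ** 2 for x in p)
    # both parents i/j and the child i/j: (2 p_i p_j)^2 * 1/2
    same_het_trio = sum(2 * a ** 2 * b ** 2 for a, b in combinations(p, 2))
    return 1 - homozygous_parent - same_het_trio


print(polymorphism_information_content([0.5, 0.5]))  # 0.375
```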

The probability of excluding paternity (Q) is calculated as follows (Ohno et al 1982):

$$Q = \sum_{i=1}^{n} p_i (1-p_i)^2 (1 - p_i + p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i p_j (p_i + p_j)(1 - p_i - p_j)^2$$

Q is derived from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either $(1-p_i)^2$ or $(1-p_i-p_j)^2$, since a random man is excluded only if neither of his two alleles can be the paternal allele; and second, the expected population frequency of the mother/child genotype combination. Here $p_i$ and $p_j$ are the frequencies of the paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above.

An alternative figure for the power of exclusion (PE) exists and is defined as (Brenner & Morris 1990)

$$PE = h^2 \cdot (1 - 2 \cdot h \cdot H^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample
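The sketch below computes Q from allele frequencies using the two-term expression for Q in this section, and cross-checks it by brute-force enumeration of the mother's genotype and of the alleles transmitted by the mother and the true father. It also evaluates PE from the observed heterozygosity h, taking H = 1 - h. Function names and the enumeration check are additions for this illustration, assuming HWE and no mutations.

```python
from itertools import combinations


def exclusion_probability(allele_freqs):
    """Q for trios in which the mother is tested (closed form)."""
    p = list(allele_freqs)
    # mother/child pairs with an unambiguous paternal allele i
    q = sum(pi * (1 - pi) ** 2 * (1 - pi + pi ** 2) for pi in p)
    # mother and child share the same heterozygous genotype i/j
    q += sum(pi * pj * (pi + pj) * (1 - pi - pj) ** 2
             for pi, pj in combinations(p, 2))
    return q


def exclusion_probability_by_enumeration(allele_freqs):
    """Brute-force check over mother genotypes and transmitted alleles."""
    p = list(allele_freqs)
    total = 0.0
    for a in range(len(p)):
        for b in range(len(p)):              # ordered maternal genotype (a, b)
            for m in (a, b):                 # allele transmitted by the mother
                for f in range(len(p)):      # allele transmitted by the true father
                    weight = p[a] * p[b] * 0.5 * p[f]
                    if a != b and {m, f} == {a, b}:
                        excl = (1 - p[a] - p[b]) ** 2   # paternal allele ambiguous
                    else:
                        excl = (1 - p[f]) ** 2          # obligate paternal allele f
                    total += weight * excl
    return total


def power_of_exclusion(h):
    """PE = h^2 (1 - 2 h H^2), with homozygosity H = 1 - h."""
    H = 1 - h
    return h ** 2 * (1 - 2 * h * H ** 2)


freqs = [0.1, 0.2, 0.3, 0.4]
print(exclusion_probability(freqs), exclusion_probability_by_enumeration(freqs))
```

For a biallelic locus with two alleles at frequency 0.5, both Q and PE (with h = 0.5) equal 0.1875.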

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1-p_i)$ or $(1-p_i-p_j)$, since a random man carries only one X-chromosomal allele. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
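The two X-chromosomal exclusion chances are easy to evaluate from allele (or haplotype) frequencies. As a consistency check, the sketch also builds the trio MEC term by term, exactly like Q but with the linear exclusion factors $(1-p_i)$ and $(1-p_i-p_j)$ described in the text; both routes should agree. Function names are hypothetical.

```python
from itertools import combinations


def mec_trio(freqs):
    """MEC for mother/daughter/alleged-father trios (X-chromosomal)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """MEC for alleged-father/daughter duos without the mother."""
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 1 - 2 * s2 + s3


def mec_trio_from_q_terms(freqs):
    """Same trio MEC, built like Q but with exclusion factors (1 - p_i), (1 - p_i - p_j)."""
    p = list(freqs)
    total = sum(pi * (1 - pi + pi ** 2) * (1 - pi) for pi in p)
    total += sum(pi * pj * (pi + pj) * (1 - pi - pj) for pi, pj in combinations(p, 2))
    return total


freqs = [0.1, 0.2, 0.3, 0.4]
print(mec_trio(freqs), mec_trio_from_q_terms(freqs), mec_duo(freqs))
```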

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, because of the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, origins and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008, Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the set-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). The latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Owing to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks, namely a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E|H)$, which is the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be “the DNA comes from an individual not related to the suspect”. If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1|E,I)}{\Pr(H_0|E,I)} = \frac{\Pr(E|H_1,I)}{\Pr(E|H_0,I)} \cdot \frac{\Pr(H_1|I)}{\Pr(H_0|I)}$$

posterior odds = likelihood ratio · prior odds

H1 (or Hp) is commonly known as the prosecutor's hypothesis and H0 (or Hd) is the hypothesis for the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E|H_1,I)/\Pr(E|H_0,I)$ is the LR, and it is within this term that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let Hp and Hd represent two mutually exclusive hypotheses for and against paternity:

Hp: The alleged father is the father of the child.
Hd: A random man not related to the alleged father is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} | H_p)}{\Pr(G_C, G_M, G_{AF} | H_d)}$$

which compares the probability of observing the DNA profiles of the child ($G_C$), the mother ($G_M$) and the alleged father ($G_{AF}$) when the AF is the father with the probability of observing the same DNA profiles when the AF is not the father.


We can use the third law of probability, $\Pr(A,B|H) = \Pr(A|B,H)\Pr(B|H)$, and simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF}|H_p)}{\Pr(G_C, G_M, G_{AF}|H_d)} = \frac{\Pr(G_C|G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF}|H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF}|H_d)}$$

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification:

$$\Pr(G_M, G_{AF}|H_p) = \Pr(G_M, G_{AF}|H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C|G_M, G_{AF}, H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1 (neither of them can transmit any other allele). If exactly one of them is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C|G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C|G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$


and

$$\Pr(G_C|G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C|G_M, G_{AF}, H_p)}{\Pr(G_C|G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
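The single-locus calculation above can be sketched in Python by summing over all Mendelian transmissions, which covers both the unambiguous case and the ambiguous case (where mother and child share the same heterozygous genotype) without special-casing. This is a minimal sketch assuming no mutations, no silent alleles and a true mother; the function name and the example allele frequencies are hypothetical.

```python
def paternity_index(mother, child, alleged_father, freqs):
    """Single-locus trio PI; genotypes are 2-tuples, freqs maps allele -> frequency."""
    target = sorted(child)
    # Numerator: mother and AF each transmit one of their alleles with probability 1/2
    numerator = sum(0.25 for m in mother for f in alleged_father
                    if sorted((m, f)) == target)
    # Denominator: mother transmits m (prob 1/2); a random man supplies the other allele
    denominator = 0.0
    for m in mother:
        if m in child:
            rest = list(child)
            rest.remove(m)                   # allele the father must have transmitted
            denominator += 0.5 * freqs[rest[0]]
    if denominator == 0:
        raise ValueError("child is inconsistent with the mother")
    return numerator / denominator


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
# Example from the text: G_M = [a,b], G_C = [b,c], G_AF = [c,d] gives PI = 1 / (2 p_c)
print(paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freqs))  # 5.0
```

In the ambiguous trio G_M = [a,b], G_C = [a,b], G_AF = [a,c], the same function returns 0.5 / (p_a + p_b), which illustrates summing over both possible paternal alleles.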

Decision

How does one interpret the PI value? Bayes's theorem is needed to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p|E)}{\Pr(H_d|E)} = PI$$

hence

$$\frac{\Pr(H_p|E)}{1 - \Pr(H_p|E)} = PI$$

resulting in

$$\Pr(H_p|E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit or cut-off for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al 2007). A cut-off that is too low will increase the risk of falsely including a non-father as the true father, whereas one that is too high will increase the risk of failing to include the true father.
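The step from PI to a posterior probability and a decision can be sketched as below. `posterior_probability` applies Bayes's theorem in odds form and reduces to PI / (PI + 1) when the prior probability is 0.5 (prior odds 1). The decision rule in `classify`, with its symmetric exclusion threshold of 1 - cut-off and a 99.99% default, is a hypothetical example of a laboratory cut-off, not a recommendation from the cited guidelines.

```python
def posterior_probability(lr, prior_probability=0.5):
    """Bayes's theorem in odds form: posterior odds = LR * prior odds."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)


def classify(lr, cutoff=0.9999, prior_probability=0.5):
    """Hypothetical rule: include at or above the cut-off, exclude at or
    below 1 - cut-off, otherwise report the result as inconclusive."""
    w = posterior_probability(lr, prior_probability)
    if w >= cutoff:
        return "inclusion"
    if w <= 1 - cutoff:
        return "exclusion"
    return "inconclusive"


print(posterior_probability(5.0))   # PI / (PI + 1) = 5/6 with prior odds 1
print(classify(1e5), classify(1.0), classify(0.0))  # inclusion inconclusive exclusion
```

Changing `prior_probability` shows how strongly the posterior depends on the traditional choice of prior odds 1.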

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

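$$L(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i=1}^{n} Pen(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes $G_1, \ldots, G_n$ of all n individuals in the pedigree, $X_i$ is the observed data (phenotype) of individual i, and the last product runs over all offspring o with father f and mother m.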

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that, once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci (it is then 1 for genotypes consistent with the typed marker data and 0 otherwise).
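For the simplest pedigree, a nuclear family at one marker, the likelihood formula can be evaluated directly: untyped parents are summed out, founders get Hardy-Weinberg genotype probabilities, and each (typed) child contributes its Mendelian transmission factor. The sketch below does exactly that; it is a brute-force evaluation of the sum-product formula rather than a general recursive Elston-Stewart implementation, although for a single nuclear family summing out the parents is the whole peeling step. It assumes HWE founders, no mutations and all children typed; every function name is hypothetical.

```python
from itertools import combinations_with_replacement


def all_genotypes(freqs):
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(genotype, freqs):
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_prob(child, mother, father):
    """Pr(G_o | G_f, G_m) by Mendelian segregation (no mutation)."""
    target = sorted(child)
    return sum(0.25 for m in mother for f in father if sorted((m, f)) == target)


def nuclear_family_likelihood(freqs, children, mother=None, father=None):
    """L(Ped) for two parents and their typed children at one marker.

    A typed parent contributes only its observed genotype (Pen = 1 for that
    genotype, 0 otherwise); an untyped parent (None) is summed out.
    """
    candidates = all_genotypes(freqs)
    total = 0.0
    for gm in ([mother] if mother else candidates):
        for gf in ([father] if father else candidates):
            term = founder_prob(gm, freqs) * founder_prob(gf, freqs)
            for child in children:          # children independent given parents
                term *= transmission_prob(child, gm, gf)
            total += term
    return total


# Duo LR with the mother untyped: Hp (AF is the father) against Hd (unrelated)
freqs = {"a": 0.2, "b": 0.3, "c": 0.5}
child, af = ("a", "b"), ("b", "c")
numerator = nuclear_family_likelihood(freqs, [child], father=af)
denominator = founder_prob(child, freqs) * founder_prob(af, freqs)
print(numerator / denominator)  # equals 1 / (4 * p_b) for this duo
```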

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci and whose computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed marker-specific genotypes conditional on each inheritance vector; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. via transmission probabilities between adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al 1996 for a more detailed description).
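Step 3 of the Lander-Green algorithm can be illustrated with the HMM forward algorithm. In the sketch below, the hidden state at each marker is an inheritance vector with one bit per meiosis, consecutive markers are linked through transition probabilities θ^d (1 - θ)^(m - d) (each of the m meioses switches independently with recombination fraction θ, and d is the number of switched bits), and a uniform prior is placed on the vectors at the first marker. The emission probabilities from step 2 are assumed to be precomputed and passed in; computing them would require enumerating founder allele assignments. Names and the example emission values are made up for illustration.

```python
from itertools import product


def transition_prob(v, w, theta):
    """Pr(vector w at the next marker | vector v); each meiosis switches with prob theta."""
    switches = sum(a != b for a, b in zip(v, w))
    return theta ** switches * (1 - theta) ** (len(v) - switches)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions[k][v] = Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]       = recombination fraction between markers k and k + 1
    """
    states = list(product((0, 1), repeat=n_meioses))
    prior = 1 / len(states)                  # all vectors equally likely a priori
    alpha = {v: prior * emissions[0][v] for v in states}
    for k in range(1, len(emissions)):
        alpha = {
            w: emissions[k][w] * sum(alpha[v] * transition_prob(v, w, thetas[k - 1])
                                     for v in states)
            for w in states
        }
    return sum(alpha.values())


# One meiosis, two markers, made-up emission probabilities
e = [{(0,): 0.2, (1,): 0.6}, {(0,): 0.5, (1,): 0.1}]
print(lander_green_likelihood(e, [0.5], 1))  # unlinked: (0.2+0.6)/2 * (0.5+0.1)/2
print(lander_green_likelihood(e, [0.0], 1))  # complete linkage: 0.5 * (0.1 + 0.06)
```

The cost of this naive version grows linearly with the number of markers but as 4^m per marker in the number of meioses m, which mirrors the trade-off between the Lander-Green and Elston-Stewart approaches.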

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium in the population frequency estimation (Abecasis et al 2002, Skare et al 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated; in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypothesis model and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
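A heavily simplified version of this simulation approach can be sketched as follows. It uses only the two-hypothesis model, no mutation model (any inconsistency gives PI = 0), prior odds 1, a hypothetical decision rule (inclusion at a posterior probability of at least 99.99%, exclusion at or below 0.01%) and made-up, equally frequent alleles; it does not reproduce the five-hypothesis model, the relative scenarios or the published allele frequencies used in Paper I. Half of the simulated cases use the true father as the AF and half an unrelated man.

```python
import random


def locus_pi(mother, child, af, freqs):
    """Single-locus trio PI; 0 on any inconsistency (no mutation model)."""
    target = sorted(child)
    num = sum(0.25 for m in mother for f in af if sorted((m, f)) == target)
    den = 0.0
    for m in mother:
        if m in child:
            rest = list(child)
            rest.remove(m)
            den += 0.5 * freqs[rest[0]]
    return num / den                         # den > 0: the mother is the true mother


def random_genotype(freqs, rng):
    alleles, weights = zip(*freqs.items())
    return tuple(rng.choices(alleles, weights=weights, k=2))


def simulate_error_rates(loci, n_cases=2000, cutoff=0.9999, seed=1):
    """Return (inclusion error rate, exclusion error rate); n_cases should be even."""
    rng = random.Random(seed)
    inclusion_errors = exclusion_errors = 0
    for case in range(n_cases):
        af_is_father = case % 2 == 0
        pi = 1.0
        for freqs in loci:
            mother = random_genotype(freqs, rng)
            father = random_genotype(freqs, rng)
            child = (rng.choice(mother), rng.choice(father))
            af = father if af_is_father else random_genotype(freqs, rng)
            pi *= locus_pi(mother, child, af, freqs)
        w = pi / (pi + 1)                    # posterior probability, prior odds 1
        if af_is_father and w <= 1 - cutoff:
            exclusion_errors += 1
        elif not af_is_father and w >= cutoff:
            inclusion_errors += 1
    half = n_cases / 2
    return inclusion_errors / half, exclusion_errors / half


# 15 hypothetical loci, each with 10 equally frequent alleles
loci = [{f"A{i}": 0.1 for i in range(10)} for _ in range(15)]
print(simulate_error_rates(loci))
```

In this idealised setting both error rates come out at or near zero; errors of the kind reported for Paper I arise only once relatives, mutations and mismatched allele frequencies are added to the simulation.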


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on the individual LR.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007), although some laboratories still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not entirely correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates: change in the error rate (%) in comparison with the standard case

| Category | Scenario | Total error | Inclusion error | Exclusion error |
|---|---|---|---|---|
| Consanguinity | Mother and father simulated as first cousins | 3 | 10 | -1 |
| Additional information | 20-marker DNA profiles | -68 | -29 | -89 |
| Additional information | 25-marker DNA profiles | -83 | -56 | -98 |
| Additional information | 2 children | -88 | -73 | -96 |
| Mutation model | Limit of 1 inconsistency instead of a mutation model for LR calculation | 16 | 217 | -95 |
| Mutation model | Limit of 2 inconsistencies instead of a mutation model for LR calculation | 320 | 1079 | -100 |
| Inappropriate allele frequencies | Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation | 19 | 190 | -76 |
| Inappropriate allele frequencies | Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation | 2 | 106 | -55 |
| Inappropriate allele frequencies | Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation | -13 | 25 | -34 |
| Prior information | Five-hypothesis model for LR calculation | -24 | -8 | -31 |

The standard case used 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. In the case of haploid DNA markers, it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed for the complete mtDNA control region, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami) (Figure 4). This hypervariable segment (including HVS-I, HVS-II and HVS-III) spans over 1 100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved estimation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same order of magnitude as for other European populations (Budowle et al 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). The mtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study population frequencies, substructure and efficiency parameters. It is equally important to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and on their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. Using simulations, we compared LRs calculated with LD disregarded against LRs calculated with LD taken into account. The average difference was small, although in some individual cases it was considerable.
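Males carry a single X chromosome, so their two-locus haplotypes can be read directly from the typing results. This makes a descriptive look at allelic association straightforward. The sketch below compares each observed haplotype frequency with the product of its allele frequencies and reports the disequilibrium coefficient D and r². It is descriptive only and does not reproduce the significance test used in Paper III. The allele labels and counts are hypothetical.

```python
from collections import Counter


def two_locus_ld(haplotypes):
    """Compare observed two-locus haplotype frequencies with the product rule.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples, one per
    male (hemizygous, so each male contributes exactly one X haplotype).
    Returns {haplotype: (observed, expected_if_independent, D, r2)}.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    locus1 = Counter(h[0] for h in haplotypes)
    locus2 = Counter(h[1] for h in haplotypes)
    table = {}
    for (a, b), count in hap_counts.items():
        p_ab = count / n
        p_a, p_b = locus1[a] / n, locus2[b] / n
        d = p_ab - p_a * p_b  # disequilibrium coefficient for this allele pair
        denom = p_a * (1 - p_a) * p_b * (1 - p_b)
        r2 = d * d / denom if denom > 0 else 0.0
        table[(a, b)] = (p_ab, p_a * p_b, d, r2)
    return table


# Hypothetical males typed for two loci in the same linkage group
males = [("17", "10")] * 30 + [("18", "11")] * 25 + [("17", "11")] * 5 + [("18", "10")] * 4
for hap, (obs, exp, d, r2) in sorted(two_locus_ld(males).items()):
    print(hap, round(obs, 3), round(exp, 3), round(d, 3), round(r2, 3))
```

When observed and expected columns differ systematically, the product rule misrepresents haplotype rarity.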

Recombination frequencies for the loci were estimated from the family data (Figure 5). These estimates indicated that the chance of observing a recombination within each linkage group was small (< 1%). They also indicated a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
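The model for estimating recombination rates presented in Paper III is not reproduced here. As a hedged baseline, the simplest estimate of a recombination fraction is the proportion of recombinant meioses among the informative ones. A Wilson score interval, a generic binomial interval swapped in for illustration, shows how little a sample of about 100 meioses can pin down a small rate. The counts are hypothetical. The upper limit is capped at 0.5, the maximum value for a recombination fraction.

```python
import math


def recombination_estimate(recombinants, meioses, z=1.96):
    """Point estimate R/N of the recombination fraction with a Wilson score interval.

    Returns (estimate, lower, upper); the upper limit is capped at 0.5.
    """
    theta = recombinants / meioses
    denom = 1 + z * z / meioses
    centre = (theta + z * z / (2 * meioses)) / denom
    half = z * math.sqrt(theta * (1 - theta) / meioses + z * z / (4 * meioses ** 2)) / denom
    return theta, max(0.0, centre - half), min(0.5, centre + half)


# Hypothetical: no recombinants among 100 informative meioses within a linkage group
print(recombination_estimate(0, 100))
```

Even with zero observed recombinants, the upper limit stays around 3-4%. This is one reason why pooling family data, as done here with the Swedish and Somali families, is useful.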


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line are the locations in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used to compute the LR for questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, handles thousands of linked markers well but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully considers linkage and linkage disequilibrium. Using simulations of typical pedigrees with X-chromosomal data for the markers studied in Paper III, we study the impact of taking, or not taking, linkage and LD into account.

Materials and methods

A computational model was presented. It is based on the Lander-Green algorithm but extends it by expanding the inheritance vectors to cover all of a haplotype's loci.
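The core of the Lander-Green algorithm is a hidden Markov model that moves along the chromosome. At each locus the hidden state is an inheritance vector, with one bit per meiosis recording which of the parent's two copies was transmitted. Adjacent loci are connected through the recombination fraction. For X-linked loci a father passes on his single X intact, so only maternal meioses need a bit.

The sketch below shows only the generic forward recursion, under three assumptions:
- a uniform prior over vectors;
- per-locus emission probabilities P(genotypes | vector) supplied as hypothetical inputs;
- a naive transition step costing O(4^m) per locus.

It does not implement the Paper IV extension, in which the vectors are expanded to cover all loci of a haplotype.

```python
from itertools import product


def transition(v, w, theta):
    """P(vector w at next locus | vector v); each meiosis recombines independently."""
    d = sum(a != b for a, b in zip(v, w))  # number of meioses that switched
    return theta ** d * (1 - theta) ** (len(v) - d)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors.

    emissions: one dict per locus mapping each inheritance vector (tuple of 0/1)
               to P(observed genotypes at that locus | vector).
    thetas:    recombination fractions between consecutive loci (len = loci - 1).
    """
    states = list(product((0, 1), repeat=n_meioses))
    alpha = {v: emissions[0][v] / len(states) for v in states}  # uniform prior
    for locus, theta in enumerate(thetas, start=1):
        alpha = {
            w: emissions[locus][w] * sum(alpha[v] * transition(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())


# Hypothetical two-locus example with one informative meiosis:
# the data at both loci are only compatible with vector (0,).
em = [{(0,): 1.0, (1,): 0.0}, {(0,): 1.0, (1,): 0.0}]
print(lander_green_likelihood(em, [0.0], 1))  # complete linkage -> 0.5
print(lander_green_likelihood(em, [0.5], 1))  # free recombination -> 0.25
```

An LR can then be formed as the ratio of such likelihoods computed under the two competing pedigrees. Production implementations use much faster schemes for the transition step.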

Six cases representing pedigrees for which X-chromosomal analysis would be valuable were studied to test the model. In three of these cases, DNA profiles were available for both the children and the founders (e.g. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters). In the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship being either true or not true. LR distributions were then studied. The LRs calculated with our proposed model were also compared with LRs calculated by simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. In the simulations, the median LR for the three cases where DNA profiles were available for the founders was high (~10^6). It was considerably lower for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, when the questioned relationship was simulated not to be true, LRs greater than 1 (i.e. supporting the false hypothesis) of various magnitudes were obtained, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was small on average, although it was somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
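A simplified calculation shows why the haplotype frequency estimate matters so much. The toy model assumes two brothers, one linkage group, no mutation and no recombination within the group. Each brother receives one of the mother's two X haplotypes. Under "same mother", the second brother therefore carries the first brother's haplotype with probability 1/2, and otherwise an independent draw from the population. The LR is then 0.5/p(h) + 0.5 if the haplotypes match, and 0.5 if they differ.

The sketch contrasts this haplotype-based LR with a per-locus product in which the loci are treated as unlinked and in linkage equilibrium. This is in the spirit of m_3 in Table 2. It is a toy illustration under the stated assumptions, not the Paper IV model, and all frequencies are hypothetical.

```python
def lr_brothers_haplotype(h1, h2, hap_freq):
    """LR(same mother vs unrelated) for one linkage group treated as a haplotype.

    Assumes no mutation and no recombination within the group.
    """
    return (0.5 / hap_freq[h2] if h1 == h2 else 0.0) + 0.5


def lr_brothers_product(h1, h2, allele_freqs):
    """Same question with each locus treated as unlinked and in linkage equilibrium."""
    lr = 1.0
    for locus, (a1, a2) in enumerate(zip(h1, h2)):
        lr *= (0.5 / allele_freqs[locus][a2] if a1 == a2 else 0.0) + 0.5
    return lr


# Hypothetical two-locus linkage group; both brothers carry haplotype ("17", "10")
hap = ("17", "10")
hap_freq = {hap: 0.02}
allele_freqs = [{"17": 0.3}, {"10": 0.2}]
print(lr_brothers_haplotype(hap, hap, hap_freq))    # 0.5/0.02 + 0.5
print(lr_brothers_product(hap, hap, allele_freqs))  # (0.5/0.3 + 0.5) * (0.5/0.2 + 0.5)
```

The shared-haplotype LR is driven by 0.5/p(h), so the assumed frequency of a rare or previously unseen haplotype scales the result almost proportionally. Moving p(h) from 0.01 to 0.001 changes the LR from 50.5 to 500.5. Multiplying such values across linkage groups additionally assumes that the groups are unlinked, which the tendency towards linkage between groups three and four (Paper III) calls into question.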

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers. Each row gives three groups of values, each reported as median [95% cred.] (min, max):
- log10 LR;
- the difference LR(m_2)/LR(m_1);
- the difference LR(m_3)/LR(m_1).

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel denotes simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. The three models are:
- m_1: both linkage and LD were considered;
- m_2: linkage but not LD was considered;
- m_3: neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis. The aim of the thesis was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as:
- the number of markers tested;
- the use of relevant population allele frequencies;
- the use of a probabilistic model for the treatment of single genetic inconsistencies;
- consideration of alternative close relationships between the alleged father and the true father.
In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high. We also showed that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the mtDNA variation found in Sweden resembles that of other European populations, which makes it possible to enlarge the relevant reference population and thus increase the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium. The linkage within and between the linkage groups in the Swedish population highlights the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account. We applied it to simulated cases based on DNA profiles with X-chromosomal data and showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient. Disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt some informed guesses concerning the future of forensic genetics. Looking back at this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project, and the assistance provided by advances in computing power.

Suppose we go back to the time just before these developments and adopt a client's point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were then rather small. The demand from clients to resolve these issues was therefore a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics to meet that demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in large national DNA profile databases built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). National databases such as CODIS, which contains over 8,000,000 profiles, would make it very difficult to replace the existing markers with a different set.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. For the use of DNA in criminal cases, the most important areas may be:
- methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009);
- resolving mixtures (Homer et al. 2008);
- predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009).
To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

Turning to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. For future developments, I expect a greater focus on statistical methods for handling data from larger numbers of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years.

Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to resolve similar cases, as well as more distant relationships, with array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what matters is the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

For continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate the block's frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetics and could therefore affect the assessment of the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and your wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team": Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:
- Magnus and Erik, for always being such good friends;
- Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala;
- my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life;
- my relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed;
- my family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis. Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc. (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA Matai AS Laros JF Meiland HC Jasper M de Leeuw WJ de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples Forensic Sci Int Genet 3233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR am-plicons using redesigned primers Int J Legal Med 114285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Introduction

The unbiased estimator of gene diversity (GD) is given by (Nei 1987)

$$GD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^2\right)$$

where n is the number of gene copies sampled and p_i is the frequency of the ith allele in the population.

The match probability (PM) is defined as the probability of a match between two unrelated individuals and is calculated as (Fisher 1951)

$$PM = \sum_{i} G_i^2$$

where G_i is the frequency of genotype i at a given locus in the population. Thus, PM is the sum of the partial match probabilities G_i^2 over all genotypes. PM can also be computed from allele frequencies, given that the population is in Hardy-Weinberg equilibrium (Jones 1972).

The power of discrimination (PD) is defined as the probability of discriminating between two unrelated individuals. It is thus directly related to PM:

$$PD = 1 - PM$$
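As a minimal sketch of how these three parameters are obtained in practice, the Python snippet below computes GD, PM and PD for one locus. It assumes Hardy-Weinberg equilibrium in order to derive the genotype frequencies G_i from allele frequencies; the function names and the allele frequencies are hypothetical and only illustrative.

```python
from itertools import combinations_with_replacement


def gene_diversity(freqs, n):
    """Unbiased gene diversity (Nei 1987): n/(n-1) * (1 - sum of p_i^2).

    freqs: allele (or haplotype) frequencies estimated from n sampled gene copies.
    """
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def hwe_genotype_freqs(freqs):
    """Genotype frequencies under Hardy-Weinberg equilibrium: p_i^2 or 2 p_i p_j."""
    genotypes = {}
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        genotypes[(i, j)] = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
    return genotypes


def match_probability(genotype_freqs):
    """PM: sum of the squared genotype frequencies G_i."""
    return sum(g * g for g in genotype_freqs)


freqs = [0.5, 0.3, 0.2]  # hypothetical allele frequencies at one locus
gd = gene_diversity(freqs, n=200)  # assumes 200 sampled gene copies
pm = match_probability(hwe_genotype_freqs(freqs).values())
pd = 1.0 - pm  # power of discrimination
print(f"GD={gd:.4f} PM={pm:.4f} PD={pd:.4f}")
```

For these frequencies the script prints GD=0.6231 PM=0.2166 PD=0.7834.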

The polymorphism information content (PIC) can be interpreted as the probability that the maternal and paternal alleles of a child are deducible, or the probability of being able to deduce which allele a parent has transmitted to the child (Botstein et al 1980; Guo & Elston 1999). There are two instances when this cannot be deduced, namely when one parent is homozygous or when both parents and the child have the same heterozygous genotype. Thus,

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2p_j^2$$

where p_i and p_j are allele frequencies and n is the number of alleles.
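The PIC formula can be evaluated in the same way. This short Python sketch implements the double sum over allele pairs literally; the function name `pic` and the example frequencies are illustrative assumptions.

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al 1980) from allele frequencies."""
    s2 = sum(p ** 2 for p in freqs)
    n = len(freqs)
    # double sum over allele pairs i < j of 2 p_i^2 p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return 1.0 - s2 - cross


print(round(pic([0.5, 0.5]), 4))   # 0.375
print(round(pic([0.25] * 4), 4))   # 0.7031
```

PIC is always somewhat lower than the expected heterozygosity 1 - Σ p_i^2, because the double sum removes the uninformative matings in which both parents and the child share the same heterozygous genotype.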

The probability of excluding paternity (Q) is calculated from (Ohno et al 1982)

$$Q = \sum_{i=1}^{n} p_i(1-p_i)^2(1-p_i+p_i^2) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_ip_j(p_i+p_j)(1-p_i-p_j)^2$$

Q is built from two factors: first, the exclusion probability for a given mother/child genotype combination, which is either (1 - p_i)^2 or (1 - p_i - p_j)^2, and second, the expected population frequency of that mother/child genotype combination; p_i and p_j are the frequencies of the paternal alleles. Q is then obtained as the sum over all mother/child genotype combinations, as described above. An alternative figure, the power of exclusion (PE), is defined as (Brenner & Morris 1990)

$$PE = h^2\,(1 - 2hH^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
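To make the construction of Q concrete, the sketch below evaluates the Ohno et al (1982) formula and, as a cross-check, obtains the same number by brute-force enumeration of the mother's, the true father's and a random man's genotypes under Hardy-Weinberg equilibrium (no mutations). PE is included for comparison, under the assumption H = 1 - h. All function names and frequencies are hypothetical.

```python
from itertools import product


def exclusion_probability(freqs):
    """Q (Ohno et al 1982): chance of excluding a random man in a mother/child/AF trio."""
    n = len(freqs)
    q = sum(p * (1 - p) ** 2 * (1 - p + p ** 2) for p in freqs)
    for i in range(n - 1):
        for j in range(i + 1, n):
            pi, pj = freqs[i], freqs[j]
            q += pi * pj * (pi + pj) * (1 - pi - pj) ** 2
    return q


def is_excluded(mother, child, man):
    """True if no maternal allele combined with an allele of the man gives the child."""
    target = sorted(child)
    return not any(sorted((a, r)) == target for a in mother for r in man)


def exclusion_probability_brute_force(freqs):
    """Check of Q by enumerating mother, true father, transmissions and random man."""
    k = range(len(freqs))
    q = 0.0
    for m1, m2, f1, f2, r1, r2 in product(k, repeat=6):
        prob = (freqs[m1] * freqs[m2] * freqs[f1] * freqs[f2]
                * freqs[r1] * freqs[r2])
        for m, f in product((m1, m2), (f1, f2)):  # each transmission pair has prob 1/4
            q += 0.25 * prob * is_excluded((m1, m2), (m, f), (r1, r2))
    return q


def power_of_exclusion(h):
    """PE (Brenner & Morris 1990), taking H = 1 - h as the homozygote proportion."""
    H = 1.0 - h
    return h ** 2 * (1 - 2 * h * H ** 2)


freqs = [0.5, 0.3, 0.2]  # hypothetical allele frequencies
print(round(exclusion_probability(freqs), 6), round(exclusion_probability_brute_force(freqs), 6))
print(round(power_of_exclusion(0.8), 5))  # 0.59904
```

Both ways of computing Q agree (0.336 for these frequencies), which confirms that the two sums cover the unambiguous and the ambiguous paternal-allele cases, respectively.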

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either (1 - p_i) or (1 - p_i - p_j), since a man carries only one X-chromosomal allele. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \left(\sum_i p_i^2\right)^2$$

where p_i is the frequency of allele i; p_i can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, MEC_Duo (Desmarais et al 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
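A corresponding sketch for the X-chromosomal case checks the two MEC formulas against direct enumeration, assuming Hardy-Weinberg equilibrium and no mutation: the man carries a single X allele, while the daughter carries one maternal and one paternal allele. Function names and frequencies are illustrative only.

```python
from itertools import product


def mec_trio(freqs):
    """MEC for a mother/daughter/alleged-father trio (X-linked markers)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - s2 + s4 - s2 ** 2


def mec_duo(freqs):
    """MEC for a daughter/alleged-father duo (X-linked markers)."""
    return 1 - 2 * sum(p ** 2 for p in freqs) + sum(p ** 3 for p in freqs)


def mec_trio_brute_force(freqs):
    """Enumerate the mother's two X alleles, the true father's X and a random man's X."""
    k = range(len(freqs))
    total = 0.0
    for m1, m2, f, r in product(k, repeat=4):
        prob = freqs[m1] * freqs[m2] * freqs[f] * freqs[r]
        for m in (m1, m2):  # the daughter gets one maternal allele plus the father's X
            daughter = sorted((m, f))
            compatible = any(sorted((a, r)) == daughter for a in (m1, m2))
            total += 0.5 * prob * (not compatible)
    return total


def mec_duo_brute_force(freqs):
    """Daughter genotype from the population; a random man is excluded if his allele is absent."""
    k = range(len(freqs))
    return sum(freqs[d1] * freqs[d2] * freqs[r] * (r not in (d1, d2))
               for d1, d2, r in product(k, repeat=3))


freqs = [0.5, 0.3, 0.2]  # hypothetical allele (or haplotype) frequencies
print(round(mec_trio(freqs), 4), round(mec_trio_brute_force(freqs), 4))  # 0.5478 0.5478
print(round(mec_duo(freqs), 4), round(mec_duo_brute_force(freqs), 4))    # 0.4 0.4
```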

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of varying extent, origin and direction have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg & Tydén 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated with regard to forensic autosomal STRs (Montelius et al 2008) and forensic autosomal SNPs (Montelius et al 2009). Both of these studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006; Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist approach and a Bayesian approach (or a logical approach, which can be extended to a full Bayesian approach). These have different properties, pros and cons, and several detailed publications about their use exist (see, for example, Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example Pr(E | H), which is the probability of the evidence E when hypothesis H is true. In this case, E is the DNA profile and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

H_1 (or H_p) is commonly known as the prosecutor's hypothesis and H_0 (or H_d) as the hypothesis of the defence. E represents the DNA profiles and I other relevant background information. The ratio

$$\frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)}$$

is the LR, and it is within this formula that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let H_p and H_d represent two mutually exclusive hypotheses for and against paternity:

H_p: The alleged father is the father of the child.
H_d: A random man not related to the alleged father is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which compares the probability of observing the child's (G_C), mother's (G_M) and alleged father's (G_AF) DNA profiles when the AF is the father with the probability of observing the same DNA profiles when the AF is not the father.


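Using the third law of probability, Pr(A, B | H) = Pr(A | B, H) Pr(B | H), we can write

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$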

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can simplify further:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is the father (numerator), and 2) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal (A_M) and paternal (A_P) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither parent has another allele to transmit). If exactly one of the AF and the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If A_M and A_P are unambiguous, the denominator Pr(G_C | G_M, G_AF, H_d) is either p_{A_P} or 0.5 p_{A_P}, depending on the homozygous/heterozygous status of the mother. p_{A_P} is the population frequency of allele A_P and represents the probability of the child receiving that allele from a random man in the population. If A_M and A_P are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of A_M and A_P.

As a simple example, let G_M have the genotype [a,b], G_C [b,c] and G_AF [c,d].

Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{(1/2)\,p_c} = \frac{1}{2p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
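The single-locus PI derivation above translates directly into code. The sketch below is a minimal illustration, assuming that the mother is the true mother, that there are no mutations or silent alleles, and that allele frequencies are known; `child_prob` and `paternity_index` are hypothetical helper names. Summing over both maternal alleles handles the ambiguous cases described above.

```python
from collections import Counter


def child_prob(child, mother, paternal):
    """Pr(child genotype | mother, paternal transmission distribution).

    Each maternal allele is passed on with probability 1/2; `paternal` maps an allele to
    the probability that the father transmits it (a Counter for the AF, or population
    allele frequencies for a random man).
    """
    target = sorted(child)
    return sum(0.5 * pf for m in mother for f, pf in paternal.items()
               if sorted((m, f)) == target)


def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd)."""
    af = Counter()
    for allele in alleged_father:
        af[allele] += 0.5  # Mendelian transmission from the alleged father
    return child_prob(child, mother, af) / child_prob(child, mother, freqs)


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical allele frequencies
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 1 / (2 * 0.1) = 5.0
```

For G_M = G_C = [a,b] and G_AF = [a,c], the same function returns 0.5/(p_a + p_b). For several independent, unlinked loci, the single-locus PIs are multiplied to give the combined PI.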

Decision

How should the PI value be interpreted? Bayes's theorem is needed in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, so that the posterior odds equal the PI:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence, since H_p and H_d are the only two hypotheses considered,

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in the following posterior probability of paternity:

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al 2007). A cut-off that is too low increases the risk of falsely including a non-father as the true father, whereas a cut-off that is too high increases the risk of failing to include the true father.
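The step from PI to a posterior probability, and from there to a decision, can be sketched as follows. Prior odds of 1 (a prior probability of 0.5) reproduce W = PI/(PI + 1); the three-way rule in `decide` (inclusion, exclusion or inconclusive) and the default cut-off of 0.9999 are hypothetical choices, since the cut-off is set by each laboratory.

```python
def posterior_probability(lr, prior_prob=0.5):
    """Posterior probability of Hp: posterior odds = LR * prior odds (Bayes's theorem).

    prior_prob must lie strictly between 0 and 1; 0.5 gives prior odds of 1,
    i.e. W = PI / (PI + 1).
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def decide(lr, cutoff=0.9999, prior_prob=0.5):
    """Hypothetical three-way decision rule based on a laboratory-chosen cut-off."""
    w = posterior_probability(lr, prior_prob)
    if w >= cutoff:
        return "inclusion"
    if w <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


for pi in (0.0, 100.0, 1e5):
    print(pi, round(posterior_probability(pi), 6), decide(pi))
```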

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

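$$L(\mathrm{Ped}) = \sum_{G_1}\cdots\sum_{G_n} \prod_{i=1}^{n} \mathrm{Pen}(X_i \mid G_i) \prod_{\mathrm{founders}} \Pr(G_{\mathrm{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes G_i of all n individuals in the pedigree, X_i is the observed phenotype (here, the typing result) of individual i, and the last product runs over all offspring o with father f and mother m.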

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the calculation of the summation for his or her parents, and thus this individual needs no further consideration. This procedure represents a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
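To illustrate the Elston-Stewart sum, the sketch below evaluates it for a single autosomal locus by brute-force enumeration of the genotypes of untyped individuals. This is not a peeling implementation: it loops over all genotype combinations, which is only feasible for very small pedigrees, whereas peeling sums out one individual at a time. The pedigree encoding and the function names are assumptions made for this example. As a check, the ratio of the likelihoods of two pedigrees reproduces the paternity index of the worked example above.

```python
from itertools import combinations_with_replacement, product


def founder_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered founder genotype."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_prob(child, father, mother):
    """Mendelian Pr(child genotype | father, mother); each of 4 allele pairs has prob 1/4."""
    target = sorted(child)
    return sum(0.25 for f in father for m in mother if sorted((f, m)) == target)


def pedigree_likelihood(pedigree, observed, freqs):
    """Brute-force evaluation of the Elston-Stewart sum for a single locus.

    pedigree: name -> (father, mother), or None for a founder.
    observed: name -> typed genotype. With no trait locus, the penetrance term is 1 if an
    assigned genotype agrees with the typing result and 0 otherwise, so typed individuals
    are simply fixed and untyped ones are summed over.
    """
    all_genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    names = list(pedigree)
    choices = [[tuple(sorted(observed[n]))] if n in observed else all_genotypes
               for n in names]
    total = 0.0
    for assignment in product(*choices):
        g = dict(zip(names, assignment))
        term = 1.0
        for n in names:
            parents = pedigree[n]
            if parents is None:
                term *= founder_prob(g[n], freqs)
            else:
                term *= transmission_prob(g[n], g[parents[0]], g[parents[1]])
        total += term
    return total


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}  # hypothetical
typed = {"M": ("a", "b"), "AF": ("c", "d"), "C": ("b", "c")}
hp = {"M": None, "AF": None, "C": ("AF", "M")}             # AF is the father
hd = {"M": None, "AF": None, "X": None, "C": ("X", "M")}   # untyped unrelated man X
print(pedigree_likelihood(hp, typed, freqs) / pedigree_likelihood(hd, typed, freqs))
```

With p_c = 0.1 the printed ratio is 5.0 (up to floating-point rounding), the same value as 1/(2p_c) in the worked example.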

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort grows exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with a computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring, 2) iteration over all inheritance vectors and calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors, and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (i.e. the transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al 1996 for a more detailed description).

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium when population allele frequencies are used (Abecasis et al 2002; Skare et al 2009).
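The hidden Markov model used in step 3 can be sketched as a forward algorithm over inheritance vectors. In this simplified Python illustration, the marker-specific probabilities of the observed genotypes given each inheritance vector (step 2) are assumed to have been computed already and are passed in as `emissions`; all numbers and function names are hypothetical. Each meiosis is assumed to recombine independently with probability θ between adjacent markers, so the transition probability between two vectors depends only on how many meioses differ.

```python
def hamming(u, v):
    """Number of meioses that differ between two inheritance vectors (bit masks)."""
    return bin(u ^ v).count("1")


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors along one chromosome.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v), where bit
        j of the integer v records which parental copy meiosis j transmitted.
    thetas[k]: recombination fraction between marker k and marker k + 1.
    """
    n_states = 2 ** n_meioses
    # all inheritance vectors are a priori equally likely at the first marker
    alpha = [emissions[0][v] / n_states for v in range(n_states)]
    for k, theta in enumerate(thetas):
        nxt = []
        for w in range(n_states):
            # each meiosis recombines independently with probability theta
            s = sum(alpha[v] * theta ** hamming(v, w)
                    * (1 - theta) ** (n_meioses - hamming(v, w))
                    for v in range(n_states))
            nxt.append(s * emissions[k + 1][w])
        alpha = nxt
    return sum(alpha)


# hypothetical emission probabilities for two markers and a single meiosis
emissions = [[0.2, 0.6], [0.5, 0.1]]
print(lander_green_likelihood(emissions, [0.5], 1))  # unlinked: mean(e_0) * mean(e_1)
print(lander_green_likelihood(emissions, [0.0], 1))  # complete linkage
```

With θ = 0.5 the markers behave as unlinked and the likelihood is the product of the per-marker averages (0.4 × 0.3); with θ = 0 the same inheritance vector applies to both markers (0.08). The cost per marker is fixed for a given pedigree, but the number of states, 2^m for m meioses, grows exponentially with the number of meioses.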


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and the error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses model and, for comparison, the five hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
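The simulation approach can be sketched in a much simplified form. The Python example below estimates inclusion and exclusion error rates for the standard two hypotheses trio case only, with prior odds of 1 and a single cut-off; it does not implement the five hypotheses model, relatives of the true father as alleged fathers, mutations or real allele frequencies, all of which were part of Paper I. The 15 loci with eight equally frequent alleles and all function names are hypothetical.

```python
import random
from collections import Counter


def child_prob(child, mother, paternal):
    """Pr(child genotype | mother, distribution of the paternally transmitted allele)."""
    target = sorted(child)
    return sum(0.5 * pf for m in mother for f, pf in paternal.items()
               if sorted((m, f)) == target)


def random_genotype(rng, freqs):
    """Draw an unordered genotype under Hardy-Weinberg equilibrium."""
    return tuple(rng.choices(list(freqs), weights=list(freqs.values()), k=2))


def trio_pi(rng, loci, af_is_father):
    """Simulate one trio over unlinked loci and return the combined PI."""
    pi = 1.0
    for freqs in loci:
        mother, true_father = random_genotype(rng, freqs), random_genotype(rng, freqs)
        af = true_father if af_is_father else random_genotype(rng, freqs)
        child = (rng.choice(mother), rng.choice(true_father))  # Mendelian transmission
        af_dist = Counter()
        for allele in af:
            af_dist[allele] += 0.5
        pi *= child_prob(child, mother, af_dist) / child_prob(child, mother, freqs)
    return pi


def error_rates(loci, n_cases, cutoff=0.9999, seed=1):
    """Inclusion/exclusion error rates under a single-cut-off rule, prior odds 1."""
    rng = random.Random(seed)
    false_incl = false_excl = 0
    for _ in range(n_cases):
        pi_true = trio_pi(rng, loci, af_is_father=True)
        pi_unrel = trio_pi(rng, loci, af_is_father=False)
        if pi_true / (pi_true + 1.0) < cutoff:
            false_excl += 1  # true father not included
        if pi_unrel / (pi_unrel + 1.0) >= cutoff:
            false_incl += 1  # unrelated man included
    return false_incl / n_cases, false_excl / n_cases


# hypothetical: 15 unlinked STR-like loci with 8 equally frequent alleles each
loci = [{allele: 1 / 8 for allele in range(8)} for _ in range(15)]
print(error_rates(loci, n_cases=500))
```

With these idealised loci every single-locus PI for a true father is at least 2, so both error rates come out as zero; skewed allele frequencies, fewer markers, mutations or alleged fathers related to the true father would be needed to produce non-zero rates of the kind studied in Paper I.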


Figure 3. The alternative hypotheses used for simulation and for calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007), although some laboratories still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate compared with the standard case (%), given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  DNA profiles for 20 markers: -68 (-29, -89)
  DNA profiles for 25 markers: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered, with DNA profiles for 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers, it is highly relevant to set up and study regional frequency databases because of an increased risk of local frequency variation (Richards et al 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, comprising the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). mtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events. However, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
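The LD test used in Paper III is not specified here. As an illustration only, the sketch below tests association between two linked X-STRs using male haplotypes, which are observed directly because males carry a single X chromosome; it computes a likelihood-ratio (G) statistic and a permutation p-value obtained by shuffling the alleles at the second locus among males. Both the choice of test and the example haplotypes are assumptions.

```python
import math
import random
from collections import Counter


def g_statistic(haplotypes):
    """Likelihood-ratio (G) statistic for association between two loci.

    haplotypes: list of (allele_locus1, allele_locus2) pairs observed in males.
    """
    n = len(haplotypes)
    pair = Counter(haplotypes)
    a = Counter(h[0] for h in haplotypes)
    b = Counter(h[1] for h in haplotypes)
    g = 0.0
    for (x, y), obs in pair.items():
        expected = a[x] * b[y] / n  # expected count under linkage equilibrium
        g += 2.0 * obs * math.log(obs / expected)
    return g


def permutation_p_value(haplotypes, n_perm=2000, seed=1):
    """Shuffle locus-2 alleles among males to simulate linkage equilibrium."""
    rng = random.Random(seed)
    observed = g_statistic(haplotypes)
    first = [h[0] for h in haplotypes]
    second = [h[1] for h in haplotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if g_statistic(list(zip(first, second))) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# hypothetical male haplotypes for a pair of linked X-STRs
strong_ld = [("12", "25")] * 30 + [("13", "26")] * 30
independent = [(x, y) for x in ("12", "13") for y in ("25", "26")] * 10
print(round(g_statistic(strong_ld), 2), permutation_p_value(strong_ld))
print(g_statistic(independent), permutation_p_value(independent))
```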

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%) and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
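Paper III presented its own model for estimating recombination rates, which is not reproduced here. As a simpler stand-in, the sketch below uses the counting estimate θ = recombinants / informative meioses with a Wilson score interval, to show how wide the uncertainty remains with family data of roughly the size used in Paper III. The counts in the example are hypothetical.

```python
import math


def recombination_fraction(recombinants, informative_meioses, z=1.96):
    """Counting estimate of theta with an approximate 95% Wilson score interval."""
    n = informative_meioses
    theta = recombinants / n
    denom = 1 + z ** 2 / n
    centre = (theta + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(theta * (1 - theta) / n + z ** 2 / (4 * n ** 2)) / denom
    return theta, centre - half, centre + half


# hypothetical counts: no recombinant observed among 100 informative meioses
theta, low, high = recombination_fraction(0, 100)
print(theta, round(max(low, 0.0), 4), round(high, 4))
```

With zero recombinants among 100 informative meioses, the upper 95% bound is still about 0.037, so a small but non-zero recombination rate within a linkage group cannot be excluded with data of this size.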


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when calculating the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have, however, been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present here a model for complete consideration of linkage and linkage disequilibrium, and we study the impact of taking or not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters), and three involved cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LR calculated using our proposed model was compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs greater than 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which linkage and LD were not taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
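The effect of LD on the LR can be illustrated with a much simpler case than the Lander-Green-based model of Paper IV: two males tested for whether they share a mother, using one linkage group and assuming no recombination within it and no mutation. The haplotype version loosely corresponds to a model that accounts for LD, and the unlinked version to one that treats each locus independently (compare m_1 and m_3 in Table 2). Function names and all frequencies are hypothetical.

```python
def brothers_lr_haplotype(h1, h2, hap_freq):
    """LR (same mother vs unrelated mothers) for two males typed for one linkage group.

    Assumes no recombination within the group and no mutation: brother 2 carries the
    same maternal X haplotype as brother 1 with probability 1/2, otherwise one drawn
    from the population.
    """
    return 0.5 + (0.5 / hap_freq[h2] if h1 == h2 else 0.0)


def brothers_lr_unlinked(h1, h2, allele_freqs):
    """Same question, but treating every locus as unlinked and in linkage equilibrium."""
    lr = 1.0
    for locus, (a1, a2) in enumerate(zip(h1, h2)):
        lr *= 0.5 + (0.5 / allele_freqs[locus][a2] if a1 == a2 else 0.0)
    return lr


# hypothetical two-locus haplotype that is more common than the product of its allele
# frequencies (0.1 * 0.1 = 0.01), i.e. the loci are in LD
h = ("15", "30")
hap_freq = {h: 0.05}
allele_freqs = [{"15": 0.1}, {"30": 0.1}]
print(brothers_lr_haplotype(h, h, hap_freq))      # 0.5 + 0.5 / 0.05 = 10.5
print(brothers_lr_unlinked(h, h, allele_freqs))   # (0.5 + 0.5 / 0.1) ** 2 = 30.25
```

Here the shared haplotype is five times more common than the product of its allele frequencies, so ignoring LD nearly triples the LR (30.25 versus 10.5).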

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

| | log10 LR, median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1), median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1), median [95% cred.] (min/max) |
|---|---|---|---|
| Rel | 2.0 [-1 to 6.2] (-1/10.0) | 1.2 [0.4-7.3] (0001418) | 0.4 [0.036-9.3] (00001438) |
| NoRel | -1 [-1 to 0.5] (-1/4.0) | 1.0 [0.9-1.3] (000263) | 0.2 [0.0036-9.1] (0026433) |

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.
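Each cell of Table 2 condenses a simulated distribution into a median, a central 95% interval and the minimum/maximum. A minimal sketch of such a summary is shown below, assuming the simulated values are available as a plain list; the function names, the example values and the nearest-rank quantile rule are hypothetical choices for illustration and may differ from what was used in Paper IV. The "Difference" columns are assumed to summarise per-simulation ratios, formed before summarising.

```python
import math
import statistics


def nearest_rank_quantile(sorted_values, q):
    """Empirical quantile by the nearest-rank rule (an assumed convention)."""
    rank = max(1, math.ceil(q * len(sorted_values)))  # 1-based rank
    return sorted_values[rank - 1]


def summarise(values, level=0.95):
    """Return (median, (lower, upper), (min, max)) for simulated values."""
    s = sorted(values)
    tail = (1.0 - level) / 2.0
    interval = (nearest_rank_quantile(s, tail), nearest_rank_quantile(s, 1.0 - tail))
    return statistics.median(s), interval, (s[0], s[-1])


def lr_ratios(lr_alt, lr_ref):
    """Per-simulation ratios such as LR(m_2)/LR(m_1)."""
    return [a / b for a, b in zip(lr_alt, lr_ref)]


simulated_log10_lr = [-1.0, 0.3, 1.8, 2.0, 2.2, 3.5, 6.0]  # hypothetical values
print(summarise(simulated_log10_lr))  # (2.0, (-1.0, 6.0), (-1.0, 6.0))
```

With 10 000 simulations per situation, the choice of quantile rule has little practical effect on the reported interval.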


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, utilisation of relevant population allele frequencies, use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located within each of the four linkage groups were in linkage disequilibrium in the Swedish population; together with the linkage within and between the linkage groups, this highlights the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for the calculation of likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years, much of which is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by rapidly increasing computing power.

If we go back to the time just before the above-mentioned findings and adopt a client's point of view, we can see that the chances of linking a suspect to the crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of this rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet that demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA-profile databases, which consist of a given number of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases such as CODIS, which consists of over 8 000 000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty; thus the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved in a satisfactory manner with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and thus could have an impact when producing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100 000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Also Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.


Introduction

as described above. An alternative figure for the power of exclusion (PE) exists and is defined as (Brenner & Morris 1990):

$$PE = h^2 \, (1 - 2hH^2)$$

where h is the proportion of heterozygous individuals and H the proportion of homozygous individuals in the population sample.
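As a quick numerical illustration, the formula can be evaluated from observed genotypes. The helper names and the genotype sample below are hypothetical; h is taken as the observed heterozygote proportion and H as the observed homozygote proportion, so that H = 1 - h for a sample of single-locus genotypes.

```python
def heterozygosity(genotypes):
    """Observed proportion of heterozygous genotypes (h) in a sample."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def power_of_exclusion(h, H):
    """PE = h^2 * (1 - 2*h*H^2), following the formula above."""
    if not (0.0 <= h <= 1.0 and 0.0 <= H <= 1.0):
        raise ValueError("h and H must be proportions in [0, 1]")
    return h ** 2 * (1.0 - 2.0 * h * H ** 2)


sample = [(12, 13), (12, 12), (14, 15), (13, 13), (12, 15)]  # hypothetical genotypes
h = heterozygosity(sample)  # 3 of 5 heterozygous -> 0.6
H = 1.0 - h                 # homozygote proportion -> 0.4
print(round(power_of_exclusion(h, H), 5))  # 0.29088
```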

The formulas given so far are for autosomal markers. Corresponding formulas exist for X-chromosomal markers (Szibor et al. 2003), such as the mean exclusion chance (MEC) for trios including a daughter (Desmarais et al. 1998). This is equivalent to the probability of exclusion Q, with the difference that the exclusion probability for a given mother/child genotype combination is either $(1 - p_i)$ or $(1 - p_i - p_j)$. Thus, the mean exclusion chance when mother and child are tested is

$$MEC_{Trio} = 1 - \sum_i p_i^2 + \sum_i p_i^4 - \Big(\sum_i p_i^2\Big)^2$$

where $p_i$ is the frequency of allele i; $p_i$ can also represent a haplotype frequency if haplotypes are considered. The mean exclusion chance for duos involving a man and a daughter, $MEC_{Duo}$ (Desmarais et al. 1998), is

$$MEC_{Duo} = 1 - 2\sum_i p_i^2 + \sum_i p_i^3$$
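Both closed-form expressions can be checked numerically. The sketch below is a hypothetical illustration, not the implementation used in the cited studies: it evaluates MEC_Trio and MEC_Duo from a list of allele (or haplotype) frequencies and compares each with a brute-force average of the exclusion probability over all genotype configurations, assuming Hardy-Weinberg proportions, no mutation and frequencies that sum to one.

```python
from itertools import product


def mec_trio(p):
    """Closed-form MEC for mother-daughter-alleged father trios."""
    s2 = sum(x ** 2 for x in p)
    return 1.0 - s2 + sum(x ** 4 for x in p) - s2 ** 2


def mec_duo(p):
    """Closed-form MEC for alleged father-daughter duos."""
    return 1.0 - 2.0 * sum(x ** 2 for x in p) + sum(x ** 3 for x in p)


def mec_trio_enumerated(p):
    """Average exclusion probability over mother genotype, transmitted maternal
    allele and the true father's single X allele."""
    total = 0.0
    for a, b in product(range(len(p)), repeat=2):  # ordered maternal genotype
        for m in (a, b):                           # maternal allele, prob 1/2
            for f in range(len(p)):                # paternal X allele
                prob = p[a] * p[b] * 0.5 * p[f]
                if a != b and {m, f} == {a, b}:
                    # daughter identical to heterozygous mother: paternal allele ambiguous
                    excl = 1.0 - p[a] - p[b]
                else:
                    excl = 1.0 - p[f]              # paternal allele identifiable
                total += prob * excl
    return total


def mec_duo_enumerated(p):
    """Average exclusion probability over the daughter's genotype only."""
    total = 0.0
    for a, b in product(range(len(p)), repeat=2):  # ordered daughter genotype
        total += p[a] * p[b] * (1.0 - sum(p[i] for i in {a, b}))
    return total


freqs = [0.1, 0.2, 0.3, 0.4]  # hypothetical allele frequencies
print(round(mec_trio(freqs), 6), round(mec_trio_enumerated(freqs), 6))
print(round(mec_duo(freqs), 6), round(mec_duo_enumerated(freqs), 6))
```

Each printed pair should agree, which supports the closed forms under the stated assumptions.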

The Swedish population and its genetic appearance

Immigration into Scandinavia did not start until around 12 000 years ago, due to the ice that covered Northern Europe. Since then, immigration and population movements of various degrees, descent and directions have occurred within the present borders of Sweden. Many of the groups that immigrated originated from Western Europe and are suggested to represent a non-Indo-European population (Blankholm 2008; Zvelebil 2008). This, in combination with recorded demographic events over the last 1 000 years (Svanberg & Tydén 2005), may explain the genetic composition of the modern Swedish population.

The Swedish population has been investigated regarding forensic autosomal STRs (Montelius et al. 2008) and forensic autosomal SNPs (Montelius et al. 2009). Both of these studies revealed high genetic diversity and information content for use in relationship testing and criminal cases. Strong similarities with other European populations were also recorded. A sample of the Swedish population was recently compared with other European populations based on data from over 300 000 SNPs, which showed a strong correlation between geographic location and genetic variability for the tested populations (Lao et al. 2008).


Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al. 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al. 2006; Lappalainen et al. 2009). These later studies confirm earlier findings of high similarity with other western European populations (Roewer et al. 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al. 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al. 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated and presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties, as well as pros and cons, and several detailed publications about their usage exist (see, for example, Buckleton et al. 2005, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile, and H could be "the DNA comes from an individual not related to the suspect". If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to its lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes' theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

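$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

$$\text{posterior odds} = \text{likelihood ratio} \cdot \text{prior odds}$$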

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the hypothesis of the defence. E represents the DNA profiles, and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this formula that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
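The odds form of Bayes' theorem above reduces to a single multiplication. The short sketch below makes that explicit, together with the conversions between odds and probabilities; the function names are hypothetical, and the LR and the prior (equal prior probabilities for $H_1$ and $H_0$) are purely illustrative values, not recommendations.

```python
def posterior_odds(likelihood_ratio, prior_odds):
    """Odds form of Bayes' theorem: posterior odds = LR * prior odds."""
    if likelihood_ratio < 0 or prior_odds < 0:
        raise ValueError("LR and prior odds must be non-negative")
    return likelihood_ratio * prior_odds


def probability_to_odds(p):
    """Odds in favour of H1 given Pr(H1) = p (requires p < 1)."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Probability of H1 corresponding to odds in favour of H1."""
    return odds / (1.0 + odds)


lr = 1000.0                        # hypothetical LR from the DNA evidence
prior = probability_to_odds(0.5)   # illustrative prior: H1 and H0 equally probable
post = posterior_odds(lr, prior)
print(post, round(odds_to_probability(post), 4))  # 1000.0 0.999
```

Keeping the LR separate from the prior odds is what allows the DNA evidence to be combined with other evidence, as described above.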

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al. 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation
As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:
$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.
The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

that is, the probability of observing the DNA profiles of the child ($G_C$), the mother ($G_M$) and the alleged father ($G_{AF}$) when the AF is the father, compared with the probability of observing the same profiles when the AF is not the father.


Using the third law of probability (the multiplication rule), the expression can be factorised:

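$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$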

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can simplify further:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: (1) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is the father (numerator), and (2) the probability of the child's genotype given the genotypes of the mother and the AF and given that the AF is not the father but someone else is (denominator).

We start with the calculation of (1), assuming that we have data from a single locus. This probability follows from Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1 (neither of them can transmit any other allele). If exactly one of the AF and the mother is heterozygous, the probability is 0.5, since there is a 50:50 chance that the child inherits a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. Here $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each obtained by summing over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d].

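Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2\,p_c}$$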

In other words, the rarer allele c is in the population, the higher the PI, and the greater the evidential weight in favour of the AF being the biological father of the child.
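The single-locus calculation above can be sketched by enumerating allele transmissions directly, which handles both the unambiguous and the ambiguous cases. This is an illustrative sketch assuming no mutations, no silent alleles and no population substructure; the function names and the allele frequencies in the example are hypothetical.

```python
from itertools import product

Genotype = tuple[str, str]


def prob_child_alleged_father(child: Genotype, mother: Genotype, af: Genotype) -> float:
    """Numerator: Pr(G_C | G_M, G_AF, H_p).

    Each parent transmits either allele with probability 1/2, so every
    (maternal, paternal) allele pair has probability 1/4.
    """
    return sum(
        0.25
        for m_allele, p_allele in product(mother, af)
        if sorted((m_allele, p_allele)) == sorted(child)
    )


def prob_child_random_man(child: Genotype, mother: Genotype, freqs: dict[str, float]) -> float:
    """Denominator: Pr(G_C | G_M, G_AF, H_d).

    The mother transmits each of her alleles with probability 1/2; the other
    child allele must then come from a random man, with its population
    frequency. Summing over both maternal alleles covers the ambiguous case.
    """
    total = 0.0
    for m_allele in mother:
        remaining = list(child)
        if m_allele in remaining:
            remaining.remove(m_allele)
            total += 0.5 * freqs[remaining[0]]
    return total


def trio_pi(child: Genotype, mother: Genotype, af: Genotype, freqs: dict[str, float]) -> float:
    """Single-locus paternity index (0.0 if the AF is excluded at this locus)."""
    denominator = prob_child_random_man(child, mother, freqs)
    if denominator == 0.0:
        raise ValueError("child genotype is incompatible with the mother")
    return prob_child_alleged_father(child, mother, af) / denominator


if __name__ == "__main__":
    # Worked example: G_M = [a,b], G_C = [b,c], G_AF = [c,d], hypothetical p_c = 0.1.
    freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
    print(trio_pi(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # equals 1 / (2 * p_c)
```

For several unlinked autosomal loci in linkage equilibrium, the single-locus PIs are multiplied to give the combined PI; that product rule is exactly what breaks down for the linked X-STRs studied in Paper III.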

Decision
How does one interpret the PI value? Bayes's theorem is needed to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel proposed verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al. 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father, whereas too high a cut-off increases the risk of falsely excluding the true father.
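The step from PI to posterior probability of paternity, followed by a cut-off rule, might be sketched as below. The 99.99% default is the cut-off used in Paper I; the symmetric three-way decision (inclusion, exclusion, inconclusive) and the function names are hypothetical illustrations, not a rule prescribed by the thesis or by the ISFG recommendations.

```python
def probability_of_paternity(pi: float, prior: float = 0.5) -> float:
    """Pr(H_p | E) from the PI.

    prior = 0.5 corresponds to prior odds of 1, which gives W = PI / (PI + 1).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    post_odds = pi * prior / (1.0 - prior)
    return post_odds / (1.0 + post_odds)


def decide(pi: float, cutoff: float = 0.9999, prior: float = 0.5) -> str:
    """Hypothetical decision rule with a symmetric cut-off on the posterior probability."""
    w = probability_of_paternity(pi, prior)
    if w >= cutoff:
        return "inclusion"
    if w <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


if __name__ == "__main__":
    # Combined PIs (products of single-locus PIs) for three hypothetical cases.
    for combined_pi in (1e-6, 50.0, 2e5):
        print(combined_pi, round(probability_of_paternity(combined_pi), 6), decide(combined_pi))
```

Raising `cutoff` moves cases from "inclusion" to "inconclusive", which is the trade-off between false inclusions and false exclusions described above.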

Mathematical model for automatic likelihood computation for relationship testing
While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) is introduced, and when deficiency cases are treated (Brenner 2006). In such situations a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

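$$L(\text{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \operatorname{Pen}(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where $G_i$ is the genotype and $X_i$ the observed phenotype of individual $i$, and the last product runs over all offspring $o$ with father $f$ and mother $m$.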

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
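To make the formula and the peeling idea concrete, here is a brute-force sketch for the simplest pedigree, a nuclear family with two founders and any number of children, at one autosomal locus. It assumes Hardy-Weinberg founder genotypes and no mutation, and treats Pen(X | G) for a marker as 1 when the typed genotype matches G (or the person is untyped) and 0 otherwise. The function names are hypothetical; real implementations handle arbitrary pedigrees.

```python
from itertools import combinations_with_replacement, product

Genotype = tuple[str, str]


def all_genotypes(freqs: dict[str, float]) -> list[Genotype]:
    """All unordered genotypes, each stored as a sorted allele pair."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g: Genotype, freqs: dict[str, float]) -> float:
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]


def transmission(child: Genotype, father: Genotype, mother: Genotype) -> float:
    """Pr(G_o | G_f, G_m) by Mendelian segregation (no mutation)."""
    return sum(0.25 for f, m in product(father, mother) if tuple(sorted((f, m))) == child)


def pen(observed, g: Genotype) -> float:
    """Pen(X | G) for a marker: 1 if consistent with the typing (or untyped), else 0."""
    return 1.0 if observed is None or tuple(sorted(observed)) == g else 0.0


def nuclear_family_likelihood(father_obs, mother_obs, children_obs, freqs) -> float:
    gs = all_genotypes(freqs)
    total = 0.0
    for gf, gm in product(gs, gs):
        term = founder_prob(gf, freqs) * pen(father_obs, gf)
        term *= founder_prob(gm, freqs) * pen(mother_obs, gm)
        if term == 0.0:
            continue
        # Peeling: each child's genotype is summed out once, conditional on the
        # parents, and the result is attached as a single factor for the parents.
        for child_obs in children_obs:
            term *= sum(pen(child_obs, gc) * transmission(gc, gf, gm) for gc in gs)
        total += term
    return total


if __name__ == "__main__":
    freqs = {"a": 0.3, "b": 0.4, "c": 0.2, "d": 0.1}  # hypothetical frequencies
    # Pedigree where the typed man [c,d] is the father of child [b,c] ...
    l_p = nuclear_family_likelihood(("c", "d"), ("a", "b"), [("b", "c")], freqs)
    # ... versus an untyped father, with the man as an unrelated founder.
    l_d = nuclear_family_likelihood(None, ("a", "b"), [("b", "c")], freqs) * founder_prob(("c", "d"), freqs)
    print(l_p / l_d)  # the trio PI, 1 / (2 * p_c)
```

The ratio of the two pedigree likelihoods reproduces the PI of the worked example, which shows how a general pedigree-likelihood engine subsumes the hand calculation.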

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987); it permits simultaneous consideration of thousands of loci, and its computational effort increases only linearly with the number of markers. The Lander-Green algorithm has three main steps: (1) enumerating all possible inheritance vectors in a pedigree, which describe the alleles transmitted from founders to offspring; (2) iterating over all inheritance vectors and calculating the probability of the observed genotypes at each marker conditional on each inheritance vector; and (3) computing the joint probability of the inheritance vectors of all markers along the same chromosome (i.e. using the transition probabilities between adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model is obtained (see Kruglyak et al. 1996 for a more detailed description).
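Step (3) can be sketched as the forward recursion of an HMM over inheritance vectors. In this minimal sketch an inheritance vector for m meioses is encoded as an m-bit integer, all vectors are assumed equally likely at the first marker, and the transition probability between adjacent markers with recombination fraction θ is θ^d (1 - θ)^(m - d), where d is the number of meioses whose bit changes. The per-marker emission probabilities Pr(observed genotypes | inheritance vector) from step (2) are taken as given inputs; the function name is hypothetical, and the transition step is written naively (4^m operations per marker) for clarity rather than speed.

```python
def lander_green_likelihood(emissions: list[list[float]], thetas: list[float], n_meioses: int) -> float:
    """Forward algorithm over inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]:       recombination fraction between markers k and k + 1
    """
    n_vectors = 2 ** n_meioses
    if len(thetas) != len(emissions) - 1:
        raise ValueError("need one recombination fraction per marker interval")
    if any(len(row) != n_vectors for row in emissions):
        raise ValueError("each marker needs one emission probability per vector")

    # Uniform prior over inheritance vectors at the first marker.
    alpha = [emissions[0][v] / n_vectors for v in range(n_vectors)]
    for k in range(1, len(emissions)):
        theta = thetas[k - 1]
        new_alpha = []
        for w in range(n_vectors):
            total = 0.0
            for v in range(n_vectors):
                d = bin(v ^ w).count("1")  # meioses that switch between the two markers
                total += alpha[v] * theta ** d * (1.0 - theta) ** (n_meioses - d)
            new_alpha.append(total * emissions[k][w])
        alpha = new_alpha
    return sum(alpha)


if __name__ == "__main__":
    # Two markers, one informative meiosis, hypothetical emission probabilities.
    em = [[1.0, 0.0], [1.0, 0.0]]
    print(lander_green_likelihood(em, [0.0], 1))  # 0.5 (complete linkage)
    print(lander_green_likelihood(em, [0.5], 1))  # 0.25 (unlinked)
```

The cost grows linearly with the number of markers but exponentially with the number of meioses, which is why the Lander-Green approach suits many markers on moderately sized pedigrees.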

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium between markers when population allele frequencies are used (Abecasis et al. 2002, Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims
Paper I: To analyse the risk of erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions
The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, resulting in an exclusion error or an inclusion error (falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainty about appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods
A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated; in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with both the standard two-hypothesis model and the five-hypothesis model for comparison. Error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
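A stripped-down version of such a simulation, restricted to the two standard hypotheses (true father versus unrelated man), might look as follows. This is a sketch, not the simulation code of Paper I: the allele frequencies are hypothetical, loci are assumed independent, mutations, silent alleles and relatives of the TF are ignored, equal numbers of cases are simulated under each hypothesis, and "inclusion" simply means a posterior probability of paternity (prior odds 1) at or above the cut-off.

```python
import random


def sample_genotype(freqs: dict[str, float], rng: random.Random) -> tuple[str, str]:
    alleles = list(freqs)
    return tuple(rng.choices(alleles, weights=[freqs[a] for a in alleles], k=2))


def locus_pi(child, mother, af, freqs) -> float:
    """Single-locus trio PI (no mutation model)."""
    num = sum(0.25 for m in mother for f in af if sorted((m, f)) == sorted(child))
    den = 0.0
    for m in mother:
        rest = list(child)
        if m in rest:
            rest.remove(m)
            den += 0.5 * freqs[rest[0]]
    return num / den


def simulate_error_rates(loci, n_per_hypothesis, cutoff=0.9999, seed=1):
    """Return (inclusion error rate, exclusion error rate)."""
    rng = random.Random(seed)
    inclusion_errors = exclusion_errors = 0
    for af_is_father in (True, False):
        for _ in range(n_per_hypothesis):
            pi = 1.0
            for freqs in loci:
                mother = sample_genotype(freqs, rng)
                father = sample_genotype(freqs, rng)
                child = (rng.choice(mother), rng.choice(father))
                af = father if af_is_father else sample_genotype(freqs, rng)
                pi *= locus_pi(child, mother, af, freqs)
            included = pi / (pi + 1.0) >= cutoff
            if af_is_father and not included:
                exclusion_errors += 1
            elif not af_is_father and included:
                inclusion_errors += 1
    return inclusion_errors / n_per_hypothesis, exclusion_errors / n_per_hypothesis


if __name__ == "__main__":
    # 15 hypothetical loci, each with 8 equally frequent alleles.
    loci = [{str(a): 1 / 8 for a in range(8)} for _ in range(15)]
    print(simulate_error_rates(loci, n_per_hypothesis=500))
    print(simulate_error_rates(loci[:4], n_per_hypothesis=500))
```

With 8 equally frequent alleles a single-locus PI lies between 2 and 8, so with 15 loci every true father passes the 99.99% cut-off, whereas with only four loci the combined PI can never exceed 8^4 = 4096 and no case reaches it. More realistic settings (mutations, relatives of the TF, real frequency distributions) are what produce the intermediate error rates reported in Table 1.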


Figure 3 The different alternative hypotheses used for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion
Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used equal prior probabilities for the alternative hypotheses, i.e. the same number of cases was simulated for each of the hypotheses H1a, H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a few inconsistencies between the AF and the child, the question arises whether they are due to mutations or represent true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still apply a maximum number of inconsistencies as the criterion for inclusion/exclusion (Hallenberg & Morling 2009). We demonstrated that it is better to use a probabilistic model, even one whose interpretation is not entirely correct, than not to use one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model decreased the error rates significantly, although the magnitude of the decrease was small.
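Why extra hypotheses lower the posterior probability of paternity can be shown with a short calculation: posterior probabilities over any set of mutually exclusive, exhaustive hypotheses are the prior-weighted likelihoods normalised to sum to one. The labels follow Paper I (H0: the AF is the father; H1a-d: the alternatives), but the likelihood values below are invented for illustration, and equal priors are assumed, as in the simulations.

```python
from typing import Optional


def posterior_probabilities(
    likelihoods: dict[str, float], priors: Optional[dict[str, float]] = None
) -> dict[str, float]:
    """Pr(H | E) for mutually exclusive, exhaustive hypotheses (equal priors by default)."""
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: priors[h] * lik for h, lik in likelihoods.items()}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("the evidence is impossible under every hypothesis")
    return {h: value / total for h, value in joint.items()}


if __name__ == "__main__":
    # Hypothetical likelihoods, scaled so that Pr(E | H1a) = 1.
    five = {"H0": 1e4, "H1a": 1.0, "H1b": 200.0, "H1c": 20.0, "H1d": 5.0}
    two = {h: five[h] for h in ("H0", "H1a")}
    print(round(posterior_probabilities(two)["H0"], 5))   # two-hypothesis model
    print(round(posterior_probabilities(five)["H0"], 5))  # five-hypothesis model
```

When a close relative of the TF explains the DNA data almost as well as the TF himself, the two-hypothesis model still reports a posterior near 1 for H0, while the five-hypothesis model spreads the probability over the relatives and may keep the case below the inclusion cut-off.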


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluating the statistical methods used is important. In this paper we demonstrated that improvements are still needed in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

| Factor | Scenario | Change in error rate compared with the standard case, %: total error (inclusion error, exclusion error) |
|---|---|---|
| Consanguinity | Mother and father simulated as first cousins | 3 (10, -1) |
| Additional information | DNA profiles from 20 markers | -68 (-29, -89) |
| | DNA profiles from 25 markers | -83 (-56, -98) |
| | 2 children | -88 (-73, -96) |
| Mutation model | Limit of 1 inconsistency instead of mutation model for LR calculation | 16 (217, -95) |
| | Limit of 2 inconsistencies instead of mutation model for LR calculation | 320 (1079, -100) |
| Inappropriate allele frequencies | Rwandan allele frequencies for data generation, Swedish for LR calculation | 19 (190, -76) |
| | Somali allele frequencies for data generation, Swedish for LR calculation | 2 (106, -55) |
| | Iranian allele frequencies for data generation, Swedish for LR calculation | -13 (25, -34) |
| Prior information | Five-hypothesis model for LR calculation | -24 (-8, -31) |

A standard case was considered, with DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average inclusion error for H1a-d. Posterior probabilities were calculated using the two-hypothesis model (H0: the AF is the father of the child; H1: the AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations
In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is particularly important to set up and study regional frequency databases, because of an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods
Blood samples from 296 Swedish individuals from seven geographically distinct regions, together with 39 samples from a Swedish Saami population (the Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This region, which contains the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion
Two hundred and forty-seven different haplotypes were found among the typed Swedes. This corresponds to a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). A comparison of mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
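The two summary statistics can be computed from haplotype counts using the common estimators GD = n/(n - 1) (1 - Σ p_i²) (Nei's gene diversity) and PM = Σ p_i², where p_i is the relative frequency of haplotype i among n sampled individuals. This is a sketch; the exact estimators used in Paper II may differ, and the toy haplotype labels are hypothetical.

```python
from collections import Counter
from collections.abc import Sequence


def _frequencies(haplotypes: Sequence[str]) -> list[float]:
    n = len(haplotypes)
    return [count / n for count in Counter(haplotypes).values()]


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's gene diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    return n / (n - 1) * (1.0 - sum(p * p for p in _frequencies(haplotypes)))


def random_match_probability(haplotypes: Sequence[str]) -> float:
    """Probability that two random individuals share a haplotype: sum p_i^2."""
    return sum(p * p for p in _frequencies(haplotypes))


if __name__ == "__main__":
    # Toy sample; each string stands for one hypothetical control-region haplotype.
    sample = ["16069T 16126C", "16069T 16126C", "16311C", "263G 315.1C"]
    print(haplotype_diversity(sample), random_match_probability(sample))
```

A diversity close to 1 and a small match probability both reflect that most sampled haplotypes are unique, as with 247 distinct haplotypes among the typed Swedes.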

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). mtDNA haplotype frequencies from the eight Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). Values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers
X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium (LD). Thus, prior to applying X-STRs in casework, it is important to study population frequencies, substructure and efficiency parameters, and also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usefulness in relationship testing.

Materials and methods
A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion
Diversity measurements and efficiency parameters revealed that the first linkage group (DXS10135-DXS8378) was the most informative, although the differences among the linkage groups were minor. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
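With 84 to 116 informative meioses, a recombination frequency close to zero is estimated with considerable uncertainty. A simple way to see this is the counting estimator θ = R/N (R recombinants among N informative meioses) together with a Wilson score interval. This sketch is not the estimation model presented in Paper III, and the numbers in the example are hypothetical.

```python
import math


def recombination_estimate(recombinants: int, meioses: int, z: float = 1.96) -> tuple[float, float, float]:
    """Counting estimate of the recombination fraction with a Wilson score interval.

    Returns (theta_hat, lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    n = meioses
    theta = recombinants / n
    denom = 1.0 + z * z / n
    centre = (theta + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(theta * (1 - theta) / n + z * z / (4 * n * n)) / denom
    return theta, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical: no recombinant observed among 100 informative meioses.
    print(recombination_estimate(0, 100))
```

Even with no recombinant among 100 meioses, the upper limit of the interval is about 3.7%, so a point estimate below 1% is compatible with recombination fractions several times larger.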


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. Values below the line give the location in Mb (NCBI build 36). Values above the loci are the estimated recombination frequencies for the combined Swedish and Somali dataset and, separately, for the Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been studied widely by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account
The findings in Paper III prompted us to develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971, Abecasis et al. 2002) could be used for computing the LR for questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model for complete consideration of linkage and linkage disequilibrium, and use simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III to study the impact of taking or not taking linkage and LD into account.

Materials and methods
A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to cover all loci of a haplotype.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved DNA profiles available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three involved DNA profiles available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated using simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion
The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high ($\sim 10^{6}$) compared with the considerably lower median LR for the cases where genotype data for the founders were not available ($\sim 10^{2}$-$10^{3}$). Furthermore, LRs above 1 (i.e. favouring the tested relationship), of varying magnitude, were obtained when the questioned relationship was simulated to be untrue, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
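Why the haplotype frequency estimate matters can be illustrated with a toy database of male X-chromosomal haplotypes for one linkage group. The sketch contrasts a count-based haplotype frequency, which respects LD, with the product of single-locus allele frequencies, which assumes linkage equilibrium. The database, the allele labels and the rule giving unseen haplotypes a frequency of 1/(n + 1) are hypothetical choices for illustration; they are not the estimator used in Paper IV.

```python
from collections import Counter

Haplotype = tuple[str, ...]


def haplotype_freq_counted(db: list[Haplotype], hap: Haplotype) -> float:
    """Relative haplotype count; unseen haplotypes get a hypothetical floor of 1/(n + 1)."""
    n = len(db)
    count = Counter(db)[hap]
    return count / n if count > 0 else 1.0 / (n + 1)


def haplotype_freq_product(db: list[Haplotype], hap: Haplotype) -> float:
    """Product of single-locus allele frequencies (ignores LD)."""
    n = len(db)
    freq = 1.0
    for locus, allele in enumerate(hap):
        freq *= sum(1 for h in db if h[locus] == allele) / n
    return freq


if __name__ == "__main__":
    # Hypothetical database of 100 male haplotypes for two loci in strong LD.
    db = [("12", "30")] * 40 + [("13", "31")] * 40 + [("12", "31")] * 10 + [("13", "30")] * 10
    for hap in [("12", "30"), ("12", "31"), ("14", "30")]:
        print(hap, haplotype_freq_counted(db, hap), haplotype_freq_product(db, hap))
```

In a simple matching scenario where the LR is inversely proportional to the haplotype frequency, ignoring LD would overstate the LR for ("12", "30") by a factor of 0.4/0.25 = 1.6 and understate it for ("12", "31") by a factor of 2.5. For the unseen haplotype ("14", "30") the result depends entirely on the chosen rule, which is why frequency estimation for rare haplotypes dominates some cases.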

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence in order to reduce the risk of incorrect decisions, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

| | log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max) |
|---|---|---|---|
| Rel | 20 [-1 -62] (-1100) | 12 [04-73] (0001418) | 04 [0036-93] (00001438) |
| NoRel | -1 [-1-05] (-140) | 10 [09-13] (000263) | 02 [00036-91] (0026433) |

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that in other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlights the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for combining the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on X-chromosomal DNA profiles. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses about the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by ever faster computers.

If we go back to the time just before these discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver of this rapid development. Once the discoveries had been made, an enormous amount of money and research time was invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories, in instrumentation and in the large national DNA profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

In general, I believe that future development will become more diversified, meaning less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I believe there will be a greater focus on statistical methods to handle data from an increasing number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, there remain relationship questions that cannot be resolved satisfactorily with existing methods, for example distinguishing full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009, Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

Regarding continued work on the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetic data and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under 100,000 US dollars, which gives access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large datasets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. Especially, I would like to thank:

My tutors Professor Bertil Lindblom, Associate professor Gunilla Holmlund and Associate professor Petter Mostad, for sharing your deep knowledge within the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Also Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen”, for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a “sounding board” for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about things other than work. I would like to thank:

Magnus and Erik, for always being such good friends. Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My in-laws, the Tillmar family: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and caring; thanks for your patience when I work at non-office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for The International Symposium on Human Identification 1989, Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM: Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York.


Introduction

Regarding Y-chromosome variation, some studies have aimed at facilitating the setting-up of a Swedish reference database (Holmlund et al 2006), while others have explored the demographic history of the Swedish male population (Karlsson et al 2006, Lappalainen et al 2009). These latter studies confirm earlier findings of high similarity with other western European populations (Roewer et al 2005). However, some Y-chromosome differences, albeit small, do exist within Sweden, especially in the northern part of the country (Karlsson et al 2006).

Y-STR and Y-SNP data from the Swedish population are included in YHRD, the worldwide Y-chromosome haplotype reference database (Willuweit & Roewer 2007).

Due to continuous immigration to Sweden from various populations, knowledge about non-European populations is also crucial for a correct assessment of the weight of evidence (Tillmar et al 2009).

Forensic mathematics/statistics

In order to assess the evidential weight of a DNA analysis, the numerical strength of the evidence must be calculated as well as presented to the court or client in an appropriate way.

Framework for interpretation and presentation of evidential weight

When presenting the probability or weight of the DNA findings, a logical framework is crucial in order to make the presentation clear and understandable to those who have to make decisions based on the DNA results. The design of such a framework has been debated, and there is still no clear consensus within the forensic community.

The main discussion covers two (or perhaps three) different frameworks: a frequentist and a Bayesian approach (or a logical approach, which could be extended to a full Bayesian approach). These have different properties as well as pros and cons, and several detailed publications about their usage exist (for example, see Buckleton et al 2003, chapter 2, for a review).

In brief, the frequentist approach is built around the calculation of a probability concerning one hypothesis, for example $\Pr(E \mid H)$, which is the probability of the evidence E when hypothesis H is true. In this case E is the DNA profile and H could be the hypothesis “the DNA comes from an individual not related to the suspect”. If this probability is computed to be low, the hypothesis can be rejected, making an alternative hypothesis probable. The argument in favour of this approach is that it is intuitive and relatively easy to understand. However, it has been the subject of some criticism, mainly due to the lack of logical rigour, which makes the set-up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian or logical approach is the use of a likelihood ratio (LR) connecting the prior odds to the resulting posterior odds, i.e. Bayes’s theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

$H_1$ (or $H_p$) is commonly known as the prosecutor’s hypothesis and $H_0$ (or $H_d$) as the hypothesis for the defence; E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is within this formula that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the paternity index, PI) is discussed in the following section.
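A minimal Python sketch of the odds form of Bayes’s theorem given above. The function names are illustrative, the numbers are hypothetical, and chaining several LRs assumes that the pieces of evidence are conditionally independent given each hypothesis.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Pr(H1 | E, I) from the odds Pr(H1 | E, I) / Pr(H0 | E, I),
    assuming H1 and H0 are mutually exclusive and exhaustive."""
    return odds / (1.0 + odds)


def update_sequentially(prior_odds, likelihood_ratios):
    """Combine several pieces of evidence, assuming conditional independence
    given each hypothesis: the posterior after one item is the prior for the next."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds = posterior_odds(odds, lr)
    return odds


# Hypothetical numbers: a DNA LR of 1000 and a weaker second item (LR 5),
# starting from prior odds of 1:99.
odds = update_sequentially(1 / 99, [1000.0, 5.0])
print(odds_to_probability(odds))
```

Keeping the LR separate from the prior is what lets the court, rather than the laboratory, supply the prior odds and any non-DNA evidence.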

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations specific to genetic investigations in paternity cases (Gjertson et al 2007). They recommend the use of the LR (i.e. PI) principle for calculating the weight of evidence. These recommendations cover the most basic issues but lack information on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:

$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.

The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

which is the probability of observing the child’s ($G_C$), mother’s ($G_M$) and alleged father’s ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.


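We can use the third law of probability (the multiplication rule) and simplify:

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$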

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child’s genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child’s genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them has any other allele to transmit). If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50:50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability will be 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. Here $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

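As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d].

Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2 \cdot p_c}$$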

In other words, the more unusual allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
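The single-locus calculation above can be sketched in Python as follows. It is an illustration under stated assumptions rather than casework software: no mutations, no silent alleles, the mother is the true mother, Hardy-Weinberg equilibrium, and an unrelated random man under $H_d$; the function names and allele frequencies are hypothetical.

```python
def transmission_prob(parent, allele):
    """Pr(a parent with genotype (x, y) transmits `allele`), by Mendel's first law."""
    return sum(0.5 for a in parent if a == allele)


def paternity_index(child, mother, alleged_father, freqs):
    """Single-locus PI = Pr(G_C | G_M, G_AF, Hp) / Pr(G_C | G_M, G_AF, Hd).

    Numerator and denominator are each summed over the possible (A_M, A_P)
    assignments of the child's alleles, so ambiguous cases are handled too.
    A maternal inconsistency gives a zero denominator (ZeroDivisionError).
    """
    c1, c2 = child
    # A set removes the duplicate assignment when the child is homozygous.
    assignments = {(c1, c2), (c2, c1)}
    numerator = sum(transmission_prob(mother, a_m) * transmission_prob(alleged_father, a_p)
                    for a_m, a_p in assignments)
    # Under Hd the paternal allele comes from a random man with probability p_{A_P}.
    denominator = sum(transmission_prob(mother, a_m) * freqs[a_p]
                      for a_m, a_p in assignments)
    return numerator / denominator


# The example from the text, with a hypothetical frequency p_c = 0.1.
freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
print(paternity_index(("b", "c"), ("a", "b"), ("c", "d"), freqs))  # 1 / (2 * 0.1), i.e. about 5
```

A homozygous AF, for example [c,c], doubles the numerator and gives $1/p_c$, while an AF lacking allele c gives PI = 0 under this no-mutation sketch.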

Decision

How does one interpret the PI value? Bayes’s theorem is relevant in order to obtain posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

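hence, since $\Pr(H_d \mid E) = 1 - \Pr(H_p \mid E)$ when the two hypotheses are mutually exclusive and exhaustive,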

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002, Gjertson et al 2007). A cut-off that is too low will increase the risk of falsely including a non-father as the true father, whereas a cut-off that is too high will increase the risk of failing to include the true father.
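A small Python sketch of the posterior probability of paternity and a cut-off decision. The input PI would typically be the product of single-locus PIs over unlinked loci in linkage equilibrium; the three-way rule (inclusion, exclusion, inconclusive) and the 99.99% default are hypothetical choices, since the cut-off is set by each laboratory.

```python
def probability_of_paternity(pi, prior_prob=0.5):
    """Pr(Hp | E) from a (combined) PI and a prior probability of paternity.

    With the traditional prior of 0.5 (prior odds 1) this is PI / (PI + 1)."""
    post_odds = pi * prior_prob / (1.0 - prior_prob)
    return post_odds / (1.0 + post_odds)


def decide(pi, cutoff=0.9999, prior_prob=0.5):
    """Hypothetical decision rule: include at or above `cutoff`, exclude at or
    below 1 - cutoff, and otherwise report the case as inconclusive."""
    w = probability_of_paternity(pi, prior_prob)
    if w >= cutoff:
        return "inclusion"
    if w <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


for pi in (1e5, 1e3, 1e-5):
    print(pi, round(probability_of_paternity(pi), 6), decide(pi))
```

Raising `cutoff` moves borderline cases from inclusion to inconclusive, which is the trade-off between false inclusions and failed inclusions described above.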

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al 2002), silent alleles (Gjertson et al 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, the use of a model for automatic likelihood computations is helpful. In 1971, Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

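$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i=1}^{n} \mathrm{Pen}(X_i \mid G_i) \prod_{j \in \mathrm{founders}} \Pr(G_j) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes $G_1, \ldots, G_n$ of all n individuals in the pedigree, $X_i$ is the observed data for individual i, and the last product runs over all offspring o with father f and mother m.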

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child’s genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, and thus this individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci (for a typed marker it simply restricts each $G_i$ to the genotype consistent with the typing result).
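The idea of peeling can be illustrated with a toy Python example for a single nuclear family typed at one autosomal marker (a sketch, not a general pedigree engine; all names are hypothetical). It assumes Hardy-Weinberg founders and no mutation, and treats Pen as an indicator, so that a typed individual only contributes the genotype matching the typing result. The brute-force function evaluates the full sum over every genotype combination; the peeled version sums out each child once, conditional on the parents, and gives the same value with far fewer terms.

```python
from itertools import combinations_with_replacement, product


def genotypes(freqs):
    """All unordered genotypes (sorted allele pairs) over the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g, freqs):
    """Pr(G) for a founder under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(parent, allele):
    return sum(0.5 for x in parent if x == allele)


def child_prob(g, gf, gm):
    """Pr(G_o | G_f, G_m) for an unordered child genotype g."""
    a, b = g
    p = transmission(gf, a) * transmission(gm, b)
    if a != b:
        p += transmission(gf, b) * transmission(gm, a)
    return p


def pen(obs, g):
    """Marker 'penetrance': 1 if g matches the typed genotype (None = untyped)."""
    return 1.0 if obs is None or tuple(sorted(obs)) == g else 0.0


def likelihood_brute_force(obs_f, obs_m, obs_children, freqs):
    """Direct evaluation of the sum over every joint genotype combination."""
    gs = genotypes(freqs)
    total = 0.0
    for gf, gm, *gcs in product(gs, repeat=2 + len(obs_children)):
        term = founder_prob(gf, freqs) * pen(obs_f, gf) * founder_prob(gm, freqs) * pen(obs_m, gm)
        for gc, oc in zip(gcs, obs_children):
            term *= child_prob(gc, gf, gm) * pen(oc, gc)
        total += term
    return total


def likelihood_peeled(obs_f, obs_m, obs_children, freqs):
    """Same likelihood, with each child 'peeled' into a factor for its parents."""
    gs = genotypes(freqs)
    total = 0.0
    for gf in gs:
        for gm in gs:
            factor = 1.0
            for oc in obs_children:  # sum out each child once, given (gf, gm)
                factor *= sum(child_prob(gc, gf, gm) * pen(oc, gc) for gc in gs)
            total += founder_prob(gf, freqs) * pen(obs_f, gf) * founder_prob(gm, freqs) * pen(obs_m, gm) * factor
    return total


# Deficiency-style case with hypothetical frequencies: father untyped,
# mother and two children typed.
freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
args = (None, ("a", "b"), [("b", "c"), ("a", "c")], freqs)
print(likelihood_brute_force(*args), likelihood_peeled(*args))
```

In this one-generation example peeling only reorders the sum; the savings grow in deeper pedigrees, where each peeled individual becomes a single factor for the generation above.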

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. Due to the increased access to large datasets, a need has emerged for a fast computational model that can consider thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with a computational effort that increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the collection of all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring; 2) iteration over all inheritance vectors and the calculation of the probability of the marker-specific observed genotypes conditional on the inheritance vectors; and finally 3) the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al 1996 for a more detailed description).
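The HMM step of the Lander-Green algorithm can be sketched with the forward algorithm in Python. This is a simplified illustration under assumptions of our own: the per-marker emission probabilities Pr(observed genotypes | inheritance vector) are taken as given (computing them is step 2 and is omitted), recombination is assumed independent across meioses and intervals (no interference), the markers are assumed to be in linkage equilibrium, and the naive transition sum costs $4^m$ operations per marker for m meioses. The work grows linearly with the number of markers, which is the property described above; production software such as Merlin (Abecasis et al 2002) uses much faster schemes.

```python
from itertools import product


def transition_prob(v, w, theta):
    """Pr(inheritance vector w at the next marker | v at this marker).

    Each meiosis (bit) recombines independently with probability theta."""
    flips = sum(a != b for a, b in zip(v, w))
    return theta ** flips * (1.0 - theta) ** (len(v) - flips)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm of a hidden Markov model over inheritance vectors.

    emissions[j][v] is Pr(observed genotypes at marker j | inheritance vector v)
    and thetas[j] is the recombination fraction between markers j and j + 1."""
    if len(thetas) != len(emissions) - 1:
        raise ValueError("need one recombination fraction per adjacent marker pair")
    vectors = list(product((0, 1), repeat=n_meioses))
    # All 2**m inheritance vectors are a priori equally likely at the first marker.
    alpha = {v: emissions[0][v] / len(vectors) for v in vectors}
    for j, theta in enumerate(thetas, start=1):
        alpha = {w: emissions[j][w] * sum(alpha[v] * transition_prob(v, w, theta) for v in vectors)
                 for w in vectors}
    return sum(alpha.values())


# One meiosis and two markers whose (hypothetical) data point to different
# grandparental origins, so a recombination is needed between them.
em = [{(0,): 1.0, (1,): 0.0}, {(0,): 0.0, (1,): 1.0}]
print(lander_green_likelihood(em, [0.1], 1))  # linked: 0.5 * 0.1
print(lander_green_likelihood(em, [0.5], 1))  # unlinked: 0.5 * 0.5
```

Tight linkage (small theta) penalises data that require a crossover, which is how the model takes linkage into account.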

Practical implementations of the Lander-Green algorithm have been shown to work well in terms of taking linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium for the population frequency estimation (Abecasis et al 2002, Skare et al 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample concerning allele and haplotype frequencies and forensic efficiency parameters. Furthermore, to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). An assessment of the weight of the DNA result can be performed, and a decision can be made as to whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, through either an exclusion error or an inclusion error (meaning falsely excluding the AF as TF or falsely including the AF as TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).
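How adding hypotheses can change the posterior is sketched below in Python. The likelihood values and hypothesis labels are hypothetical (they are not the alternatives of Figure 3); with equal priors, the posterior for each hypothesis is its likelihood divided by the sum of the likelihoods of all hypotheses considered.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Posterior probabilities over mutually exclusive and exhaustive hypotheses.

    likelihoods maps each hypothesis to Pr(E | H); priors default to equal weights."""
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Hypothetical likelihoods Pr(E | H) for one case; the labels are illustrative only.
likelihoods = {
    "AF is the father": 4e-20,
    "AF is unrelated": 1e-25,
    "AF is a full brother of the TF": 2e-21,
    "AF is the father of the TF": 1e-21,
    "AF is a half-brother of the TF": 5e-22,
}
two = posterior_probabilities({h: likelihoods[h] for h in ("AF is the father", "AF is unrelated")})
five = posterior_probabilities(likelihoods)
print(round(two["AF is the father"], 6), round(five["AF is the father"], 4))
```

In this made-up case, the two-hypothesis calculation gives a posterior probability of paternity above 99.99%, whereas admitting close relatives of the TF as alternatives pulls it well below that threshold.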

Family data were generated, and in the standard case the individuals’ DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypothesis model and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
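A stripped-down Python version of such a simulation is sketched below. It is not the code used in Paper I: it uses only the two-hypothesis model with equal priors, a single child, no mutation, Hardy-Weinberg equilibrium, unlinked loci and hypothetical markers with equally frequent alleles, and a case below the cut-off simply counts as not included.

```python
import random


def transmission_prob(parent, allele):
    """Mendelian probability that `parent` transmits `allele`."""
    return sum(0.5 for a in parent if a == allele)


def locus_pi(child, mother, af, freqs):
    """Single-locus trio PI (no mutation, unrelated alternative father)."""
    assignments = {(child[0], child[1]), (child[1], child[0])}
    num = sum(transmission_prob(mother, am) * transmission_prob(af, ap) for am, ap in assignments)
    den = sum(transmission_prob(mother, am) * freqs[ap] for am, ap in assignments)
    return num / den


def random_genotype(freqs, rng):
    """Draw a genotype under Hardy-Weinberg equilibrium."""
    alleles = list(freqs)
    return tuple(rng.choices(alleles, weights=[freqs[a] for a in alleles], k=2))


def simulate_error_rates(loci, n_cases, cutoff=0.9999, seed=1):
    """Estimate (inclusion error, exclusion error) for a two-hypothesis design.

    For each case, one trio is simulated with the true father as AF and one with
    an unrelated man as AF; the AF is included when PI / (PI + 1) >= cutoff."""
    rng = random.Random(seed)
    false_inclusions = false_exclusions = 0
    for _ in range(n_cases):
        for af_is_father in (True, False):
            pi = 1.0
            for freqs in loci:  # unlinked loci: multiply single-locus PIs
                mother, father = random_genotype(freqs, rng), random_genotype(freqs, rng)
                child = (rng.choice(mother), rng.choice(father))
                af = father if af_is_father else random_genotype(freqs, rng)
                pi *= locus_pi(child, mother, af, freqs)
            included = pi / (pi + 1.0) >= cutoff  # equal prior probabilities
            if af_is_father and not included:
                false_exclusions += 1
            elif not af_is_father and included:
                false_inclusions += 1
    return false_inclusions / n_cases, false_exclusions / n_cases


# Hypothetical markers: 15 loci, each with 10 equally frequent alleles.
loci = [{str(i): 0.1 for i in range(10)} for _ in range(15)]
print(simulate_error_rates(loci, n_cases=1000))
```

Changing the number of loci, the allele frequencies used for data generation versus calculation, or the cut-off reproduces, in miniature, the kinds of comparisons summarised in Table 1.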


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk of having a relative of the TF as the AF, it is essential to include a computational model for treating inconsistencies. When there are only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007), although some laboratories still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if its interpretation is not totally correct, than not to employ one at all (Table 1).
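To see why a mutation model keeps a locus with an inconsistency in the calculation instead of forcing an exclusion, here is a Python sketch using a deliberately simple mutation model chosen for illustration (not necessarily the one used in Paper I): with probability μ the AF transmits a random allele drawn according to the population frequencies, and otherwise a Mendelian allele. Only paternal mutations are modelled, and μ = 0.002 and the allele frequencies are hypothetical. In the example below, the inconsistent locus contributes a PI equal to μ, so the combined PI is reduced rather than set to zero.

```python
def mendelian(parent, allele):
    """Pr(parent transmits `allele`) without mutation."""
    return sum(0.5 for a in parent if a == allele)


def paternal_transmission(af, allele, freqs, mu):
    """Pr(AF transmits `allele`): Mendelian with probability 1 - mu, otherwise a
    mutation to an allele drawn according to the population frequencies."""
    return (1.0 - mu) * mendelian(af, allele) + mu * freqs[allele]


def locus_pi_with_mutation(child, mother, af, freqs, mu=0.002):
    """Single-locus trio PI with paternal mutation only (mu is a hypothetical rate)."""
    assignments = {(child[0], child[1]), (child[1], child[0])}
    num = sum(mendelian(mother, am) * paternal_transmission(af, ap, freqs, mu)
              for am, ap in assignments)
    den = sum(mendelian(mother, am) * freqs[ap] for am, ap in assignments)
    return num / den


freqs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
consistent = locus_pi_with_mutation(("b", "c"), ("a", "b"), ("c", "d"), freqs)
inconsistent = locus_pi_with_mutation(("b", "c"), ("a", "b"), ("d", "d"), freqs)
# Hypothetical case: 14 consistent loci like the first one and one inconsistency.
combined = consistent ** 14 * inconsistent
print(inconsistent, combined / (combined + 1.0))
```

A fixed limit on the number of inconsistencies would instead treat such a case as either a certain exclusion or a certain inclusion, which is the behaviour the Table 1 comparison penalises.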

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change in the error rate (%) in comparison with the standard case: total error (inclusion error/exclusion error)

Consanguinity
Mother and father simulated as first cousins: 3 (10/-1)

Additional information
20-marker DNA profiles: -68 (-29/-89)
25-marker DNA profiles: -83 (-56/-98)
2 children: -88 (-73/-96)

Mutation model
Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217/-95)
Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079/-100)

Inappropriate allele frequency
Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190/-76)
Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106/-55)
Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25/-34)

Prior information
Five-hypothesis model for LR calculation: -24 (-8/-31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers it is highly relevant to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (including HVS-I, HVS-II and HVS-III) spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation within the Swedish regions and between Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
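The two efficiency parameters quoted above can be computed directly from haplotype counts. The sketch below uses one common pair of estimators: the match probability as the sum of squared haplotype frequencies, and Nei's unbiased haplotype (gene) diversity (Nei 1987). The exact estimators used in Paper II are not stated here and may differ. The haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (diversity, match_probability) for a haplotype sample.

    match_probability: sum of squared haplotype frequencies.
    diversity: Nei's unbiased estimate n / (n - 1) * (1 - sum p_i^2).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    match_probability = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1 - match_probability)
    return diversity, match_probability


# Hypothetical toy sample: one haplotype seen twice, two seen once.
print(haplotype_stats(["hapA", "hapA", "hapB", "hapC"]))
```

When almost every individual carries a unique haplotype, as with 247 distinct haplotypes among 296 Swedes, the diversity approaches 1 and the match probability approaches 1/n.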

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
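As a rough illustration of how regional haplotype frequencies can be compared, the sketch below computes Nei's G_ST = (H_T - H_S)/H_T. It is a deliberately simpler statistic than the pairwise FST/ΦST values reported in Paper II. It weights regions equally, applies no sample-size correction and ignores sequence distances between haplotypes. It is shown only to make the idea of "frequency variation among subpopulations" concrete. The region samples in the test are hypothetical.

```python
from collections import Counter


def gst(regions):
    """Nei's G_ST = (H_T - H_S) / H_T from haplotype samples per region.

    Simplifications: regions are weighted equally and no sample-size
    correction is applied; this is not the AMOVA-based F_ST/Phi_ST.
    """
    freqs = []
    for sample in regions:
        n = len(sample)
        freqs.append({h: c / n for h, c in Counter(sample).items()})
    k = len(freqs)
    # H_S: mean within-region diversity.
    h_s = sum(1 - sum(f * f for f in fr.values()) for fr in freqs) / k
    # H_T: diversity of the region-averaged haplotype frequencies.
    all_haps = set().union(*freqs)
    mean = [sum(fr.get(h, 0.0) for fr in freqs) / k for h in all_haps]
    h_t = 1 - sum(f * f for f in mean)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

A value near 0 indicates that regions share the same haplotype frequencies, which supports a combined database. A value near 1 indicates that regions carry largely different haplotypes.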


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
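Why database size matters can be made concrete with a frequency estimate for a haplotype observed x times among n database entries. The sketch uses a one-sided Clopper-Pearson upper bound, which is one common conservative choice and is assumed here rather than taken from Paper II or EMPOP. The bound is found by bisection on the binomial distribution function. The direct summation is intended for small counts such as x = 0 or 1.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); direct sum, meant for small x."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))


def frequency_upper_bound(x, n, alpha=0.05, tol=1e-12):
    """One-sided (1 - alpha) Clopper-Pearson upper bound for a haplotype
    seen x times among n database entries, found by bisection."""
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # x observations still too likely: bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2


for n in (296, 3000):
    print(n, round(frequency_upper_bound(0, n), 5))
```

For an unseen haplotype (x = 0) the bound reduces to 1 - alpha^(1/n). It is roughly 1% for a database of 296 and about ten times smaller for a database ten times larger. This is the practical gain from pooling similar populations in a shared database.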


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to applying X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.
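Because males carry a single X chromosome, their two-locus haplotypes are observed directly, which makes a test of allelic association straightforward. The following sketch is a Monte Carlo permutation test on a likelihood-ratio (G) statistic. It is one possible LD test and is not necessarily the test used in Paper III. The allele labels in the usage example are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(pairs):
    """Likelihood-ratio (G) statistic for association between two loci,
    computed from phase-known two-locus haplotypes such as those of
    hemizygous males."""
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    return 2 * sum(
        c * math.log(c * n / (first[a] * second[b])) for (a, b), c in joint.items()
    )


def ld_permutation_test(pairs, n_perm=2000, seed=1):
    """Monte Carlo p-value: shuffling the second locus among individuals
    destroys any allelic association while keeping allele counts fixed."""
    rng = random.Random(seed)
    observed = g_statistic(pairs)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if g_statistic(list(zip(first, second))) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical males in which the two loci are perfectly associated.
males = [("14", "30")] * 20 + [("15", "32")] * 20
print(ld_permutation_test(males))
```

Female samples are left out of this sketch because their haplotype phase is unknown. Including them would require phase inference, for example by an EM algorithm.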

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
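A single hypothetical case shows how disregarding LD can shift an individual LR. The case assumes that a daughter's paternal X haplotype is known unambiguously and that the alleged father carries it. Ignoring mutation, the LR is then 1/f, where f is the haplotype frequency. The sketch compares f estimated as an observed haplotype frequency with f from the product rule. The toy database is invented and is not Swedish data.

```python
from collections import Counter


def lr_paternal_haplotype(hap, database, account_for_ld=True):
    """LR that the AF (carrying `hap`) rather than a random man transmitted
    the daughter's paternal X haplotype: 1 / frequency of `hap`.

    account_for_ld=True uses the observed haplotype frequency; False uses
    the product rule (allele frequencies multiplied, i.e. LD ignored).
    """
    n = len(database)
    if account_for_ld:
        freq = Counter(database)[hap] / n
    else:
        freq = 1.0
        for i, allele in enumerate(hap):
            freq *= sum(1 for h in database if h[i] == allele) / n
    if freq == 0:
        raise ValueError("haplotype not in database; a frequency estimate is needed")
    return 1.0 / freq


# Hypothetical male haplotypes for one linkage group (two loci).
db = [("a", "c")] * 4 + [("a", "d")] + [("b", "c")] + [("b", "d")] * 4
print(lr_paternal_haplotype(("a", "d"), db), lr_paternal_haplotype(("a", "d"), db, False))
```

Here the product rule gives LR 4 against LR 10 when the haplotype frequency is used. An unseen haplotype raises an error rather than yielding an infinite LR, which is where the choice of frequency estimator becomes decisive.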

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%) and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
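The basic counting behind a recombination frequency estimate can be sketched for the simplest situation. The sketch assumes a mother whose phase is known, for example inferred from her father, and a son whose single X haplotype is observed directly. Each maternal meiosis is then recombinant, non-recombinant or uninformative (if the mother is homozygous at either locus). This is a simplified illustration, not the estimation model presented in Paper III, which had to handle family data more generally. The allele values are hypothetical.

```python
def classify_meiosis(mother_phase, son):
    """Classify one maternal meiosis at a pair of X-linked loci.

    mother_phase: the mother's two phase-known haplotypes, e.g.
        (("12", "30"), ("13", "31")).
    son: the son's haplotype, observed directly since males are hemizygous.
    Returns "R" (recombinant), "N" (non-recombinant) or None (uninformative).
    """
    h1, h2 = mother_phase
    if h1[0] == h2[0] or h1[1] == h2[1]:
        return None  # homozygous at a locus: a crossover cannot be seen
    if son in (h1, h2):
        return "N"
    if son in ((h1[0], h2[1]), (h2[0], h1[1])):
        return "R"
    raise ValueError("son's haplotype not explained by the mother")


def recombination_fraction(meioses):
    """Return (recombinants / informative meioses, informative meioses)."""
    calls = [classify_meiosis(m, s) for m, s in meioses]
    informative = [c for c in calls if c is not None]
    if not informative:
        raise ValueError("no informative meioses")
    return informative.count("R") / len(informative), len(informative)
```

With around 100 informative meioses, a single observed recombinant already corresponds to an estimate of about 1%, so small recombination rates within a linkage group are estimated with considerable uncertainty.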


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when calculating the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LRs involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works perfectly well for thousands of linked markers, but it assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking and not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
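The structure of a Lander-Green-style calculation can be illustrated with a hidden Markov chain over inheritance states. This sketch covers one of the tested questions, maternity for two brothers whose mother is untyped. It is a simplified sketch and not the model of Paper IV. Each linkage group is treated as a single multi-allelic haplotype marker, so LD within a group is carried by the haplotype frequencies. Recombination within groups and mutation are ignored, and linkage equilibrium between groups is assumed. The hidden state is whether the brothers carry the same maternal segment (IBD) at a block. It starts at 1/2 and flips when exactly one of the two maternal meioses recombines between blocks.

```python
def brothers_same_mother_lr(blocks, recomb):
    """LR for 'same mother' vs 'unrelated mothers' for two brothers
    (mothers untyped), via a forward pass over a two-state IBD chain.

    blocks: per linkage block, (hap1, hap2, freq) with the brothers'
        haplotypes and a dict of population haplotype frequencies, so LD
        within a block is carried by the haplotype frequencies.
    recomb: recombination fraction between consecutive blocks.
    Assumes no recombination within blocks, no mutation, and linkage
    equilibrium between blocks.
    """
    def emission(block, ibd):
        h1, h2, freq = block
        if ibd:  # both brothers carry the same maternal haplotype copy
            return freq[h1] if h1 == h2 else 0.0
        return freq[h1] * freq[h2]

    # alpha[s]: joint probability of the blocks so far and IBD state s.
    alpha = [0.5 * emission(blocks[0], False), 0.5 * emission(blocks[0], True)]
    for block, r in zip(blocks[1:], recomb):
        # The IBD state flips when exactly one of the two meioses recombines.
        flip = 2 * r * (1 - r)
        alpha = [
            (alpha[0] * (1 - flip) + alpha[1] * flip) * emission(block, False),
            (alpha[0] * flip + alpha[1] * (1 - flip)) * emission(block, True),
        ]
    unrelated = 1.0
    for block in blocks:
        unrelated *= emission(block, False)
    return sum(alpha) / unrelated
```

Consider two blocks in which both brothers carry the same haplotype of frequency 0.1. Treating the blocks as unlinked (r = 0.5) gives LR 30.25, whereas complete linkage (r = 0) gives LR 50.5. This shows in miniature why disregarding linkage can change the computed LR.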

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases where DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and LRs calculated using our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs favouring the questioned relationship (LR > 1) were obtained to varying degrees when that relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Columns, each given as median [95% credible interval] (min/max): log10 LR; difference LR(m_2)/LR(m_1); difference LR(m_3)/LR(m_1)

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. For each situation, 10,000 simulations were performed. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, use of relevant population allele frequencies, use of a probabilistic model for the treatment of single genetic inconsistencies and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimate of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when calculating the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule for combining the eight X-STRs is not valid.


• In Paper IV we presented a model for calculating likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. If we look back at developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major factor behind the rapid development. Once the discoveries were made, vast amounts of money and research time were invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can today be solved with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA-profile databases, which consist of a given number of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases such as CODIS, which consists of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with a sufficiently high probability and certainty, and the community will therefore not continue to invest the same level of time and money in research. However, there remain relationship questions that cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.
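The difficulty of separating full siblings from half-siblings can be seen in the standard single-locus pairwise likelihood based on IBD probabilities (k0, k1, k2). This is the probability that the pair shares 0, 1 or 2 alleles identical by descent: (1/4, 1/2, 1/4) for full siblings, (1/2, 1/2, 0) for half-siblings and (1, 0, 0) for unrelated individuals. The sketch assumes Hardy-Weinberg equilibrium, no mutation and unlinked loci. The allele frequencies are hypothetical.

```python
def genotype_prob(g, p):
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def pair_likelihood(g1, g2, p, kappa):
    """P(g1, g2) for a pair with IBD probabilities kappa = (k0, k1, k2)
    at one autosomal locus (HWE, no mutation, no linkage)."""
    k0, k1, k2 = kappa
    c, d = g2

    def other_allele_prob(x):
        # P(g2 | one of its alleles is an IBD copy of x, the other random)
        if c == d:
            return p[c] if x == c else 0.0
        return (p[d] if x == c else 0.0) + (p[c] if x == d else 0.0)

    pg1 = genotype_prob(g1, p)
    p0 = pg1 * genotype_prob(g2, p)
    p1 = pg1 * 0.5 * (other_allele_prob(g1[0]) + other_allele_prob(g1[1]))
    p2 = pg1 if sorted(g1) == sorted(g2) else 0.0
    return k0 * p0 + k1 * p1 + k2 * p2


FULL_SIBS = (0.25, 0.5, 0.25)
HALF_SIBS = (0.5, 0.5, 0.0)
UNRELATED = (1.0, 0.0, 0.0)

p = {"a": 0.1, "b": 0.2, "c": 0.7}  # hypothetical allele frequencies
g = ("a", "a")
print(pair_likelihood(g, g, p, FULL_SIBS) / pair_likelihood(g, g, p, HALF_SIBS))
```

At a locus where the pair shares no alleles, the full-sibling versus half-sibling ratio is only k0(FS)/k0(HS) = 0.5, and shared common alleles give ratios close to 1. Many markers are therefore needed before the evidence becomes decisive.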

As for continued work on the linked X-STRs studied in this thesis, mainly two things remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to obtain an estimate of its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that there are regions that were rather isolated until the early 1900s, and we are interested to see whether this is traceable genetically and could thus have an impact when calculating the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with the so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to...

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Also Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law, Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for The International Symposium on Human Identification 1989, Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis: Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M and Tyler-Smith C (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Introduction

the lack of logical rigour, which makes the setting up of the hypothesis and its interpretation extremely important.

The main characteristic of a Bayesian, or logical, approach is the use of a likelihood ratio (LR) that connects the prior odds to the resulting posterior odds, i.e. Bayes's theorem (see formula below). The advantage of this approach is that the LR can be combined with any other evidence, such as fingerprints, information from eyewitnesses, etc.

$$\frac{\Pr(H_1 \mid E, I)}{\Pr(H_0 \mid E, I)} = \frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_0, I)} \cdot \frac{\Pr(H_1 \mid I)}{\Pr(H_0 \mid I)}$$

posterior odds = likelihood ratio · prior odds

$H_1$ (or $H_p$) is commonly known as the prosecutor's hypothesis and $H_0$ (or $H_d$) as the hypothesis of the defence. E represents the DNA profiles and I is other relevant background evidence. The quotient $\Pr(E \mid H_1, I) / \Pr(E \mid H_0, I)$ is the LR, and it is in this term that the strength of the DNA evidence is quantified. The calculation of the LR for paternity cases (i.e. the Paternity Index, PI) is discussed in the following section.
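The odds form of Bayes's theorem above can be sketched in a few lines of Python. This is an illustrative sketch, not software from the thesis. The function names are hypothetical. Multiplying several LRs together assumes that the pieces of evidence (for example DNA and eyewitness information) are conditionally independent given each hypothesis.

```python
from math import prod


def posterior_odds(prior_odds, *likelihood_ratios):
    """Odds form of Bayes's theorem: posterior odds = LR x prior odds.

    Several LRs are multiplied, which assumes the pieces of evidence are
    conditionally independent given each hypothesis.
    """
    return prod(likelihood_ratios) * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of H1 into Pr(H1 | E, I) = odds / (1 + odds)."""
    return odds / (1 + odds)


# Hypothetical numbers: prior odds 1:1000, DNA LR 1e6, other evidence LR 2.
odds = posterior_odds(1 / 1000, 1e6, 2)
print(odds, odds_to_probability(odds))
```

With these hypothetical numbers the posterior odds are 2000, which corresponds to a posterior probability of about 0.9995.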

Regarding the choice of framework for relationship testing, the Paternity Testing Commission (PTC) of the International Society for Forensic Genetics (ISFG) recently published biostatistical recommendations for probability calculations in genetic investigations of paternity cases (Gjertson et al. 2007). They recommend using the LR (i.e. PI) principle to calculate the weight of evidence. These recommendations cover the most basic issues but give no guidance on how to deal with, for example, linked genetic markers.

Paternity index calculation

As an example, let $H_p$ and $H_d$ represent two mutually exclusive hypotheses for and against paternity:
$H_p$: The alleged father is the father of the child.
$H_d$: A random man, not related to the alleged father, is the father of the child.
The paternity index (PI) is typically defined as

$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)}$$

This is the probability of observing the child's ($G_C$), the mother's ($G_M$) and the alleged father's ($G_{AF}$) DNA profiles when the AF is the father, relative to the probability of observing the same DNA profiles when the AF is not the father.

Using the third law of probability (the multiplication law), the PI can be rewritten as

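$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_p)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)\,\Pr(G_M, G_{AF} \mid H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$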

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can simplify further:

$$\Pr(G_M, G_{AF} \mid H_p) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the child's maternal ($A_M$) and paternal ($A_P$) alleles (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on whether the mother and the AF are homozygous or heterozygous. If both the mother and the AF are homozygous, the numerator is 1, since neither of them can transmit any other allele. If either the AF or the mother is heterozygous, the probability is 0.5, since there is a 50:50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator $\Pr(G_C \mid G_M, G_{AF}, H_d)$ is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on whether the mother is homozygous or heterozygous. Here $p_{A_P}$ is the population frequency of allele $A_P$. It represents the probability that the child receives this allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each calculated as the sum over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d]. Then

$$\Pr(G_C \mid G_M, G_{AF}, H_p) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\tfrac{1}{2}\, p_c} = \frac{1}{2 p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
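The single-locus rules above can be written as a short Python sketch. It does not branch on homozygous and heterozygous cases. Instead, it sums over both possible assignments of the child's alleles to $A_M$ and $A_P$, so the same code handles the ambiguous case. The sketch makes several assumptions:

- genotypes are allele pairs;
- `freqs` holds population allele frequencies;
- mutations and silent alleles are ignored.

All names are hypothetical.

```python
def transmission(genotype):
    """Mendelian probability that a parent passes on each of its alleles."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_prob(child, maternal, paternal):
    """Pr(G_C) given allele-transmission probabilities for each parent.

    Both assignments of the child's alleles to A_M and A_P are summed, which
    covers the ambiguous case; a homozygous child is counted once.
    """
    a, b = child
    p = maternal.get(a, 0.0) * paternal.get(b, 0.0)
    if a != b:
        p += maternal.get(b, 0.0) * paternal.get(a, 0.0)
    return p


def paternity_index(mother, child, alleged_father, freqs):
    """Single-locus PI = Pr(G_C | G_M, G_AF, H_p) / Pr(G_C | G_M, G_AF, H_d)."""
    maternal = transmission(mother)
    numerator = child_prob(child, maternal, transmission(alleged_father))
    # Under H_d the paternal allele comes from a random unrelated man, so it is
    # "transmitted" with its population frequency p_{A_P}.
    denominator = child_prob(child, maternal, freqs)
    return numerator / denominator


freqs = {"a": 0.1, "b": 0.2, "c": 0.05, "d": 0.65}  # hypothetical frequencies
pi = paternity_index(("a", "b"), ("b", "c"), ("c", "d"), freqs)
print(round(pi, 6))  # 10.0, i.e. 1 / (2 * p_c)
```

The same function gives PI = 0 when the AF cannot have transmitted $A_P$. With [a,b] for all three persons (the ambiguous case), it gives $1/(p_a + p_b)$.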

Decision

How should the PI value be interpreted? Bayes's theorem is needed to obtain posterior odds, from which a posterior probability can be computed. For paternity issues the prior odds have traditionally been set to 1. This leads to the following expression for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel proposed verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off increases the risk of falsely including a non-father as the true father. Too high a cut-off increases the risk of falsely excluding the true father.
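The following Python sketch combines loci and applies a cut-off. It assumes that the loci are unlinked and in linkage equilibrium, so the total PI is the product of the per-locus PIs (the product rule). The default `prior=0.5` reproduces the traditional prior odds of 1. The 0.9999 cut-off is only a hypothetical laboratory choice, and the function names are illustrative.

```python
from math import prod


def probability_of_paternity(locus_pis, prior=0.5):
    """Posterior probability of paternity from per-locus paternity indices.

    The combined PI is the product over loci, which is only valid for
    unlinked markers in linkage equilibrium. prior must lie in (0, 1);
    prior=0.5 gives prior odds 1 and hence W = PI / (PI + 1).
    """
    total_pi = prod(locus_pis)
    post_odds = total_pi * prior / (1 - prior)
    return post_odds / (1 + post_odds)


def is_included(w, cutoff=0.9999):
    """Hypothetical laboratory decision rule: include the AF if W >= cutoff."""
    return w >= cutoff


w = probability_of_paternity([10, 10, 10])  # hypothetical per-locus PIs
print(w, is_included(w))
```

Three loci with PI = 10 each give W = 1000/1001 ≈ 0.999, which falls below the hypothetical 99.99% cut-off. This illustrates how the choice of cut-off interacts with the number of markers typed.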

Mathematical model for automatic likelihood computation for relationship testing

The calculation of the PI for trios and single markers is fairly simple. It rapidly becomes more complicated, however, once mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) are taken into account, and when deficiency cases are treated (Brenner 2006). In such situations a model for automatic likelihood computation is helpful. In 1971 Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

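$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \mathrm{Pen}(X_i \mid G_i) \prod_{\mathrm{founders}} \Pr(G_{\mathrm{founder}}) \prod_{\{o, f, m\}} \Pr(G_o \mid G_f, G_m)$$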

Here $G_i$ is the genotype and $X_i$ the observed phenotype of individual i. The last product runs over all offspring (o) with their father (f) and mother (m) in the pedigree.

The Elston-Stewart algorithm uses a recursive approach. It starts at the bottom of a pedigree and computes the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that once the summation for the individual at the bottom has been computed, it can be attached as a factor in the summation for that individual's parents. The individual then needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.
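The peeling idea can be illustrated with a minimal sketch for a single marker and a single nuclear family (two founders and their children). Each child's genotype is summed out first, and the result is attached as a factor to the parents' genotype pair. A full Elston-Stewart implementation handles arbitrary pedigrees; this sketch does not.

The sketch rests on the following assumptions:

- founder genotype probabilities follow Hardy-Weinberg equilibrium;
- the marker "penetrance" is 1 if a genotype matches the typed genotype (or the person is untyped) and 0 otherwise;
- all names are hypothetical.

As a check, the ratio of two such likelihoods reproduces the trio PI of the worked example.

```python
from itertools import combinations_with_replacement


def all_genotypes(freqs):
    """All unordered genotypes (sorted allele pairs) at the marker."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prob(g, freqs):
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(g):
    """Probability that a parent with genotype g passes on each allele."""
    probs = {}
    for allele in g:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def offspring_prob(child, father, mother):
    """Mendelian Pr(G_o | G_f, G_m) for an unordered child genotype."""
    ft, mt = transmission(father), transmission(mother)
    a, b = child
    p = ft.get(a, 0.0) * mt.get(b, 0.0)
    if a != b:
        p += ft.get(b, 0.0) * mt.get(a, 0.0)
    return p


def pen(observed, g):
    """Marker 'penetrance': 1 if g matches the typed genotype or the person is untyped."""
    return 1.0 if observed is None or tuple(sorted(observed)) == g else 0.0


def nuclear_family_likelihood(father, mother, children, freqs):
    """Single-marker likelihood of one nuclear family, peeling the children first."""
    gs = all_genotypes(freqs)
    total = 0.0
    for gf in gs:
        wf = founder_prob(gf, freqs) * pen(father, gf)
        if wf == 0.0:
            continue
        for gm in gs:
            wm = founder_prob(gm, freqs) * pen(mother, gm)
            if wm == 0.0:
                continue
            # Peel: sum out each child's genotype and attach the result as a
            # factor to this (father, mother) genotype pair.
            factor = 1.0
            for obs in children:
                factor *= sum(offspring_prob(gc, gf, gm) * pen(obs, gc) for gc in gs)
            total += wf * wm * factor
    return total


# Trio PI via two pedigree likelihoods (hypothetical allele frequencies).
freqs = {"a": 0.1, "b": 0.2, "c": 0.05, "d": 0.65}
mother, child, af = ("a", "b"), ("b", "c"), ("c", "d")
l_hp = nuclear_family_likelihood(af, mother, [child], freqs)
# Under H_d the true father is untyped; the AF is an unrelated typed founder.
l_hd = nuclear_family_likelihood(None, mother, [child], freqs) * founder_prob(af, freqs)
print(round(l_hp / l_hd, 6))  # 10.0, i.e. 1 / (2 * p_c)
```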

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases rapidly (exponentially) with the number of markers included. Access to large datasets has grown, and with it the need for a fast computational model that can handle thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987). It permits simultaneous consideration of thousands of loci, and its computational effort increases only linearly with the number of markers. The Lander-Green algorithm has three main steps:

1) collecting all possible inheritance vectors in a pedigree for alleles transmitted from founders to offspring;
2) iterating over all inheritance vectors and calculating the probability of the marker-specific observed genotypes conditional on each inheritance vector;
3) calculating the joint probability of all marker inheritance vectors along the same chromosome (e.g. transmission probabilities).

Using a hidden Markov model (HMM) for the final step yields an efficient computational model (see Kruglyak et al. 1996 for a more detailed description).
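The HMM step can be sketched as a forward algorithm over inheritance vectors. This simplified sketch makes the following assumptions:

- the per-marker emission probabilities, Pr(observed genotypes at marker k | inheritance vector), are precomputed and passed in;
- the prior over inheritance vectors is uniform;
- each of the n meioses recombines independently between adjacent markers with recombination fraction θ.

The names are hypothetical. The work per marker grows as 4^n in the number of meioses but only linearly in the number of markers, which is the trade-off described above.

```python
from itertools import product


def inheritance_vectors(n_meioses):
    """All 2**n vectors; bit k says which of the transmitting parent's two
    alleles (grandpaternal 0 or grandmaternal 1) meiosis k passed on."""
    return list(product((0, 1), repeat=n_meioses))


def transition(v, w, theta):
    """Pr(vector w at the next marker | vector v at this marker)."""
    d = sum(x != y for x, y in zip(v, w))  # number of meioses that recombine
    return theta ** d * (1 - theta) ** (len(v) - d)


def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm of the hidden Markov model over inheritance vectors.

    emissions[k][j]: Pr(genotype data at marker k | inheritance vector j),
    assumed to be computed elsewhere (founder allele frequencies, LE).
    thetas[k]: recombination fraction between marker k and marker k + 1.
    """
    vecs = inheritance_vectors(n_meioses)
    n = len(vecs)
    alpha = [e / n for e in emissions[0]]  # uniform prior over vectors
    for k in range(1, len(emissions)):
        alpha = [
            emissions[k][j]
            * sum(alpha[i] * transition(vecs[i], vecs[j], thetas[k - 1]) for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


# Toy data: one meiosis, two markers whose data demand different vectors.
emissions = [[1.0, 0.0], [0.0, 1.0]]
for theta in (0.0, 0.1, 0.5):
    print(theta, lander_green_likelihood(emissions, [theta], n_meioses=1))  # 0.0, 0.05, 0.25
```

With complete linkage (θ = 0) the toy data are impossible. With θ = 0.5 the markers behave as unlinked, and the likelihood is the product of the per-marker averages.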

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers. They assume, however, linkage equilibrium for the population frequency estimation (Abecasis et al. 2002; Skare et al. 2009).

Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis. It also studied models for properly considering such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing with markers on the X chromosome that are both linked and in linkage disequilibrium.

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect in two ways. An exclusion error falsely excludes the AF as the TF, and an inclusion error falsely includes the AF as the TF. In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. Such cases can involve uncertainty about the appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on the error rates. Paternity testing normally uses two mutually exclusive hypotheses. We introduced a five-hypothesis model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated. In the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

To calculate the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypothesis model and, for comparison, the five-hypothesis model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
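A much simplified version of such a simulation can be sketched in Python. In this sketch:

- trios are generated under Hardy-Weinberg equilibrium;
- the AF is either the true father, a full brother of the true father, or an unrelated man;
- the combined PI over unlinked loci is converted to a posterior probability with prior odds 1;
- every case below the cut-off is counted as an exclusion.

This is not the Paper I setup. It uses only the two-hypothesis model, no mutation model, hypothetical allele frequencies (15 loci with 8 equally frequent alleles) and illustrative names.

```python
import random


def transmission(g):
    """Mendelian allele-transmission probabilities for a parent with genotype g."""
    probs = {}
    for allele in g:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def child_prob(child, maternal, paternal):
    a, b = child
    p = maternal.get(a, 0.0) * paternal.get(b, 0.0)
    if a != b:
        p += maternal.get(b, 0.0) * paternal.get(a, 0.0)
    return p


def locus_pi(mother, child, af, freqs):
    """Single-locus trio PI without a mutation model (0 if the AF is excluded)."""
    maternal = transmission(mother)
    return child_prob(child, maternal, transmission(af)) / child_prob(child, maternal, freqs)


def draw(freqs, rng):
    """Random genotype under Hardy-Weinberg equilibrium."""
    return tuple(rng.choices(list(freqs), weights=list(freqs.values()), k=2))


def simulate_pi(loci, relation, rng):
    """Combined PI for one simulated trio; relation is the AF's true tie to the child."""
    total = 1.0
    for freqs in loci:
        gp1, gp2 = draw(freqs, rng), draw(freqs, rng)  # parents of the true father
        tf = (rng.choice(gp1), rng.choice(gp2))
        mother = draw(freqs, rng)
        child = (rng.choice(mother), rng.choice(tf))
        if relation == "father":
            af = tf
        elif relation == "brother":  # full brother of the true father
            af = (rng.choice(gp1), rng.choice(gp2))
        else:  # unrelated man
            af = draw(freqs, rng)
        total *= locus_pi(mother, child, af, freqs)
    return total


def error_rate(loci, relation, n_sim=1000, cutoff=0.9999, seed=1):
    """Exclusion-error rate for true fathers, inclusion-error rate otherwise."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_sim):
        pi = simulate_pi(loci, relation, rng)
        included = pi / (1 + pi) >= cutoff  # posterior probability, prior odds 1
        errors += (not included) if relation == "father" else included
    return errors / n_sim


loci = [{f"A{i}": 1 / 8 for i in range(8)}] * 15  # hypothetical loci
for relation in ("father", "brother", "unrelated"):
    print(relation, error_rate(loci, relation, n_sim=500))
```

Changing the number of loci, the allele frequencies used for simulation versus calculation, or the relation of the AF mimics, in a crude way, the kinds of comparisons summarised in Table 1.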

Figure 3. The alternative hypotheses used for simulating and calculating the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high. It arises because we assigned equal prior probabilities to the alternative hypotheses, i.e. the same number of cases was simulated for each of H1a, H1b, H1c and H1d. We demonstrated that adding more information to the case decreased the error rate, especially the exclusion error (Table 1).

Using an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

In cases where the AF may be a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a few inconsistencies between the AF and the child, the question arises whether they are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007). Some laboratories, however, still use a maximum number of inconsistencies as a limit for inclusion/exclusion (Hallenberg & Morling 2009). We demonstrated that it is better to use a probabilistic model, even one whose interpretation is not entirely correct, than not to use one at all (Table 1).

Furthermore, we proposed and tested a five-hypothesis model to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that this model significantly decreased the error rates, although the decrease was small in magnitude.

DNA analysis is increasingly used to clarify relationships for the purpose of family reunification, so evaluating the statistical methods used is important. In this paper we demonstrated that improvements are still needed to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate (%) compared with the standard case: total error (inclusion error, exclusion error)

Consanguinity
Mother and father simulated as first cousins: 3 (10, -1)

Additional information
DNA profiles with 20 markers: -68 (-29, -89)
DNA profiles with 25 markers: -83 (-56, -98)
2 children: -88 (-73, -96)

Mutation model
Limit of 1 inconsistency instead of a mutation model for the LR calculation: 16 (217, -95)
Limit of 2 inconsistencies instead of a mutation model for the LR calculation: 320 (1079, -100)

Inappropriate allele frequencies
Rwandan allele frequencies for data generation, Swedish allele frequencies for the LR calculation: 19 (190, -76)
Somali allele frequencies for data generation, Swedish allele frequencies for the LR calculation: 2 (106, -55)
Iranian allele frequencies for data generation, Swedish allele frequencies for the LR calculation: -13 (25, -34)

Prior information
Five-hypothesis model for the LR calculation: -24 (-8, -31)

The standard case used DNA profiles with 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated with the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used when only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers, the risk of local frequency variation is increased, so it is particularly important to set up and study regional frequency databases (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically distinct regions were typed for the complete mtDNA control region, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami) (Figure 4). This hypervariable segment, which includes HVS-I, HVS-II and HVS-III, spans more than 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation included the calculation of forensic efficiency parameters. It also compared the genetic variation among the Swedish regions, and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This corresponds to a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. Pairwise ΦST values calculated for a limited number of geographically close populations confirmed this (Figure 4).
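The two summary statistics above can be computed from a list of haplotypes as sketched below. The sketch uses one common estimator for each:

- the random match probability as the sum of squared sample haplotype frequencies;
- the haplotype (gene) diversity as Nei's unbiased form, n/(n - 1) · (1 - Σ p_i²) (Nei 1987).

Whether Paper II used exactly these estimators is an assumption. The haplotype labels and function names are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Sample frequency of each distinct haplotype."""
    n = len(haplotypes)
    return [count / n for count in Counter(haplotypes).values()]


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies: chance two random haplotypes match."""
    return sum(p * p for p in haplotype_frequencies(haplotypes))


def haplotype_diversity(haplotypes):
    """Nei's unbiased gene (haplotype) diversity: n / (n - 1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    return n / (n - 1) * (1 - random_match_probability(haplotypes))


# Toy sample: each letter stands for one distinct control-region sequence.
sample = ["A", "A", "A", "B", "C", "D", "E", "F"]
print(haplotype_diversity(sample), random_match_probability(sample))
```

A sample in which almost every sequence is unique, as in the Swedish data, gives a diversity close to 1 and a small match probability.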

The mtDNA sequences were further analysed for potential substructure within Sweden, which an earlier study of Swedish Y-chromosomal variation had indicated (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight Swedish regions were compared, and only the Saami population differed significantly from the rest. The Y-chromosomal data had shown a difference between the northern region of Västerbotten and the rest of Sweden, but this difference was not observed in the mtDNA data. It can most probably be explained by demographic events, although the impact of the relatively small sample sizes should not be ignored.

Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST is the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population is the genetic distance between the Saami and the seven Swedish regions combined. ΦST is the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

Estimating the population frequency of a given mtDNA haplotype requires a large, representative reference database. Because the Swedish population is highly similar to other western European populations, Swedish mtDNA data can be included in initiatives such as EMPOP, the European DNA Profiling Group (EDNAP) mtDNA population database (Parson & Dür 2007). This provides a more accurate estimate of the rarity of an mtDNA sequence.

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and two copies in females. The combined use of several X-chromosomal DNA markers therefore requires consideration of linkage and linkage disequilibrium. Before X-STRs are applied in casework, it is thus important to study not only population frequencies, substructure and efficiency parameters but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and on their usability in relationship testing.

Materials and methods

A Swedish population sample of 718 males and 106 females was analysed for the eight X-chromosomal STR markers in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, perform the LD test and estimate forensic efficiency parameters. To estimate recombination frequencies, we used family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), which gave 84 to 116 informative meioses. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters showed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although the differences among the linkage groups were minor. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. For the Swedish population, the loci should therefore be treated as haplotypes rather than as single markers. Simulations showed that, on average, disregarding LD instead of taking it into account changed the calculated LR only slightly, although the difference was considerable in some individual cases.
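Because males carry a single X chromosome, their two-locus X-STR haplotypes are observed directly, and allelic association between two loci can be tested on them. The sketch below uses a Pearson chi-square statistic with a permutation p-value. This is one standard way of testing for LD and not necessarily the test used in Paper III. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def association_chi2(haplotypes):
    """Pearson chi-square for association between the alleles of two loci.

    haplotypes: list of (allele at locus 1, allele at locus 2) pairs, e.g.
    phase-known X-STR haplotypes from males.
    """
    n = len(haplotypes)
    observed = Counter(haplotypes)
    counts1 = Counter(h[0] for h in haplotypes)
    counts2 = Counter(h[1] for h in haplotypes)
    chi2 = 0.0
    for a, na in counts1.items():
        for b, nb in counts2.items():
            expected = na * nb / n  # expected count under linkage equilibrium
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2


def permutation_ld_test(haplotypes, n_perm=1000, seed=1):
    """Permutation p-value: shuffling locus-2 alleles destroys any association."""
    rng = random.Random(seed)
    stat = association_chi2(haplotypes)
    first = [h[0] for h in haplotypes]
    second = [h[1] for h in haplotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if association_chi2(list(zip(first, second))) >= stat - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical male haplotypes for two X-STR loci in one linkage group.
toy = [("12", "27")] * 30 + [("13", "28")] * 25 + [("12", "28")] * 5 + [("13", "27")] * 4
print(association_chi2(toy), permutation_ld_test(toy))
```

A permutation test avoids relying on the chi-square approximation, which can be poor for multi-allelic STR loci with many sparse haplotype classes.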

Recombination frequencies for the loci were estimated from the family data (Figure 5). They indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423) also tend to be linked.

Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations in Mb (NCBI 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali dataset and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that these features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.

Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III led us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio for X-chromosomal data in relationship testing. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used to compute the LR for questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), but we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully considers both linkage and linkage disequilibrium. Using simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III, we study the impact of taking or not taking linkage and LD into account.

Materials and methods

We presented a computational model based on the Lander-Green algorithm, extended by expanding the inheritance vectors to cover all the loci of a haplotype.

To test the model, we studied six cases representing pedigrees for which X-chromosomal analysis would be valuable. In three of them, DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters). In the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each case, we ran simulations with the tested relationship either true or not true. From these we studied the LR distributions. We also compared the LR calculated with our proposed model with LRs calculated with simpler models that consider linkage and LD partially or not at all. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
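For the maternity-for-brothers case, the joint treatment of linkage and LD can be illustrated for a single linkage group of two loci. The sketch below is a simplification, not the Paper IV model. Its assumptions are:

- phase-known two-locus haplotype frequencies are used directly, so LD enters through the frequency table;
- the untyped mother passes on a non-recombinant haplotype with probability 1 - θ and a recombinant one with probability θ;
- the mother's two haplotypes are drawn independently;
- mutations are ignored.

Brothers carry a single X chromosome. Under "same mother", each brother's haplotype comes from that mother; under the alternative, each is a random population haplotype. The names and frequencies are hypothetical.

```python
from itertools import product


def transmit_prob(h, m1, m2, theta):
    """Pr(a mother with phased haplotypes m1, m2 passes on haplotype h).

    Two loci in one linkage group: the non-recombinant gametes m1 and m2 each
    have probability (1 - theta) / 2, the two recombinants theta / 2 each.
    """
    gametes = [
        (m1, (1 - theta) / 2),
        (m2, (1 - theta) / 2),
        ((m1[0], m2[1]), theta / 2),
        ((m2[0], m1[1]), theta / 2),
    ]
    return sum(p for g, p in gametes if g == h)


def brothers_maternity_lr(h1, h2, hap_freqs, theta):
    """LR for 'same mother' vs 'different, unrelated mothers' for two brothers.

    hap_freqs maps two-locus haplotypes to population frequencies, so LD is
    modelled directly; the untyped mother's two haplotypes are assumed to be
    drawn independently from the same table.
    """
    same_mother = sum(
        hap_freqs[m1] * hap_freqs[m2]
        * transmit_prob(h1, m1, m2, theta)
        * transmit_prob(h2, m1, m2, theta)
        for m1, m2 in product(hap_freqs, repeat=2)
    )
    return same_mother / (hap_freqs[h1] * hap_freqs[h2])


loc1 = {1: 0.6, 2: 0.4}      # hypothetical allele frequencies, locus 1
loc2 = {"x": 0.3, "y": 0.7}  # hypothetical allele frequencies, locus 2
# Haplotype table without LD (product of allele frequencies).
hap_le = {(a, b): pa * pb for a, pa in loc1.items() for b, pb in loc2.items()}
for theta in (0.5, 0.01):
    print(theta, brothers_maternity_lr((1, "x"), (1, "y"), hap_le, theta))
```

With θ = 0.5 and no LD, the two-locus LR equals the product of the single-locus LRs, which is what the product rule assumes. With tight linkage, or with a haplotype table that departs from the product of allele frequencies, the LR changes. This is the kind of difference compared in Table 2.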

Results and discussion

The model for the likelihood calculation was adapted to the six cases. In the simulations, the median LR was high ($\sim 10^6$) for the three cases where DNA profiles were available for the founders. It was considerably lower ($\sim 10^2$-$10^3$) for the cases where founder genotype data were not available. Furthermore, when the questioned relationship was simulated not to be true, LRs greater than 1 of varying magnitude were obtained, but only in the cases where founder profiles were not available.

We then compared the LRs from our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model that took neither linkage nor LD into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.

In summary, we demonstrated that, to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence. We also proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Each entry is given as: median [95% cred.] (min/max)

Rel: log10 LR 20 [-1 -62] (-1100); difference LR(m_2)/LR(m_1) 12 [04-73] (0001418); difference LR(m_3)/LR(m_1) 04 [0036-93] (00001438)
NoRel: log10 LR -1 [-1-05] (-140); difference LR(m_2)/LR(m_1) 10 [09-13] (000263); difference LR(m_3)/LR(m_1) 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers in this thesis. The thesis aimed to study relevant population genetic properties, and models for taking them into account, when calculating likelihoods in relationship testing.

bull In Paper I we showed how the risk of erroneous decisions in rela-

tionship testing in immigration casework was affected by parameters such as the number of markers tested utilisation of relevant popula-tion allele frequencies use of a probabilistic model for the treatment of single genetic inconsistencies and consideration of alternative close relationships between the alleged father and the true father In addition we proposed methods for reducing the risk of erroneous decisions

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when computing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV, we presented a model for the calculation of likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
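Why linkage disequilibrium invalidates the product rule can be illustrated with a small Python sketch. It is not the model of Paper IV: it only compares, for two hypothetical linked loci, the product of the single-locus allele frequencies with the directly counted haplotype frequency, and reports the disequilibrium coefficient D = p_12 - p_1 p_2. The toy haplotype sample, allele labels and function name are invented for illustration.

```python
from collections import Counter

def ld_summary(haplotypes, allele_1, allele_2):
    """Compare product-rule and haplotype-count estimates for two linked loci.

    haplotypes: list of (allele at locus 1, allele at locus 2) tuples, for
    example male X-chromosome haplotypes (hypothetical input format).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Marginal allele frequencies at each locus
    p1 = sum(1 for h in haplotypes if h[0] == allele_1) / n
    p2 = sum(1 for h in haplotypes if h[1] == allele_2) / n
    # Directly counted haplotype frequency
    p12 = counts[(allele_1, allele_2)] / n
    return {
        "product_rule": p1 * p2,  # correct only under linkage equilibrium
        "haplotype": p12,
        "D": p12 - p1 * p2,       # disequilibrium coefficient
    }

# Toy sample in which allele 13 at locus 1 always occurs with allele 20 at locus 2
sample = [(13, 20)] * 4 + [(14, 21)] * 4
print(ld_summary(sample, 13, 20))  # -> {'product_rule': 0.25, 'haplotype': 0.5, 'D': 0.25}
```

When D is zero the two estimates coincide; here the product rule halves the frequency of a haplotype that is in fact twice as common, which would inflate the resulting likelihood ratio.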


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by ever faster computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major factor behind the rapid development. Once the discoveries had been made, an enormous amount of money and research time was invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can today be solved with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which consist of a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases which, like CODIS, consist of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods for handling data from an increasing number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Nevertheless, relationship questions remain that cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the statistical methods required for accurate estimation of the evidential value of the DNA profiles become important, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
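The dependence on database size can be illustrated with a standard binomial argument; this is a sketch, not the estimator used in the thesis. If a haplotype is absent from a database of N haplotypes, the largest frequency still compatible with that observation at confidence level 1 - α solves (1 - p)^N = α, which is roughly 3/N for α = 0.05. The function names are hypothetical.

```python
def haplotype_frequency(count, n):
    """Simple counting estimate x/N of a haplotype frequency."""
    return count / n

def upper_bound_unseen(n, alpha=0.05):
    """One-sided (1 - alpha) upper bound for a haplotype seen 0 times in n.

    Solves (1 - p)**n = alpha: the largest frequency that still leaves at
    least an alpha chance of observing zero copies in a database of size n.
    """
    return 1 - alpha ** (1 / n)

# The bound shrinks only in proportion to 1/n as the database grows
for n in (100, 1_000, 10_000):
    print(n, round(upper_bound_unseen(n), 5))
```

Because the bound falls only as 1/N, adding markers to a haplotype block, which tends to increase the proportion of haplotypes never seen before, calls for proportionally larger reference databases.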

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetic data and could thus have an impact when computing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with the so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and your wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”: Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen”, for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (“bollplank”) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and caring, and thank you for your patience when I worked outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis. Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Introduction

We can use the third law of probability, $\Pr(A, B \mid H) = \Pr(A \mid B, H)\,\Pr(B \mid H)$, and simplify:

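$$PI = \frac{\Pr(G_C, G_M, G_{AF} \mid H_P)}{\Pr(G_C, G_M, G_{AF} \mid H_d)} = \frac{\Pr(G_C \mid G_M, G_{AF}, H_P)\,\Pr(G_M, G_{AF} \mid H_P)}{\Pr(G_C \mid G_M, G_{AF}, H_d)\,\Pr(G_M, G_{AF} \mid H_d)}$$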

The probability of observing the DNA profiles of the mother and the AF is the same irrespective of the hypothesis. Thus, we can make a further simplification:

$$\Pr(G_M, G_{AF} \mid H_P) = \Pr(G_M, G_{AF} \mid H_d) = \Pr(G_M, G_{AF})$$

resulting in

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_P)}{\Pr(G_C \mid G_M, G_{AF}, H_d)}$$

We now need to calculate two probabilities: 1) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is the father (numerator); and 2) the probability of the child's genotype given the genotypes of the mother and the AF, and given that the AF is not the father but someone else is (denominator).

We start with the calculation of 1) and assume that we have data from a single locus. This probability is based on Mendelian inheritance. If it is possible to determine the maternal ($A_M$) and paternal ($A_P$) alleles of the child (assuming that the mother is the true mother), the numerator is 1, 0.5 or 0.25, depending on the homozygous/heterozygous status of the mother and the AF. If both the mother and the AF are homozygous, the numerator is 1 (neither of them has any other allele to transmit). If exactly one of the AF and the mother is heterozygous, the probability is 0.5, since there is a 50/50 chance that the child will inherit a given allele from a heterozygous parent. Consequently, if both the mother and the AF are heterozygous, the probability is 0.25 (0.5 × 0.5).

If $A_M$ and $A_P$ are unambiguous, the denominator, $\Pr(G_C \mid G_M, G_{AF}, H_d)$, is either $p_{A_P}$ or $0.5\,p_{A_P}$, depending on the homozygous/heterozygous status of the mother. Here $p_{A_P}$ is the population frequency of allele $A_P$ and represents the probability of the child receiving that allele from a random man in the population. If $A_M$ and $A_P$ are ambiguous, the numerator and the denominator are each obtained by summing over all possible assignments of $A_M$ and $A_P$.

As a simple example, let $G_M$ have the genotype [a,b], $G_C$ have [b,c] and $G_{AF}$ have [c,d].

Then

$$\Pr(G_C \mid G_M, G_{AF}, H_P) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$


and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_P)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2}\, p_c} = \frac{1}{2\, p_c}$$

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
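The single-locus calculation above can be written as a short Python sketch that enumerates Mendelian transmissions instead of reasoning case by case, so that ambiguous maternal/paternal allele assignments are summed automatically. It assumes no mutations or silent alleles and that the mother is the true mother; the function name is illustrative. Exact fractions are used so that the worked example reproduces PI = 1/(2p_c) without rounding.

```python
from fractions import Fraction
from itertools import product

def trio_paternity_index(mother, child, af, freqs):
    """Single-locus PI = Pr(G_C | G_M, G_AF, H_P) / Pr(G_C | G_M, G_AF, H_d).

    Genotypes are 2-tuples of allele labels; freqs maps allele -> population
    frequency. Assumes no mutations or silent alleles and that the mother is
    the true mother. Ambiguous A_M / A_P assignments are summed over.
    """
    half = Fraction(1, 2)
    target = sorted(child)
    # Numerator: mother and AF each transmit one of their two alleles w.p. 1/2
    numerator = sum(
        (half * half for m, f in product(mother, af) if sorted((m, f)) == target),
        Fraction(0),
    )
    # Denominator: mother transmits m w.p. 1/2; a random man supplies the other
    # child allele with probability equal to its population frequency
    denominator = Fraction(0)
    for m in mother:
        remaining = list(child)
        if m in remaining:
            remaining.remove(m)
            denominator += half * freqs[remaining[0]]
    if denominator == 0:
        raise ValueError("the child shares no allele with the mother")
    return numerator / denominator

# Worked example from the text: G_M = [a,b], G_C = [b,c], G_AF = [c,d], p_c = 0.1
print(trio_paternity_index(("a", "b"), ("b", "c"), ("c", "d"), {"c": Fraction(1, 10)}))  # -> 5
```

A homozygous mother [a,a], child [a,b] and homozygous AF [b,b] give PI = 1/p_b, matching the case analysis in the text.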

Decision

How does one interpret the PI value? Bayes's theorem combines the PI with the prior odds to give posterior odds, from which a posterior probability can be computed. For paternity issues, the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_P \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_P \mid E)}{1 - \Pr(H_P \mid E)} = PI$$

resulting in

$$\Pr(H_P \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, and too high a cut-off will increase the risk of failing to include the true father.
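The steps from single-locus PIs to a decision can be sketched as follows. The sketch assumes independent (unlinked, linkage-equilibrium) loci so that the product rule applies, and uses a 99.99% posterior cut-off purely as an example; the function names are hypothetical, and the cut-off is a laboratory choice, as noted above.

```python
import math

def combined_pi(locus_pis):
    """Product rule: multiply single-locus PIs.

    Only valid if the loci are unlinked and in linkage equilibrium.
    """
    return math.prod(locus_pis)

def posterior_paternity(pi, prior_odds=1.0):
    """Pr(H_P | E) = posterior odds / (1 + posterior odds).

    With the traditional prior odds of 1 this reduces to PI / (PI + 1).
    """
    posterior_odds = pi * prior_odds
    return posterior_odds / (posterior_odds + 1)

def include(pi, cutoff=0.9999, prior_odds=1.0):
    """Hypothetical decision rule: include the AF if Pr(H_P | E) >= cutoff."""
    return posterior_paternity(pi, prior_odds) >= cutoff

pi_total = combined_pi([5.0, 4.0, 2.5, 100.0])  # illustrative single-locus PIs
print(pi_total, posterior_paternity(pi_total), include(pi_total))
```

With a combined PI of 5,000 and prior odds of 1, the posterior probability is 5000/5001 (about 0.9998), just short of a 99.99% cut-off; a combined PI of 20,000 would pass it.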

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated with the introduction of the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations, a model for automatic likelihood computation is helpful. In 1971, Elston and Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

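$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i=1}^{n} \mathrm{Pen}(X_i \mid G_i) \prod_{\mathrm{founder}} \Pr(G_{\mathrm{founder}}) \prod_{\{o,f,m\}} \Pr(G_o \mid G_f, G_m)$$

where the sums run over the genotypes $G_1, \dots, G_n$ of the $n$ individuals in the pedigree, $X_i$ is the observed phenotype of individual $i$, and the last product runs over all offspring $o$ with father $f$ and mother $m$.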

The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that, once the summation for an individual at the bottom has been computed, it can be attached as a factor in the summation for his or her parents, so that this individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci, where it reduces to an indicator of whether a genotype is compatible with the typed marker data.
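For the smallest non-trivial case, a nuclear family at a single marker locus, the formula and the peeling idea can be sketched in Python. The assumptions are not from the source: founders in Hardy-Weinberg proportions, Mendelian transmission without mutation, and a penetrance that is simply an indicator of agreement with the typed genotype (untyped individuals contribute 1). Each child is summed out once per parental genotype pair and enters the parents' sums as a single factor; real implementations handle arbitrary pedigrees and choose a peeling order. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def genotypes(freqs):
    """All unordered genotypes (sorted 2-tuples) over the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))

def founder_prior(g, freqs):
    """Pr(G_founder) under Hardy-Weinberg proportions."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def transmission(gc, gf, gm):
    """Pr(G_o | G_f, G_m): Mendelian segregation without mutation."""
    quarter = Fraction(1, 4)
    return sum((quarter for f in gf for m in gm if tuple(sorted((f, m))) == gc), Fraction(0))

def penetrance(observed, g):
    """Marker locus: 1 if g matches the typed genotype; untyped people give 1."""
    if observed is None:
        return 1
    return 1 if tuple(sorted(observed)) == g else 0

def nuclear_family_likelihood(father, mother, children, freqs):
    gs = genotypes(freqs)
    total = Fraction(0)
    for gf in gs:
        for gm in gs:
            parents = (penetrance(father, gf) * founder_prior(gf, freqs)
                       * penetrance(mother, gm) * founder_prior(gm, freqs))
            if parents == 0:
                continue
            # Peel each child onto the parental pair: sum out G_c once and
            # reuse the result as a single factor for this (G_f, G_m).
            offspring = 1
            for obs in children:
                offspring *= sum(penetrance(obs, gc) * transmission(gc, gf, gm) for gc in gs)
            total += parents * offspring
    return total

freqs = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
print(nuclear_family_likelihood(("a", "a"), ("b", "b"), [("a", "b")], freqs))  # -> 1/16
```

If nobody is typed the likelihood is 1, and a single typed child with untyped parents gets its Hardy-Weinberg genotype probability, which are quick sanity checks of the summation.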

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort increases exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed such an algorithm in 1987 (Lander & Green 1987); it permits simultaneous consideration of thousands of loci, and its computational effort increases linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) the enumeration of all possible inheritance vectors in a pedigree, describing which founder alleles are transmitted to each offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed genotypes at each marker, conditional on the inheritance vector; and 3) calculation of the joint probability of the inheritance vectors of all markers along the same chromosome (i.e. the transition probabilities between inheritance vectors at adjacent markers, which depend on the recombination fractions). By using a hidden Markov model (HMM) for the final step, an efficient computational model can be obtained (see Kruglyak et al. 1996 for a more detailed description).
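Step 3 can be illustrated with a naive forward pass of a hidden Markov model over inheritance vectors. This Python sketch assumes that step 2 has already produced, for each marker, the probability of the observed genotypes given every inheritance vector (the `emissions` input format is hypothetical), that inheritance vectors are uniform a priori, and that each meiosis recombines independently between adjacent markers with recombination fraction θ. Its cost is linear in the number of markers but grows as 4^m in the number of meioses m; practical software such as Merlin (Abecasis et al. 2002) uses much faster data structures.

```python
from itertools import product

def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm of an HMM whose hidden states are inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v),
        assumed precomputed (step 2); v is a tuple of 0/1, one bit per meiosis.
    thetas[k]: recombination fraction between marker k and marker k + 1.
    """
    states = list(product((0, 1), repeat=n_meioses))

    def transition(v, w, theta):
        # Each meiosis bit flips independently with probability theta
        flips = sum(a != b for a, b in zip(v, w))
        return theta ** flips * (1 - theta) ** (n_meioses - flips)

    # Uniform prior over inheritance vectors at the first marker
    alpha = {v: emissions[0][v] / len(states) for v in states}
    for k in range(1, len(emissions)):
        theta = thetas[k - 1]
        alpha = {
            w: emissions[k][w] * sum(alpha[v] * transition(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())

# One meiosis, two markers, data only compatible with inheritance vector (0,)
one = {(0,): 1.0, (1,): 0.0}
print(lander_green_likelihood([one, one], [0.5], 1))  # unlinked: 0.5 * 0.5
print(lander_green_likelihood([one, one], [0.0], 1))  # completely linked: 0.5
```

Setting θ = 0.5 makes the markers behave as unlinked, so the likelihood factorises into a product over markers, whereas θ = 0 forces the same inheritance vector at both markers.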

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium when using the population allele frequencies (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with respect to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect: an exclusion error means falsely excluding the AF as the TF, and an inclusion error means falsely including the AF as the TF. In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. Such cases can involve uncertainty about the appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and the error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypothesis model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated; in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with both the standard two-hypothesis model and the five-hypothesis model for comparison. Error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
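The Bayesian step can be illustrated with a minimal Python sketch. It converts the likelihood of the DNA data under each hypothesis into posterior probabilities, assuming equal prior probabilities, and then applies a cut-off. The hypothesis labels, the likelihood values and the three-way decision rule (including an "inconclusive" outcome) are hypothetical illustrations, not the exact hypotheses or decision rule used in Paper I.

```python
from typing import Dict


def posterior_probabilities(likelihoods: Dict[str, float],
                            priors: Dict[str, float]) -> Dict[str, float]:
    """Bayes' theorem: P(H_i | data) = P(data | H_i) P(H_i) / sum_j P(data | H_j) P(H_j)."""
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


def decide(posterior_paternity: float, cutoff: float = 0.9999) -> str:
    """Hypothetical decision rule: include above the cut-off, exclude below 1 - cut-off."""
    if posterior_paternity >= cutoff:
        return "inclusion"
    if posterior_paternity <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


# Hypothetical likelihoods, scaled so that P(data | AF unrelated) = 1.
two = posterior_probabilities({"H0": 5e4, "H1": 1.0},
                              {"H0": 0.5, "H1": 0.5})

# Same data with four alternatives; H1b and H1c stand for close relatives of the
# true father under which the data are also fairly likely (invented values).
five_lik = {"H0": 5e4, "H1a": 1.0, "H1b": 5e3, "H1c": 5e3, "H1d": 1.0}
five = posterior_probabilities(five_lik, {h: 0.2 for h in five_lik})

print(round(two["H0"], 5), decide(two["H0"]))
print(round(five["H0"], 4), decide(five["H0"]))
```

With these invented numbers, adding close-relative alternatives lowers the posterior for H0 from about 0.99998 to about 0.83, so a case that is an inclusion under two hypotheses becomes inconclusive under five.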


Figure 3. The different alternative hypotheses used for simulation and for calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used equal prior probabilities for the alternative hypotheses, i.e., the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a few inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still apply a maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even one whose interpretation is not entirely correct, than not to use one at all (Table 1).
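The difference between a fixed limit on inconsistencies and a probabilistic mutation model can be sketched for a single locus. The Python example below assumes that the child's paternal allele has already been identified and uses a simple "proportional" mutation model, in which a paternal allele is transmitted unchanged with probability 1 - mu and is otherwise replaced by an allele drawn according to the population frequencies. This is one simple model chosen for illustration, not necessarily the model used in Paper I; the allele names, frequencies and mu are hypothetical.

```python
import math
from typing import Mapping, Tuple


def paternity_index(af_genotype: Tuple[str, str], paternal_allele: str,
                    freqs: Mapping[str, float], mu: float) -> float:
    """Single-locus PI = P(paternal allele | AF is father) / P(paternal allele | random man).

    Assumed 'proportional' mutation model: an AF allele is passed on unchanged with
    probability 1 - mu, and otherwise mutates to an allele drawn with its population frequency.
    """
    p_c = freqs[paternal_allele]
    transmit = 0.0
    for allele in af_genotype:  # each AF allele is transmitted with probability 1/2
        unchanged = (1.0 - mu) if allele == paternal_allele else 0.0
        transmit += 0.5 * (unchanged + mu * p_c)
    return transmit / p_c


freqs = {"12": 0.1, "13": 0.2, "14": 0.3, "15": 0.4}  # hypothetical allele frequencies
mu = 0.002                                            # hypothetical mutation rate

consistent = paternity_index(("12", "14"), "12", freqs, mu)    # AF carries the allele
inconsistent = paternity_index(("13", "14"), "12", freqs, mu)  # AF lacks the allele

# All loci enter the combined LR: an inconsistency multiplies it by a small factor
# instead of forcing an exclusion.
combined_lr = math.prod([consistent, consistent, inconsistent])
print(consistent, inconsistent, combined_lr)
```

Under this particular model the PI at an inconsistent locus reduces to mu, whereas a consistent heterozygous AF gives 4.992, close to the mutation-free value 1/(2p) = 5.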

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

| Parameter | Scenario | Change in error rate compared with the standard case (%): total error (inclusion error, exclusion error) |
|---|---|---|
| Consanguinity | Mother and father simulated as first cousins | 3 (10, -1) |
| Additional information | 20-marker DNA profiles | -68 (-29, -89) |
| Additional information | 25-marker DNA profiles | -83 (-56, -98) |
| Additional information | 2 children | -88 (-73, -96) |
| Mutation model | Limit of 1 inconsistency instead of a mutation model for LR calculation | 16 (217, -95) |
| Mutation model | Limit of 2 inconsistencies instead of a mutation model for LR calculation | 320 (1079, -100) |
| Inappropriate allele frequency | Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation | 19 (190, -76) |
| Inappropriate allele frequency | Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation | 2 (106, -55) |
| Inappropriate allele frequency | Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation | -13 (25, -34) |
| Prior information | Five-hypothesis model for LR calculation | -24 (-8, -31) |

A standard case was considered with 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error over H1a-d. Posterior probabilities were calculated with the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or where a maternal relationship is questioned. For haploid DNA markers it is particularly important to establish and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically distinct regions were typed, together with 39 samples from a Swedish Saami population (Jokkmokk Saami), for the complete mtDNA control region (Figure 4). The control region, which contains the hypervariable segments HVS-I, HVS-II and HVS-III, spans more than 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and inferred from the DNA sequence variation. The statistical evaluation included calculation of forensic efficiency parameters and comparison of the genetic variation among the Swedish regions and between Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This corresponds to a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). A comparison of mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by pairwise ΦST values for a limited number of geographically close populations (Figure 4).
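The two efficiency parameters quoted above can be computed from haplotype counts. The Python sketch below uses the standard formulas GD = n/(n - 1) (1 - sum p_i^2) and match probability PM = sum p_i^2, where p_i is the observed frequency of haplotype i among n individuals. Whether Paper II used exactly these estimators is an assumption, and the toy haplotype labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def diversity_and_match_probability(haplotypes: Iterable[str]) -> Tuple[float, float]:
    """Return (haplotype diversity, random match probability) from observed haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    sum_sq = sum((c / n) ** 2 for c in counts.values())  # chance that two random draws match
    gd = n / (n - 1) * (1.0 - sum_sq)                     # Nei's unbiased gene diversity
    return gd, sum_sq


# Toy sample of four individuals carrying three distinct haplotypes (hypothetical).
gd, pm = diversity_and_match_probability(["H1", "H1", "H2", "H3"])
print(gd, pm)
```

In a database where almost every haplotype is unique, sum p_i^2 becomes small and GD approaches 1, which is the pattern behind the Swedish values of 0.996 and 0.5%.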

The mtDNA sequences were further analysed to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events, although the impact of the relatively small sample sizes should not be ignored.
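Whether regional samples differ can be summarised with a fixation index. The Python sketch below computes a simple Nei-style G_ST from haplotype frequencies, (H_T - H_S)/H_T, with subpopulations weighted equally. This is an assumed, simplified estimator chosen for illustration; the FST and ΦST values in Paper II were not necessarily computed this way, and ΦST in particular is typically obtained from an AMOVA that also uses sequence differences between haplotypes. The regions and haplotypes in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def haplotype_frequencies(sample: Sequence[str]) -> Dict[str, float]:
    counts = Counter(sample)
    n = len(sample)
    return {h: c / n for h, c in counts.items()}


def gst(subpopulations: Sequence[Sequence[str]]) -> float:
    """Nei-style G_ST = (H_T - H_S) / H_T with equally weighted subpopulations."""
    freqs = [haplotype_frequencies(s) for s in subpopulations]
    k = len(freqs)
    # Mean within-subpopulation diversity.
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / k
    # Diversity of the pooled (averaged) haplotype frequencies.
    all_haplotypes = set().union(*freqs)
    pooled = {h: sum(f.get(h, 0.0) for f in freqs) / k for h in all_haplotypes}
    h_t = 1.0 - sum(p * p for p in pooled.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical regions: identical composition vs. no shared haplotypes.
print(gst([["A", "B"], ["A", "B"]]))
print(gst([["A", "A"], ["B", "B"]]))
```

The index is 0 when regions share the same haplotype frequencies and 1 when they share no haplotypes at all; homogeneous Swedish regions would give values close to 0.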


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). Values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
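Why database size matters is easiest to see for a haplotype that has not been observed at all. One common conservative approach is to report an upper confidence bound on its frequency rather than the naive estimate x/n. The Python sketch below computes a one-sided Clopper-Pearson upper bound by bisection on the binomial distribution. It is an illustration only, not necessarily the estimator used by EMPOP or in Paper II; apart from n = 296 (the Swedish sample size), the database sizes are hypothetical.

```python
import math


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(x + 1))


def upper_bound(x: int, n: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson upper bound: the p at which P(X <= x) = alpha."""
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0          # the CDF decreases as p increases
    for _ in range(100):         # bisection well below float resolution
        mid = (lo + hi) / 2.0
        if binomial_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# A haplotype seen 0 times: the bound shrinks roughly as 3/n with database size.
for n in (296, 3000):
    print(n, upper_bound(0, n))
```

For x = 0 the bound equals 1 - alpha^(1/n), which is approximately 3/n at the 95% level, so a tenfold larger database gives a roughly tenfold tighter bound for an unseen haplotype.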


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X-chromosome occurs in one copy in males and in two copies in females, and because the markers reside on the same chromosome, their combined use requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study not only population frequencies, substructure and efficiency parameters but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measures and efficiency parameters showed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
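The practical consequence of LD between two linked loci can be shown by comparing an observed two-locus haplotype frequency with the product of its allele frequencies, which is what the product rule would use. In males, X-chromosomal haplotypes can be read directly because there is only one X. The Python sketch below computes the disequilibrium coefficient D and r^2 from haplotype counts; the allele names and counts are invented for illustration and are not data from Paper III.

```python
from typing import Dict, Tuple

HaplotypeCounts = Dict[Tuple[str, str], int]


def ld_summary(counts: HaplotypeCounts, a: str, b: str) -> Tuple[float, float, float, float]:
    """Return (haplotype freq, product of allele freqs, D, r^2) for allele a at locus 1
    and allele b at locus 2."""
    n = sum(counts.values())
    p_ab = counts.get((a, b), 0) / n
    p_a = sum(c for (x, _), c in counts.items() if x == a) / n
    p_b = sum(c for (_, y), c in counts.items() if y == b) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return p_ab, p_a * p_b, d, r2


# Hypothetical haplotype counts for two linked X-STRs in 100 males.
counts = {("12", "20"): 30, ("12", "21"): 10, ("13", "20"): 10, ("13", "21"): 50}
p_hap, p_product, d, r2 = ld_summary(counts, "12", "20")
print(p_hap, p_product, d, r2)
```

Here the product rule would assign the haplotype a frequency of 0.16 instead of the observed 0.3, so an LR proportional to 1/frequency would be overstated by a factor of about 1.9.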

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the probability of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
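A recombination frequency estimated from family data is a proportion: the number of observed recombinants among informative meioses. On the X-chromosome only maternal meioses can be informative, since a father passes his single X on to his daughters without recombination outside the pseudoautosomal regions. The Python sketch below uses a Wilson score interval to show how uncertain such an estimate is with about 100 informative meioses. The interval is an assumed, generic choice and not the estimation model presented in Paper III, and the counts are hypothetical.

```python
import math
from typing import Tuple


def recombination_estimate(recombinants: int, informative: int,
                           z: float = 1.96) -> Tuple[float, float, float]:
    """Point estimate and approximate 95% Wilson score interval for a recombination fraction."""
    p_hat = recombinants / informative
    denom = 1.0 + z * z / informative
    centre = (p_hat + z * z / (2 * informative)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / informative
                         + z * z / (4 * informative ** 2)) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical: no recombinant observed between two loci in 100 informative meioses.
print(recombination_estimate(0, 100))
```

Even with zero recombinants among 100 meioses, the upper limit of the interval is close to 4%, which illustrates how much uncertainty remains in recombination estimates based on family data of this size.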


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. Values below the line are the locations in Mb (NCBI build 36). Values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set and, separately, for the Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model for taking both linkage and linkage disequilibrium into account when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used to compute the LR in relationship questions. However, its computational cost grows exponentially with the number of jointly analysed loci, so it is not efficient for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), whose cost grows only linearly with the number of markers (but exponentially with the number of meioses in the pedigree), works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), but we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for linkage and linkage disequilibrium, and use simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III to study the impact of taking, or not taking, linkage and LD into account.
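The core of the Lander-Green algorithm is a hidden Markov model along the chromosome whose hidden states are inheritance vectors, with one bit per meiosis recording whether the grandpaternal or grandmaternal allele was transmitted. The Python sketch below runs the forward recursion for m meioses. It assumes that the per-locus likelihoods of the observed genotypes given each inheritance vector (the emissions) have already been computed, which is the pedigree-specific part omitted here. It uses a naive O(4^m) transition step and does not include the LD extension of Paper IV; for X-chromosomal data, father-to-daughter meioses would additionally be fixed, since they involve no recombination.

```python
from typing import List, Sequence


def transition_matrix(m: int, theta: float) -> List[List[float]]:
    """P(next inheritance vector w | current v): each of the m bits flips with probability theta."""
    size = 1 << m
    matrix = []
    for v in range(size):
        row = []
        for w in range(size):
            flips = bin(v ^ w).count("1")  # number of meioses with a recombination
            row.append(theta ** flips * (1.0 - theta) ** (m - flips))
        matrix.append(row)
    return matrix


def lander_green_likelihood(emissions: Sequence[Sequence[float]],
                            thetas: Sequence[float], m: int) -> float:
    """Forward algorithm: emissions[j][v] = P(genotypes at locus j | inheritance vector v);
    thetas[j] = recombination fraction between locus j and locus j + 1."""
    size = 1 << m
    if len(emissions) != len(thetas) + 1:
        raise ValueError("need one recombination fraction per adjacent pair of loci")
    # Uniform prior over inheritance vectors at the first locus (Mendelian segregation).
    alpha = [emissions[0][v] / size for v in range(size)]
    for j, theta in enumerate(thetas, start=1):
        t = transition_matrix(m, theta)
        alpha = [sum(alpha[v] * t[v][w] for v in range(size)) * emissions[j][w]
                 for w in range(size)]
    return sum(alpha)


# One meiosis, two loci, hypothetical emission probabilities.
emissions = [[0.2, 0.6], [0.5, 0.1]]
print(lander_green_likelihood(emissions, [0.5], m=1))  # unlinked loci
print(lander_green_likelihood(emissions, [0.0], m=1))  # completely linked loci
```

With theta = 0.5 the likelihood factorises into a product of per-locus averages (0.12 here), whereas with theta = 0 the same inheritance vector is forced at both loci (0.08), which is exactly the dependence that the product rule ignores.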

Materials and methods

A computational model was presented that is based on the Lander-Green algorithm but extended by expanding the inheritance vectors to cover all loci of a haplotype.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. In three of these, DNA profiles were available for both the children and the founders (i.e., questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters); in the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or false. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1 (i.e., supporting the questioned relationship), of varying size, were obtained when the questioned relationship was simulated to be false, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.
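The sensitivity to haplotype frequencies can be seen in a deliberately simplified version of the maternity case in Table 2. For two brothers with an untyped putative common mother, each linkage group is inherited identical by descent with probability 1/2, so each group contributes a factor 1/2 + 1/2 × [h1 = h2]/p(h1) to the LR against "unrelated". The Python sketch below assumes no recombination within linkage groups, independence between groups (ignoring the linkage between groups three and four found in Paper III), no mutation and known haplotype frequencies. It is not the model of Paper IV, and the haplotype labels and frequencies are hypothetical.

```python
from typing import Mapping, Sequence


def brothers_same_mother_lr(hap1: Sequence[str], hap2: Sequence[str],
                            freqs: Sequence[Mapping[str, float]]) -> float:
    """LR for 'same mother' vs 'unrelated' from the two brothers' X haplotypes.

    Simplifying assumptions: one haplotype per linkage group, no recombination within a
    group, independent groups, no mutation, untyped mother.
    """
    lr = 1.0
    for h1, h2, f in zip(hap1, hap2, freqs):
        # With probability 1/2 the brothers received the same maternal haplotype (IBD):
        # P(h1, h2 | IBD) / P(h1, h2 | unrelated) = [h1 == h2] / p(h1).
        ibd_ratio = 1.0 / f[h1] if h1 == h2 else 0.0
        lr *= 0.5 + 0.5 * ibd_ratio
    return lr


# Four linkage groups with hypothetical haplotype labels and frequencies.
freqs = [{"a": 0.01, "b": 0.05}] * 4
print(brothers_same_mother_lr(["a"] * 4, ["a"] * 4, freqs))  # share all four haplotypes
print(brothers_same_mother_lr(["a"] * 4, ["b"] * 4, freqs))  # share none
```

If the shared haplotypes were instead assigned a frequency of 0.001, as might happen for a previously unseen haplotype, each shared group would contribute about 500 rather than 50.5, changing the total LR by roughly four orders of magnitude.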

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

| | Log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max) |
|---|---|---|---|
| Rel | 2.0 [-1, 6.2] (-1/10.0) | 1.2 [0.4, 7.3] (0001418) | 0.4 [0.036, 9.3] (00001438) |
| NoRel | -1 [-1, 0.5] (-1/4.0) | 1.0 [0.9, 1.3] (000263) | 0.2 [0.0036, 9.1] (0026433) |

Rel: simulations in which the brothers have the same mother; NoRel: simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for treating single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we established an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and this, together with the linkage within and between the linkage groups in the Swedish population, highlights the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on X-chromosomal DNA profiles. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by increasingly powerful computers.

If we go back to the time just before these discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, large amounts of money and scientific working hours were invested in forensic genetics in order to meet this demand.

So, is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g., CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I believe there will be a greater focus on statistical methods for handling data from larger numbers of DNA markers. However, I think these developments will generally be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example distinguishing full siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to resolve such cases, as well as more distant relationships, using SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what becomes important are the statistical methods required to accurately estimate the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As regards continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.
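The database-size argument can be made concrete with the binomial standard error. If a haplotype with true frequency p is counted in a database of n independent haplotypes, the relative standard error of the estimate is sqrt((1 - p)/(n p)), so for rare haplotypes a tenfold lower frequency requires roughly a tenfold larger database for the same precision. The Python sketch below inverts this expression; the frequencies and the 20% target are hypothetical, and the calculation ignores population substructure.

```python
import math


def relative_standard_error(p: float, n: int) -> float:
    """Relative SE of a haplotype frequency estimated by counting in n independent haplotypes."""
    return math.sqrt((1.0 - p) / (n * p))


def required_database_size(p: float, target_rse: float) -> int:
    """Smallest n for which the relative SE is at most target_rse."""
    n = math.ceil((1.0 - p) / (p * target_rse ** 2))
    while relative_standard_error(p, n) > target_rse:  # guard against rounding
        n += 1
    return n


for p in (0.01, 0.001, 0.0001):  # hypothetical haplotype frequencies
    print(p, required_database_size(p, 0.2))
```

Since each additional marker in a block multiplies the number of possible haplotypes, typical haplotype frequencies fall and the required database size grows accordingly.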

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetic data and could thus have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always keeping your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (“bollplank”) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends.

Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life.

My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed.

My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and daughter Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis - validation and use for forensic casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden - a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl Ser 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dür A (2007) EMPOP - a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more - length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Introduction

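and

$$\Pr(G_C \mid G_M, G_{AF}, H_d) = \frac{1}{2} \cdot p_c$$

thus

$$PI = \frac{\Pr(G_C \mid G_M, G_{AF}, H_p)}{\Pr(G_C \mid G_M, G_{AF}, H_d)} = \frac{1/4}{\frac{1}{2} \cdot p_c} = \frac{1}{2 p_c}$$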

In other words, the rarer allele c is in the population, the greater the evidential weight in favour of the AF being the biological father of the child.
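The trio calculation above can be checked by enumerating Mendelian transmissions. The Python sketch below assumes, for illustration only, that the mother is a/b, the child a/c and the AF c/d, which reproduces the 1/4 and p_c/2 terms; the allele frequencies are hypothetical, mutations and silent alleles are ignored, and the function names are illustrative.

```python
from fractions import Fraction

def transmission(parent, allele):
    """Probability that a parent with genotype `parent` (2-tuple) transmits `allele`."""
    return Fraction(parent.count(allele), 2)

def pr_child(child, mother, father, freqs):
    """Pr(G_C | G_M, G_F) for an unordered child genotype.

    If father is None (the H_d 'random man' case), the paternal allele is
    drawn from the population allele frequencies `freqs`."""
    a, b = child
    orders = [(a, b)] if a == b else [(a, b), (b, a)]  # (maternal, paternal) allele
    total = Fraction(0)
    for maternal, paternal in orders:
        p_pat = freqs[paternal] if father is None else transmission(father, paternal)
        total += transmission(mother, maternal) * p_pat
    return total

def paternity_index(child, mother, af, freqs):
    """PI = Pr(G_C | G_M, G_AF, H_p) / Pr(G_C | G_M, G_AF, H_d)."""
    return pr_child(child, mother, af, freqs) / pr_child(child, mother, None, freqs)

# Hypothetical allele frequencies; only p_c enters the PI for this trio.
freqs = {"a": Fraction(1, 10), "b": Fraction(2, 10), "c": Fraction(1, 20), "d": Fraction(13, 20)}
print(paternity_index(("a", "c"), ("a", "b"), ("c", "d"), freqs))  # 1 / (2 * 1/20) = 10
```

Lowering p_c raises the PI in inverse proportion, which is the point made above: the rarer the shared allele, the stronger the evidence.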

Decision

How does one interpret the PI value? Bayes's theorem is needed to obtain the posterior odds (the prior odds multiplied by the PI), from which a posterior probability can be computed. For paternity issues the prior odds have traditionally been set to 1, leading to the following value for the posterior probability of paternity:

$$\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = PI$$

hence

$$\frac{\Pr(H_p \mid E)}{1 - \Pr(H_p \mid E)} = PI$$

resulting in

$$\Pr(H_p \mid E) = \frac{PI}{PI + 1}$$

Hummel presented suggestions for verbal predicates based on the posterior probability (Hummel et al. 1981). It is, however, up to the forensic laboratory to set a limit, or cut-off, for inclusion based on the PI or the posterior probability (Hallenberg & Morling 2002; Gjertson et al. 2007). Too low a cut-off will increase the risk of falsely including a non-father as the true father, while too high a cut-off will increase the risk of failing to include a true father.
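The conversion from PI to posterior probability, together with a laboratory cut-off, can be sketched as follows. This is a minimal illustration of Bayes's theorem in odds form; the prior odds default to 1 as above, the default 99.99% cut-off mirrors the one mentioned for Paper I rather than any general recommendation, and the function names are hypothetical.

```python
def posterior_probability(pi, prior_odds=1.0):
    """Posterior probability of paternity from the PI (Bayes's theorem in odds form).

    posterior odds = PI * prior odds, and Pr(H_p | E) = odds / (1 + odds);
    with prior_odds = 1 this is PI / (PI + 1)."""
    odds = pi * prior_odds
    return odds / (1.0 + odds)

def include(pi, cutoff=0.9999, prior_odds=1.0):
    """Hypothetical decision rule: include the AF when Pr(H_p | E) reaches the cut-off."""
    return posterior_probability(pi, prior_odds) >= cutoff

print(posterior_probability(10))       # 10/11, about 0.909
print(include(20000), include(5000))   # True False
```

With prior odds of 1, a 99.99% cut-off corresponds to requiring a PI of at least 9999.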

Mathematical model for automatic likelihood computation for relationship testing

While the calculation of the PI for trios and single markers is fairly simple, it rapidly becomes more complicated when the possibility of mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000) is introduced, and when treating deficiency cases (Brenner 2006). In such situations a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be described as

$$L_n(Ped) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i=1}^{n} Pen(X_i \mid G_i) \prod_{\text{founder}} \Pr(G_{\text{founder}}) \prod_{\{m,f,o\}} \Pr(G_o \mid G_m, G_f)$$

Here the sums run over the possible genotypes G_1, ..., G_n of the n pedigree members, X_i is the observed data for individual i, and the last product runs over all mother-father-offspring trios {m, f, o}. The Elston-Stewart algorithm uses a recursive approach, starting at the bottom of a pedigree by computing the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that if the summation for the individual at the bottom is computed first, it can be attached as a factor in the summation for his or her parents, and this individual then needs no further consideration. This procedure is known as a peeling algorithm. For non-trait (marker) loci the penetrance (Pen) factor can be disregarded, since it simply indicates whether G_i is compatible with the observed genotype.
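A minimal sketch of this idea for a single nuclear family (one mother, one father and their children) at one marker locus is given below. It assumes Hardy-Weinberg genotype probabilities for the founders, no mutations, and the marker-locus penetrance described above (1 if a genotype matches the typed genotype or the person is untyped, otherwise 0). Each child's sum over genotypes is computed first and multiplied into the parental term, which is the peeling step; a general pedigree program would apply this recursively. All names and frequencies are illustrative.

```python
from itertools import combinations_with_replacement

def genotypes(freqs):
    """All unordered genotypes (a, b), a <= b, over the alleles in `freqs`."""
    return list(combinations_with_replacement(sorted(freqs), 2))

def founder_prob(g, freqs):
    """Hardy-Weinberg genotype probability Pr(G) for a founder."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def transmit(parent, allele):
    """Probability that `parent` transmits `allele` (no mutation)."""
    return parent.count(allele) / 2

def child_prob(g, gm, gf):
    """Mendelian Pr(G_o | G_m, G_f) for an unordered offspring genotype."""
    a, b = g
    p = transmit(gm, a) * transmit(gf, b)
    if a != b:
        p += transmit(gm, b) * transmit(gf, a)
    return p

def pen(observed, g):
    """Marker-locus penetrance: 1 if g matches the typed genotype or the person is untyped."""
    return 1.0 if observed is None or tuple(sorted(observed)) == g else 0.0

def nuclear_family_likelihood(mother, father, children, freqs):
    """Likelihood of one mother, one father and their children at one locus.

    The sum over each child's genotype is computed first and attached as a
    factor to the parental term (peeling), so children need no further work."""
    gs = genotypes(freqs)
    total = 0.0
    for gm in gs:
        for gf in gs:
            term = (pen(mother, gm) * founder_prob(gm, freqs)
                    * pen(father, gf) * founder_prob(gf, freqs))
            if term == 0.0:
                continue
            for child in children:
                term *= sum(pen(child, gc) * child_prob(gc, gm, gf) for gc in gs)
            total += term
    return total

freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical allele frequencies
# Mother a/b, father untyped, one child a/c: the result equals p_a * p_b * p_c.
print(nuclear_family_likelihood(("a", "b"), None, [("a", "c")], freqs))
```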

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort grows exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987), which permits simultaneous consideration of thousands of loci, with computational effort increasing linearly with the number of markers. The Lander-Green algorithm has three main steps: 1) enumeration of all possible inheritance vectors in a pedigree, describing which founder alleles are transmitted to each offspring; 2) iteration over all inheritance vectors and calculation of the probability of the observed marker-specific genotypes conditional on each inheritance vector; and finally 3) calculation of the joint probability of the inheritance vectors of all markers along the same chromosome (i.e. via the transmission probabilities between adjacent markers). By using a hidden Markov model (HMM) for the final step, an efficient computational model is obtained (see Kruglyak et al. 1996 for a more detailed description).
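The HMM step can be sketched with the forward algorithm. In this illustration, which is not the implementation used in any particular program, inheritance vectors are coded as bit masks over n meioses, all vectors are a priori equally likely, each meiosis recombines independently between adjacent markers with recombination fraction θ, and the per-marker probabilities Pr(genotypes | inheritance vector) from step 2 are taken as given input. The numbers in the example are hypothetical.

```python
def transition_prob(v, w, n_meioses, theta):
    """Pr(inheritance vector w at the next marker | v at this marker).

    Vectors are bit masks with one bit per meiosis; each meiosis switches
    grandparental origin independently with probability theta."""
    switches = bin(v ^ w).count("1")
    return theta ** switches * (1 - theta) ** (n_meioses - switches)

def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm of the HMM over inheritance vectors.

    emissions[k][v]: Pr(observed genotypes at marker k | inheritance vector v)
    thetas[k]:       recombination fraction between markers k and k + 1
    All 2 ** n_meioses vectors are taken as equally likely at the first marker."""
    n_vec = 2 ** n_meioses
    alpha = [emissions[0][v] / n_vec for v in range(n_vec)]
    for k in range(1, len(emissions)):
        alpha = [
            emissions[k][w] * sum(alpha[v] * transition_prob(v, w, n_meioses, thetas[k - 1])
                                  for v in range(n_vec))
            for w in range(n_vec)
        ]
    return sum(alpha)

# Hypothetical per-vector probabilities for two markers and two meioses.
emissions = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
print(lander_green_likelihood(emissions, [0.5], 2))  # unlinked: product of per-marker means
print(lander_green_likelihood(emissions, [0.0], 2))  # complete linkage: same vector at both
```

The running time grows linearly with the number of markers but exponentially with the number of meioses, since there are 2^n inheritance vectors; this naive version also loops over all pairs of vectors at each marker.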

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium between markers in the population (Abecasis et al. 2002; Skare et al. 2009).

Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and furthermore to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect owing to an exclusion error or an inclusion error (i.e. falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainty about the appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two-hypotheses model and, for comparison, the five-hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
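Moving from two to several mutually exclusive hypotheses only changes the normalisation in Bayes's theorem. The sketch below is a generic illustration, not the Paper I implementation: the specific alternatives of Figure 3 are not reproduced, the likelihood values and the "brother of the TF" alternative are hypothetical, and the priors default to equal probabilities.

```python
def posteriors(likelihoods, priors=None):
    """Pr(H_i | E) for mutually exclusive hypotheses via Bayes's theorem.

    likelihoods[i] = Pr(E | H_i), on any common scale (e.g. relative to 'unrelated');
    priors default to equal probabilities for all hypotheses."""
    n = len(likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical likelihoods relative to "AF unrelated to the child":
two = posteriors([1000.0, 1.0])            # H0: AF is the father; H1: AF unrelated
three = posteriors([1000.0, 1.0, 100.0])   # plus a hypothetical "AF is a brother of the TF"
print(round(two[0], 4), round(three[0], 4))  # 0.999 0.9083
```

In this hypothetical example, adding a close-relative alternative lowers the posterior probability of paternity from about 0.999 to about 0.908, even though the two-hypotheses result looks conclusive.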

Figure 3 The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M)

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate but was shown to have a considerable impact on the individual LR.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even an imperfect one, than not to use one at all (Table 1).
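As an illustration of what a probabilistic treatment of an inconsistency can look like, the sketch below uses a simple "equal" mutation model, chosen here for brevity and not necessarily the model used in Paper I: an allele is transmitted unchanged with probability 1 - μ and otherwise mutates to any of the other alleles with equal probability. It assumes the child's paternal allele is unambiguous and ignores maternal mutation; μ, the number of alleles and the allele frequency are hypothetical values.

```python
def mutation_prob(from_allele, to_allele, mu, n_alleles):
    """Hypothetical 'equal' mutation model: keep the allele with probability 1 - mu,
    otherwise mutate to any of the other n_alleles - 1 alleles with equal probability."""
    return 1 - mu if from_allele == to_allele else mu / (n_alleles - 1)

def paternal_transmission(father, allele, mu, n_alleles):
    """Pr(a father with genotype `father` transmits `allele`), allowing for mutation."""
    return sum(0.5 * mutation_prob(f, allele, mu, n_alleles) for f in father)

def locus_pi(obligate_paternal, af, freqs, mu, n_alleles):
    """Single-locus PI when the child's paternal allele is unambiguous.

    The maternal transmission factor is the same under H_p and H_d and cancels."""
    return paternal_transmission(af, obligate_paternal, mu, n_alleles) / freqs[obligate_paternal]

freqs = {"c": 0.1}  # hypothetical frequency of the obligate paternal allele
print(round(locus_pi("c", ("d", "e"), freqs, mu=0.002, n_alleles=10), 5))  # 0.00222
print(round(locus_pi("c", ("c", "d"), freqs, mu=0.002, n_alleles=10), 3))  # 4.991
```

The inconsistent locus contributes a small factor to the product of locus-wise LRs rather than forcing an exclusion, so strong evidence at the remaining loci can still be weighed against it.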

Furthermore, we proposed and tested a five-hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that use of such a model significantly decreased the error rates, although the magnitude of the decrease was small.

The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change (%) in the error rate in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
Mother and father simulated as first cousins: 3 (10, -1)

Additional information
DNA profiles with 20 markers: -68 (-29, -89)
DNA profiles with 25 markers: -83 (-56, -98)
2 children: -88 (-73, -96)

Mutation model
Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)

Inappropriate allele frequencies
Rwandan allele frequencies for data generation, Swedish for LR calculation: 19 (190, -76)
Somali allele frequencies for data generation, Swedish for LR calculation: 2 (106, -55)
Iranian allele frequencies for data generation, Swedish for LR calculation: -13 (25, -34)

Prior information
Five-hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered, with DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This region, which includes the hypervariable segments HVS-I, HVS-II and HVS-III, spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved computation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
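The two summary statistics above can be computed from haplotype counts as sketched below. The sketch assumes the unbiased gene diversity estimator n/(n-1)(1 - Σp_i²) (see Nei 1987) and defines the random match probability as Σp_i², which is one common definition; whether Paper II used exactly these estimators is not stated here, and the haplotype labels are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity: n / (n - 1) * (1 - sum of p_i ** 2)."""
    n = len(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in Counter(haplotypes).values())
    return n / (n - 1) * (1 - sum_p2)

def random_match_probability(haplotypes):
    """Probability that two randomly drawn individuals share a haplotype: sum of p_i ** 2."""
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in Counter(haplotypes).values())

sample = ["H1", "H1", "H2", "H3"]  # hypothetical control-region haplotype labels
print(round(haplotype_diversity(sample), 4), random_match_probability(sample))  # 0.8333 0.375
```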

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.

Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

In total, 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that when LD was disregarded, as opposed to taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
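Because males carry a single X chromosome, two-locus X haplotypes can be read directly from male profiles, so allelic association can be tested without phase estimation. The sketch below uses a permutation test on the sum of squared disequilibrium coefficients D_ij = p_ij - p_i q_j; this is an illustrative choice of statistic and test, not necessarily the LD test used in Paper III, and the allele labels are hypothetical.

```python
import random
from collections import Counter

def ld_statistic(haplotypes):
    """Sum over allele pairs of D_ij ** 2, with D_ij = p_ij - p_i * q_j (two loci)."""
    n = len(haplotypes)
    pair = Counter(haplotypes)
    left = Counter(a for a, _ in haplotypes)
    right = Counter(b for _, b in haplotypes)
    return sum((pair[(a, b)] / n - (left[a] / n) * (right[b] / n)) ** 2
               for a in left for b in right)

def ld_permutation_test(haplotypes, n_perm=1000, seed=1):
    """P-value for association: shuffling the second locus breaks any LD."""
    rng = random.Random(seed)
    observed = ld_statistic(haplotypes)
    firsts = [a for a, _ in haplotypes]
    seconds = [b for _, b in haplotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(seconds)
        if ld_statistic(list(zip(firsts, seconds))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical male haplotypes at two linked X-STRs.
linked = [("12", "8")] * 50 + [("13", "10")] * 50
independent = [(a, b) for a in ("12", "13") for b in ("8", "10")] * 25
print(ld_permutation_test(linked), ld_permutation_test(independent))  # small p-value, then 1.0
```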

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
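The basic quantities involved in such estimates are sketched below: the counting estimate θ = recombinants / informative meioses, the exact one-sided upper confidence bound on θ when no recombinant is observed, and the Haldane map function relating map distance to θ. These are standard textbook formulas, not the estimation model presented in Paper III, and the counts are hypothetical.

```python
import math

def recombination_fraction(recombinants, informative_meioses):
    """Counting estimate of theta."""
    return recombinants / informative_meioses

def upper_bound_no_recombinants(informative_meioses, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound on theta when 0 recombinants are seen:
    the theta solving (1 - theta) ** n = alpha."""
    return 1 - alpha ** (1 / informative_meioses)

def haldane_theta(distance_morgans):
    """Haldane map function (no interference): theta = (1 - exp(-2 d)) / 2."""
    return 0.5 * (1 - math.exp(-2 * distance_morgans))

print(recombination_fraction(1, 100))                # 0.01
print(round(upper_bound_no_recombinants(100), 4))   # 0.0295
print(round(haldane_theta(0.01), 4))                 # 0.0099
```

For example, with about 100 informative meioses and no observed recombinants, the 95% upper bound on θ is about 3%.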

Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.

Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR for questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present here a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking or not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters), and three additional cases where DNA profiles were available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
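To see how haplotype frequencies enter such calculations, consider the simplest founder-less case above, maternity for two brothers, at a single linkage group. Assuming no recombination within the group, no mutation and independent population draws for the mother's two X chromosomes, each brother carries one of the mother's two X haplotypes. Under the same-mother hypothesis the second brother therefore carries the first brother's haplotype with probability 1/2 and otherwise a random population haplotype, giving LR = [1/2 · 1(h1 = h2) + 1/2 · p(h2)] / p(h2). The sketch compares a haplotype frequency estimate (allowing for LD) with the product of allele frequencies (ignoring LD); all frequencies are hypothetical, and this is not the Paper IV model, which handles several linked groups jointly.

```python
def brothers_same_mother_lr(h1, h2, hap_freq):
    """LR for 'same mother' vs 'different mothers' for two brothers at one X linkage group.

    h1, h2: the brothers' haplotypes (tuples of alleles); hap_freq(h): population
    frequency of haplotype h. The factor Pr(h1) is common to both hypotheses and cancels."""
    p2 = hap_freq(h2)
    same_mother = 0.5 * (1.0 if h1 == h2 else 0.0) + 0.5 * p2
    return same_mother / p2

# Hypothetical frequencies for the haplotype (13, 10) at a two-locus linkage group.
HAP_TABLE = {("13", "10"): 0.20}              # estimated haplotype frequency (LD allowed)
ALLELE_FREQS = ({"13": 0.25}, {"10": 0.25})   # single-locus allele frequencies

def with_ld(h):
    return HAP_TABLE[h]

def product_rule(h):
    return ALLELE_FREQS[0][h[0]] * ALLELE_FREQS[1][h[1]]

shared = ("13", "10")
print(round(brothers_same_mother_lr(shared, shared, with_ld), 3))       # 3.0
print(round(brothers_same_mother_lr(shared, shared, product_rule), 3))  # 8.5
```

Brothers who do not share a haplotype give LR = 1/2 under these assumptions, whichever frequency estimate is used.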

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Log10 LR, difference LR(m_2)/LR(m_1) and difference LR(m_3)/LR(m_1), each given as median [95% cred.] (min/max)

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel denotes simulations where the brothers have the same mother, and NoRel simulations where the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project, or the assistance provided by ever faster computers.

If we go back to the time just before these findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver of this rapid development. Once the discoveries had been made, an enormous amount of money and research time was invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in large national DNA-profile databases built on a given set of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). National databases such as CODIS contain over 8,000,000 profiles, and it would be very difficult to replace their existing markers with a different set.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. For the use of DNA in criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

Turning to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. For future developments, I expect a greater focus on statistical methods that can handle data from larger numbers of DNA markers. However, I think these developments will generally be smaller than those of the last 25 years.

Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Some relationship questions still cannot be resolved satisfactorily with existing methods, however. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases the key requirement is statistical methods that estimate the evidential value of the DNA profiles accurately while taking properties such as linkage and linkage disequilibrium into account.

Two main issues remain to be investigated for the linked X-STRs studied in this thesis. The first is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The second, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are needed to estimate its frequency with a low level of uncertainty.
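The database-size problem can be quantified with a simple Python sketch. The sketch uses the exact binomial (Clopper-Pearson) bound. Suppose a haplotype has not been observed among n database haplotypes. The one-sided 95% upper bound on its frequency then solves (1 - p)^n = 0.05, which is roughly 3/n. The marker and allele counts below are hypothetical. The point is that each added marker multiplies the number of possible haplotypes, while the bound for an unseen haplotype shrinks only in proportion to 1/n.

```python
import math

def upper_bound_unseen(n, alpha=0.05):
    """One-sided (1 - alpha) upper confidence bound on the frequency of a
    haplotype observed 0 times among n database haplotypes
    (exact binomial bound: solve (1 - p)**n = alpha for p)."""
    return 1.0 - alpha ** (1.0 / n)

def possible_haplotypes(alleles_per_marker):
    """Number of distinct haplotypes that the given markers can form."""
    return math.prod(alleles_per_marker)

# Hypothetical database sizes
for n in (100, 1000, 10000):
    print(f"n={n:>6}: upper 95% bound = {upper_bound_unseen(n):.5f}")

# Hypothetical three-marker block with 10, 12 and 15 alleles per marker
print(possible_haplotypes([10, 12, 15]))  # 1800
```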

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions were rather isolated until the early 1900s. We are interested in whether this isolation can be traced in the genetic data and could thus affect the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which gives access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always keeping your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”: Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen”, for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and care, and thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping to get me up in the mornings while I was writing this thesis.

Thanks to everyone else I have not mentioned who has been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. Proceedings for the International Symposium on Human Identification 1989, Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl Ser 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl Ser 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Introduction

mutations (Dawid et al. 2002), silent alleles (Gjertson et al. 2007) and population substructure (Ayres 2000), and when treating deficiency cases (Brenner 2006). In such situations a model for automatic likelihood computation is helpful. In 1971 Elston & Stewart presented a model for the exact calculation of the likelihood of a given pedigree (Elston & Stewart 1971). The likelihood can be written as

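$$L(\mathrm{Ped}) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i=1}^{n} \mathrm{Pen}(X_i \mid G_i) \prod_{\text{founders}} \Pr(G_{\text{founder}}) \prod_{\{o,m,f\}} \Pr(G_o \mid G_m, G_f)$$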

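Here G_i is the genotype of individual i and X_i the observed phenotype. Pen(X_i | G_i) is the penetrance and Pr(G_founder) the population genotype probability of each founder. Pr(G_o | G_m, G_f) is the transmission probability for an offspring o given its mother m and father f, and the sums run over all possible genotypes of the n pedigree members. The Elston-Stewart algorithm uses a recursive approach that starts at the bottom of the pedigree and computes the probability of each child's genotype conditional on the genotypes of the parents. The advantage of this approach is that the summation for an individual at the bottom can be computed first and attached as a factor in the summation for that individual's parents, after which the individual needs no further consideration. This procedure is known as a peeling algorithm. The penetrance (Pen) factor can be disregarded when treating non-trait loci.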
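As a minimal illustration of the formula and of a single peeling step, the Python sketch below computes the likelihood of one nuclear family at one marker. It assumes Hardy-Weinberg proportions for founders, Mendelian transmission without mutation and hypothetical allele frequencies. Pen is omitted, as for a non-trait locus. It is not a full Elston-Stewart implementation. An untyped father is summed out explicitly, and for each of his candidate genotypes the children's transmission probabilities enter as a factor. This is the quantity that peeling passes up the pedigree.

```python
from itertools import combinations_with_replacement

def genotypes(freqs):
    """All unordered genotypes (a, b), a <= b, over the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))

def founder_prob(g, freqs):
    """Pr(G_founder) under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def transmission(child, mother, father):
    """Pr(G_o | G_m, G_f): each parent passes on either allele with prob 1/2."""
    target = tuple(sorted(child))
    return sum(0.25 for m in mother for f in father
               if tuple(sorted((m, f))) == target)

def nuclear_family_likelihood(mother, children, freqs, father=None):
    """Likelihood of one nuclear family at one marker (Pen omitted).
    An untyped father (None) is summed out; for each of his candidate
    genotypes, the children's terms are multiplied in as a single factor."""
    candidates = genotypes(freqs) if father is None else [father]
    total = 0.0
    for gf in candidates:
        term = founder_prob(mother, freqs) * founder_prob(gf, freqs)
        for gc in children:
            term *= transmission(gc, mother, gf)
        total += term
    return total

# Hypothetical allele frequencies and genotypes
freqs = {1: 0.2, 2: 0.3, 3: 0.5}
print(nuclear_family_likelihood((1, 2), [(1, 3)], freqs))                 # father untyped
print(nuclear_family_likelihood((1, 2), [(1, 3)], freqs, father=(3, 3)))  # father typed
```

With a typed father the sum collapses to a single term, which is why untyped individuals are what make pedigree likelihoods expensive.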

The Elston-Stewart algorithm works well on large pedigrees, but its computational effort grows exponentially with the number of markers included. With increased access to large datasets, a need has emerged for a fast computational model that can handle thousands of linked markers. Lander and Green developed the Lander-Green algorithm in 1987 (Lander & Green 1987). It permits simultaneous consideration of thousands of loci, and its computational effort increases linearly with the number of markers, although exponentially with the number of non-founders in the pedigree. The Lander-Green algorithm has three main steps:

1) enumerating all possible inheritance vectors in the pedigree, which describe the transmission of alleles from founders to offspring;
2) iterating over all inheritance vectors and calculating, for each marker, the probability of the observed genotypes conditional on the inheritance vector;
3) computing the joint probability of the inheritance vectors of all markers along the same chromosome, i.e. the transition probabilities between adjacent markers.

Using a hidden Markov model (HMM) for the final step gives an efficient computational model (see Kruglyak et al. 1996 for a more detailed description).
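Step 3 can be sketched in Python as a forward pass of a hidden Markov model whose hidden states are the inheritance vectors. The sketch makes three assumptions. First, the per-marker probabilities from step 2 (the emissions) are already computed and supplied as input. Second, meioses recombine independently with recombination fraction θ between adjacent markers, so a transition that changes d of the m bits has probability θ^d (1 - θ)^(m - d). Third, the prior over inheritance vectors at the first marker is uniform. The emission values in the example are hypothetical. Run time grows linearly with the number of markers but with 2^m states, and the naive inner sum below costs 4^m per marker. Software such as Merlin (Abecasis et al. 2002) uses far more efficient representations.

```python
from itertools import product

def transition(v, w, theta):
    """Pr(inheritance vector w at the next marker | v at this marker):
    each of the len(v) meioses recombines independently with probability theta."""
    d = sum(a != b for a, b in zip(v, w))            # number of recombinations
    return theta ** d * (1 - theta) ** (len(v) - d)

def lander_green_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over inheritance vectors.
    emissions[k][v] = Pr(observed genotypes at marker k | inheritance vector v),
    assumed precomputed (step 2); thetas[k] = recombination fraction between
    markers k and k + 1."""
    states = list(product((0, 1), repeat=n_meioses))          # step 1
    alpha = {v: emissions[0][v] / len(states) for v in states}  # uniform prior
    for k, theta in enumerate(thetas, start=1):               # step 3
        alpha = {
            w: emissions[k][w] * sum(alpha[v] * transition(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())

# Hypothetical emission probabilities for two markers and a single meiosis
emissions = [{(0,): 0.8, (1,): 0.2}, {(0,): 0.6, (1,): 0.4}]
print(lander_green_likelihood(emissions, [0.5], 1))  # unlinked markers
print(lander_green_likelihood(emissions, [0.0], 1))  # completely linked markers
```

With θ = 0.5 the result equals the product of the per-marker averages, as for independent markers. With θ = 0 the inheritance vector is shared by both markers, and the joint likelihood changes accordingly.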

Practical implementations of the Lander-Green algorithm have been shown to take linkage properly into account for hundreds of thousands of markers, although they assume linkage equilibrium when estimating population frequencies (Abecasis et al. 2002; Skare et al. 2009).


Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for properly considering such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of drawing erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for computing the likelihood ratio in relationship testing using markers on the X chromosome that are both linked and in linkage disequilibrium.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect. An exclusion error means falsely excluding the AF as the TF, and an inclusion error means falsely including the AF as the TF. In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. Such cases can involve uncertainty about the appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Paternity testing normally uses two mutually exclusive hypotheses. We introduced a five-hypothesis model to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated. In the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

The weight of evidence was expressed as posterior probabilities and calculated in a Bayesian framework, using both the standard two-hypothesis model and the five-hypothesis model for comparison. Error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
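The two-hypothesis calculation can be sketched in Python for a trio. The sketch assumes Mendelian transmission without mutation, independent loci (so per-locus paternity indices multiply) and equal prior probabilities. It also uses a hypothetical symmetric decision rule that reports posteriors between 1 - cut-off and the cut-off as inconclusive; the thesis does not specify the decision rule in this form. The two-locus case and its allele frequencies are invented.

```python
def transmission(child, mother, father):
    """Pr(child | mother, father) under Mendelian inheritance, no mutation."""
    target = sorted(child)
    hits = sum(1 for m in mother for f in father if sorted((m, f)) == target)
    return hits / 4

def random_man(child, mother, freqs):
    """Pr(child | mother, unrelated random man): the paternal allele is drawn
    from the population allele frequencies."""
    target = sorted(child)
    total = 0.0
    for m in mother:                 # each maternal allele with prob 1/2
        for p in set(child):         # candidate paternal allele
            if sorted((m, p)) == target:
                total += 0.5 * freqs[p]
    return total

def combined_lr(loci):
    """LR for H0 (AF is the father) vs H1 (AF is unrelated); multiplying
    over loci assumes the markers are independent."""
    lr = 1.0
    for child, mother, af, freqs in loci:
        lr *= transmission(child, mother, af) / random_man(child, mother, freqs)
    return lr

def posterior_h0(lr, prior_h0=0.5):
    """Bayes' theorem for two hypotheses."""
    return lr * prior_h0 / (lr * prior_h0 + (1 - prior_h0))

def decide(post, cutoff=0.9999):
    """Hypothetical symmetric decision rule around the cut-off."""
    if post >= cutoff:
        return "inclusion"
    if post <= 1 - cutoff:
        return "exclusion"
    return "inconclusive"

# Hypothetical two-locus trio: (child, mother, alleged father, allele frequencies)
case = [
    ((1, 3), (1, 2), (3, 3), {1: 0.2, 2: 0.3, 3: 0.5}),
    ((7, 9), (7, 7), (9, 10), {7: 0.1, 9: 0.05, 10: 0.85}),
]
lr = combined_lr(case)
post = posterior_h0(lr)
print(round(lr, 6), round(post, 6), decide(post))
```

With only two loci the combined LR of 20 gives a posterior of about 0.95, far below a 99.99% cut-off, which illustrates why casework relies on 15 or more markers.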


Figure 3. The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it arises because we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d. We demonstrated that when more information was added to the case the error rate decreased, especially the exclusion error (Table 1).
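The simulation approach can be sketched as a small Monte Carlo experiment in Python. This is a simplified, hypothetical set-up, not the one used in Paper I. Every locus has eight equally frequent alleles and there is no mutation. Only the two standard hypotheses are simulated: half the cases have the true father as AF and half an unrelated man. The prior odds are equal, and a man is included when the posterior reaches the cut-off and excluded otherwise.

```python
import random

def draw_genotype(freqs, rng):
    """Two alleles drawn independently from the population (HWE)."""
    alleles, weights = zip(*freqs.items())
    return tuple(rng.choices(alleles, weights=weights, k=2))

def locus_pi(child, mother, af, freqs):
    """Trio paternity index at one locus, without a mutation model."""
    num = den = 0.0
    for mat, pat in {(child[0], child[1]), (child[1], child[0])}:
        w = mother.count(mat) / 2           # Pr(mother transmits mat)
        num += w * af.count(pat) / 2        # Pr(alleged father transmits pat)
        den += w * freqs[pat]               # Pr(random man transmits pat)
    return num / den

def simulate(n_loci, n_cases=2000, cutoff=0.9999, seed=1):
    """Return (inclusion error rate, exclusion error rate) for a hypothetical
    set of n_loci loci, each with eight equally frequent alleles."""
    rng = random.Random(seed)
    freqs = {a: 1 / 8 for a in range(8)}
    incl_err = excl_err = 0
    for case in range(n_cases):
        af_is_father = case % 2 == 0        # half H0 cases, half H1 cases
        lr = 1.0
        for _ in range(n_loci):
            mother = draw_genotype(freqs, rng)
            father = draw_genotype(freqs, rng)
            child = (rng.choice(mother), rng.choice(father))
            af = father if af_is_father else draw_genotype(freqs, rng)
            lr *= locus_pi(child, mother, af, freqs)
        included = lr / (lr + 1) >= cutoff  # posterior with equal prior odds
        if af_is_father and not included:
            excl_err += 1
        elif not af_is_father and included:
            incl_err += 1
    half = n_cases / 2
    return incl_err / half, excl_err / half

for n in (5, 10, 15):
    print(n, simulate(n))
```

Running it should show the exclusion error falling sharply as loci are added. This is the same direction as the "Additional information" rows of Table 1, although the magnitudes are not comparable.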

Using an inappropriate allele frequency database had only a minor influence on the total error rate, but it was shown to have a considerable impact on individual LRs.

In cases where the AF may be a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there are only a few inconsistencies between the AF and the child, the question arises whether they are due to mutations or represent true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007). Some laboratories, however, still apply a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). We demonstrated that it is better to use a probabilistic model, even if its interpretation is not entirely correct, than not to employ one at all (Table 1).
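The difference between a hard inconsistency limit and a probabilistic treatment can be illustrated in Python with a deliberately simple mutation model. The model is a hypothetical assumption for illustration, not the one used in Paper I. With probability μ, the allele transmitted by the alleged father is replaced by an allele drawn from the population frequencies. The mutation rate (μ = 0.002) and the allele frequencies are hypothetical, and the mother is assumed not to mutate.

```python
def transmit_prob(parent, allele, freqs, mu):
    """Pr(parent transmits `allele`) under a simple hypothetical mutation model:
    one of the parent's two alleles is picked at random and passed on unchanged
    with prob 1 - mu, or replaced with prob mu by a population-frequency draw."""
    return (1 - mu) * parent.count(allele) / 2 + mu * freqs[allele]

def locus_pi(child, mother, af, freqs, mu=0.002):
    """Trio paternity index at one locus; the mother is assumed not to mutate."""
    num = den = 0.0
    for mat, pat in {(child[0], child[1]), (child[1], child[0])}:
        w = mother.count(mat) / 2
        num += w * transmit_prob(af, pat, freqs, mu)
        den += w * freqs[pat]   # a random man transmits pat with prob freqs[pat]
    return num / den

freqs = {12: 0.2, 15: 0.1, 16: 0.3, 17: 0.4}    # hypothetical frequencies
mother, af = (12, 12), (16, 17)
print(locus_pi((12, 16), mother, af, freqs))   # consistent locus
print(locus_pi((12, 15), mother, af, freqs))   # inconsistent locus
```

Under this model, an inconsistent locus where the AF carries no candidate paternal allele contributes a factor of exactly μ to the combined LR, instead of forcing it to zero. Strong evidence at the remaining loci can therefore still be weighed, which a fixed limit of one or two inconsistencies cannot do.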

Furthermore, we proposed and tested a five-hypothesis model to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that such a model significantly decreased the error rates, although the magnitude of the decrease was small.
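How additional hypotheses enter the posterior calculation can be sketched in Python using IBD (identity-by-descent) coefficients (k0, k1, k2). For brevity, the sketch is a hypothetical pairwise (duo) version that ignores the mother. The hypothesis set is also illustrative. It covers the AF being the father (0, 1, 0); an uncle or grandfather of the child, e.g. a brother or the father of the TF (1/2, 1/2, 0); a first cousin of the TF (7/8, 1/8, 0); or unrelated (1, 0, 0). The hypotheses in Figure 3 may differ, and the genotypes and frequencies are invented.

```python
def genotype_prob(g, freqs):
    """Pr(genotype) under Hardy-Weinberg equilibrium."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def one_ibd_cond(gb, ga, freqs):
    """Pr(gb | ga) when exactly one allele is shared identical by descent:
    gb receives one of ga's alleles (prob 1/2 each) plus one random allele."""
    c, d = gb
    total = 0.0
    for x in ga:
        if c == d:
            if x == c:
                total += 0.5 * freqs[c]
        else:
            if x == c:
                total += 0.5 * freqs[d]
            if x == d:
                total += 0.5 * freqs[c]
    return total

def pair_likelihood(ga, gb, freqs, k):
    """Pr(ga, gb | IBD coefficients k = (k0, k1, k2)) at one locus."""
    k0, k1, k2 = k
    pa = genotype_prob(ga, freqs)
    both_ibd = pa if sorted(ga) == sorted(gb) else 0.0
    return (k0 * pa * genotype_prob(gb, freqs)
            + k1 * pa * one_ibd_cond(gb, ga, freqs)
            + k2 * both_ibd)

HYPOTHESES = {  # illustrative AF-child relationships and their (k0, k1, k2)
    "father": (0.0, 1.0, 0.0),
    "uncle_or_grandfather": (0.5, 0.5, 0.0),
    "first_cousin_once_removed": (0.875, 0.125, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

def posteriors(loci, priors):
    """Posterior of each hypothesis; loci are (af, child, freqs) tuples,
    treated as independent (no linkage)."""
    joint = {}
    for h, k in HYPOTHESES.items():
        lik = 1.0
        for af, child, freqs in loci:
            lik *= pair_likelihood(af, child, freqs, k)
        joint[h] = priors[h] * lik
    total = sum(joint.values())
    return {h: v / total for h, v in joint.items()}

loci = [((1, 2), (1, 3), {1: 0.2, 2: 0.3, 3: 0.5})]
equal = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
for h, p in posteriors(loci, equal).items():
    print(f"{h}: {p:.3f}")
```

With equal priors, a single shared allele favours paternity only weakly over the close-relative alternatives. This is the situation in which an explicit multi-hypothesis model helps to avoid falsely including a relative.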


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, which makes evaluation of the statistical methods used important. In this paper we demonstrated that improvements are still necessary to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates

Change in the error rate compared with the standard case (%), given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwandan allele frequencies for data generation, Swedish for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish for LR calculation: -13 (25, -34)
Prior information
  Five-hypothesis model for LR calculation: -24 (-8, -31)

The standard case used 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated with the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, because of an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions, together with 39 samples from a Swedish Saami population (the Jokkmokk Saami), were typed for the complete mtDNA control region (Figure 4). This hypervariable region, comprising HVS-I, HVS-II and HVS-III, spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This corresponds to a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
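Haplotype diversity and match probability of the kind quoted above are computed from haplotype counts. The estimators used in Paper II are not spelled out here, so the sketch below assumes Nei's (1987) unbiased diversity, n/(n-1)(1 - Σp_i²), and takes the match probability as Σp_i² over sample frequencies. The variant strings are hypothetical toy data.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_statistics(haplotypes: Sequence[Hashable]) -> tuple[float, float]:
    """Return (haplotype diversity, match probability) for a sample of haplotypes.

    Diversity uses Nei's unbiased formula n/(n-1) * (1 - sum p_i^2);
    the match probability is taken as sum p_i^2 of the sample frequencies.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1.0 - sum_sq)
    return diversity, sum_sq


# Toy sample of mtDNA haplotypes written as variant strings (hypothetical data).
sample = ["16069T 16126C", "16069T 16126C", "16311C", "16224C 16311C"]
gd, pm = haplotype_statistics(sample)
print(f"GD = {gd:.3f}, PM = {pm:.3f}")  # GD = 0.833, PM = 0.375
```

When almost every haplotype is unique, as in a sample with 247 haplotypes among 296 individuals, Σp_i² approaches 1/n and the diversity approaches 1.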

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST is the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP, the mtDNA population database of the European DNA Profiling Group (EDNAP) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

Samples from 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), providing 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that when LD was disregarded rather than taken into account, the average difference in the calculated LR was small, although in some individual cases it was considerable.
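Because males carry a single X chromosome, each male contributes one directly observed two-locus haplotype, so allelic association between linked loci can be tested without inferring phase. The sketch below shows one simple option, a Pearson statistic with a permutation p-value. It is not necessarily the test used in Paper III, and the allele data are made up. It also compares an observed haplotype frequency with the product-rule estimate, which is the quantity that goes wrong when LD is ignored.

```python
import random
from collections import Counter

Pair = tuple[str, str]  # (allele at locus A, allele at locus B) from one male


def chi_square(pairs: list[Pair]) -> float:
    """Pearson statistic comparing haplotype counts with counts expected under LE."""
    n = len(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a, na in count_a.items():
        for b, nb in count_b.items():
            expected = na * nb / n  # expected count if the loci were independent
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs: list[Pair], n_perm: int = 2000, seed: int = 1) -> tuple[float, float]:
    """Permutation p-value: shuffling locus-B alleles among males breaks any association."""
    rng = random.Random(seed)
    obs = chi_square(pairs)
    alleles_a = [a for a, _ in pairs]
    alleles_b = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles_b)
        if chi_square(list(zip(alleles_a, alleles_b))) >= obs - 1e-9:
            hits += 1
    return obs, (hits + 1) / (n_perm + 1)


def haplotype_vs_product(pairs: list[Pair], hap: Pair) -> tuple[float, float]:
    """Observed haplotype frequency versus the product-rule estimate f(a) * f(b)."""
    n = len(pairs)
    f_hap = Counter(pairs)[hap] / n
    f_a = sum(a == hap[0] for a, _ in pairs) / n
    f_b = sum(b == hap[1] for _, b in pairs) / n
    return f_hap, f_a * f_b


# Hypothetical male data for two linked X-STRs with strong allelic association.
males = [("12", "20")] * 40 + [("13", "21")] * 40 + [("12", "21")] * 10 + [("13", "20")] * 10
stat, p = ld_permutation_test(males)
print(f"chi2 = {stat:.1f}, p = {p:.4f}")
print(haplotype_vs_product(males, ("12", "20")))  # (0.4, 0.25)
```

In this toy example the product rule gives 0.25 for a haplotype actually seen at 0.4, which is why associated loci are better treated as haplotypes.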

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there was also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
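With phase-known, informative meioses, the simplest estimate of a recombination fraction is the observed proportion of recombinants. When few or no recombinants are seen, an exact binomial (Clopper-Pearson) upper bound shows how small a fraction roughly 100 meioses can actually establish. This is a generic counting sketch under those assumptions, not the estimation model presented in Paper III, and the counts are hypothetical.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def recombination_estimate(recombinants: int, meioses: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (r_hat, one-sided exact upper bound) from phase-known informative meioses."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    r_hat = recombinants / meioses
    if recombinants == meioses:
        return r_hat, 1.0
    lo, hi = r_hat, 1.0
    # Bisection: the binomial CDF at k decreases in p; find where it falls to alpha.
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2.0
        if binomial_cdf(recombinants, meioses, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return r_hat, hi


# Hypothetical counts: no recombinants among 100 informative meioses.
r_hat, upper = recombination_estimate(0, 100)
print(f"r = {r_hat:.3f}, 95% upper bound = {upper:.3f}")  # r = 0.000, 95% upper bound = 0.030
```

For zero recombinants the bound reduces to 1 - α^(1/n), so about 100 meioses cannot by themselves exclude recombination fractions of a few per cent.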


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line give the location in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results in this paper indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart algorithm (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR for questions about a relationship. However, this algorithm is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, scales linearly with the number of markers and handles thousands of linked markers well, but it assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present here a model that fully accounts for linkage and linkage disequilibrium, and we use simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III to study the impact of taking, or not taking, linkage and LD into account.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to cover all of a haplotype's loci.
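The Lander-Green idea is a hidden Markov model that walks along the chromosome, with inheritance vectors as hidden states and recombination fractions as transition probabilities. A minimal sketch of the simplest X-chromosomal case, one mother-to-son meiosis, is given below. It assumes that the mother's phase is known (so the inheritance vector is a single bit), uses a hypothetical mismatch probability `mu`, and evaluates the alternative hypothesis (an unrelated woman) with the product rule. The Paper IV extension, which expands the inheritance vectors over whole haplotype blocks to handle LD, is not reproduced here; all allele data are hypothetical.

```python
from math import prod


def mother_son_likelihood(mother_haps, son_hap, recomb, mu=1e-3):
    """P(son's X haplotype | mother's two phased haplotypes) via the forward algorithm.

    Hidden state per locus: which maternal haplotype (0 or 1) the son inherited.
    Between loci j and j+1 the state switches with probability recomb[j].
    mu is a hypothetical per-locus mismatch probability (mutation or typing error).
    """
    def emit(state: int, locus: int) -> float:
        return 1.0 - mu if mother_haps[state][locus] == son_hap[locus] else mu

    forward = [0.5 * emit(s, 0) for s in (0, 1)]  # either haplotype equally likely at start
    for j in range(1, len(son_hap)):
        r = recomb[j - 1]
        forward = [
            (forward[s] * (1.0 - r) + forward[1 - s] * r) * emit(s, j)
            for s in (0, 1)
        ]
    return sum(forward)


def unrelated_likelihood(son_hap, allele_freqs):
    """Product-rule probability of the son's haplotype (ignores LD)."""
    return prod(allele_freqs[j][a] for j, a in enumerate(son_hap))


# Hypothetical three-locus example with one recombination between loci 2 and 3.
mother = (("12", "20", "15"), ("13", "21", "16"))
son = ("12", "20", "16")
freqs = [{"12": 0.3, "13": 0.2}, {"20": 0.25, "21": 0.25}, {"15": 0.1, "16": 0.2}]
lr = mother_son_likelihood(mother, son, [0.01, 0.05]) / unrelated_likelihood(son, freqs)
print(f"LR = {lr:.2f}")
```

The forward recursion costs O(loci × states), which is why Lander-Green-type methods scale to many linked markers; larger pedigrees enlarge the state space instead.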

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved DNA profiles for both the children and the founders (i.e. paternity for a trio, paternity for half-sisters and paternity for full sisters), and three involved DNA profiles for the children only (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or false. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6), compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs favouring the questioned relationship (LR > 1) to varying degrees were obtained when that relationship was simulated to be false, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unseen haplotypes.
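The sensitivity to unseen haplotypes is easy to see in the simplest X-chromosomal case, a trio of mother, daughter and alleged father. If the daughter's paternal haplotype h is unambiguous and matches the alleged father, and mutation is ignored, then LR = 1/f(h), because a father transmits his X chromosome without recombination. The sketch compares two common conventions for f(h) when h is absent from a database of N haplotypes: counting the case haplotype once more, (x+1)/(N+1), and an exact 95% upper bound. Both conventions, and the database size N = 718 (borrowed from the number of males typed in Paper III), are illustrative assumptions, not the estimator used in Paper IV.

```python
def haplotype_frequency_estimates(count: int, database_size: int, alpha: float = 0.05) -> dict[str, float]:
    """Alternative frequency estimates for a haplotype seen `count` times among N."""
    n = database_size
    estimates = {"add_one": (count + 1) / (n + 1)}  # count the case haplotype once more
    if count > 0:
        estimates["counting"] = count / n
    else:
        # Exact one-sided upper bound when the haplotype has never been observed.
        estimates["upper_bound"] = 1.0 - alpha ** (1.0 / n)
    return estimates


def trio_lr(paternal_haplotype_freq: float) -> float:
    """LR for a mother-daughter-alleged father trio with an unambiguous paternal haplotype."""
    return 1.0 / paternal_haplotype_freq


# An unseen haplotype in a hypothetical database of 718 haplotypes.
for name, f in haplotype_frequency_estimates(0, 718).items():
    print(f"{name}: f = {f:.5f}, LR = {trio_lr(f):.0f}")
```

The two conventions differ by roughly a factor of three in the resulting LR for the same observation, which illustrates why haplotype frequency estimation matters as blocks grow longer.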

In summary, we demonstrated that, to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers.

| | Log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max) |
|---|---|---|---|
| Rel | 20 [-1 -62] (-1100) | 12 [04-73] (0001418) | 04 [0036-93] (00001438) |
| NoRel | -1 [-1-05] (-140) | 10 [09-13] (000263) | 02 [00036-91] (0026433) |

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. For each situation, 10,000 simulations were performed. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common theme of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlights the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding linkage and LD can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses about the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the development of fast computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, clients' demand for resolving these issues was a major driver of the rapid development. Once the discoveries had been made, an enormous amount of money and research time was invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods for handling data from an increased number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Nevertheless, some relationship questions still cannot be resolved satisfactorily with existing methods, for example distinguishing full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to resolve such cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what becomes important are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued work on the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetic data and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff of the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen", for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends. Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for always believing in me. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and daughter Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings while I was writing this thesis.

Thanks also to everyone not mentioned here who has been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: A generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York.

Aim of the thesis

The aim of this thesis was to study important population genetic parameters that influence the weight of evidence provided by a DNA analysis, as well as models for proper consideration of such parameters when calculating the weight of evidence.

Specific aims

Paper I: To analyse the risk of making erroneous conclusions in complex relationship testing and to propose methods for reducing the risk of such errors.

Paper II: To establish a Swedish mitochondrial DNA frequency database, compare it in a worldwide context and study potential substructure within Sweden.

Paper III: To investigate eight X-chromosomal STR markers in a Swedish population sample with regard to allele and haplotype frequencies and forensic efficiency parameters, and to study recombination rates in Swedish and Somali families.

Paper IV: To propose a model for the computation of the likelihood ratio in relationship testing using markers on the X-chromosome that are both linked and in linkage disequilibrium.

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made as to whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect because of an exclusion error or an inclusion error (falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and on error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five-hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated and, in the standard case, the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five-hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
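
The Bayesian step described above can be sketched as follows: the likelihood of the DNA data under each hypothesis is multiplied by a prior probability and the results are normalised. This is a minimal illustration assuming equal priors by default; the function name, the likelihood values and the labels H1a-H1d are made-up placeholders, not definitions or results from Paper I, where each likelihood would come from a pedigree calculation on the DNA profiles.

```python
# Illustrative sketch only; the helper name and all numbers are hypothetical.
from typing import Dict, Optional


def posterior_probabilities(
    likelihoods: Dict[str, float],
    priors: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Bayes' theorem over a finite set of mutually exclusive hypotheses.

    likelihoods: P(DNA data | hypothesis) for each hypothesis.
    priors: prior probabilities; equal priors are used if none are given.
    """
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Two-hypotheses model: H0 = AF is the father, H1 = AF is unrelated.
two_hyp = posterior_probabilities({"H0": 1e-20, "H1": 1e-25})

# Five-hypotheses model: hypothetical extra alternatives in which the AF is
# a close relative of the true father (labels are placeholders).
five_hyp = posterior_probabilities(
    {"H0": 1e-20, "H1a": 1e-25, "H1b": 5e-21, "H1c": 1e-22, "H1d": 1e-23}
)
print(round(two_hyp["H0"], 5), round(five_hyp["H0"], 3))  # prints 0.99999 0.662
```

With these made-up numbers, adding close-relative alternatives lowers P(H0) from above 0.9999 to about 0.66, i.e. below a 99.99% inclusion cut-off; this is the mechanism by which a multi-hypothesis model can keep a relative of the true father from being included.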

Figure 3 The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).
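
To make the notion of an unweighted error rate concrete, the sketch below classifies simulated cases with a decision rule and averages the inclusion error over the alternative hypotheses with equal weight, regardless of how common each alternative is in real casework. The 99.99% inclusion cut-off is taken from the text; the symmetric 0.01% exclusion cut-off, the "inconclusive" category, the helper names and the toy data are assumptions for illustration.

```python
# Illustrative sketch; decision thresholds (except 99.99%) and names are hypothetical.
from statistics import mean
from typing import Dict, List, Tuple

INCLUDE_CUTOFF = 0.9999  # 99.99% cut-off for inclusion (from the text)
EXCLUDE_CUTOFF = 0.0001  # assumed symmetric cut-off for exclusion


def decide(posterior_h0: float) -> str:
    if posterior_h0 >= INCLUDE_CUTOFF:
        return "include"
    if posterior_h0 <= EXCLUDE_CUTOFF:
        return "exclude"
    return "inconclusive"


def error_rates(results: List[Tuple[str, float]]) -> Dict[str, float]:
    """results: (simulated true hypothesis, posterior P(H0)) for each case."""
    by_truth: Dict[str, List[str]] = {}
    for truth, p_h0 in results:
        by_truth.setdefault(truth, []).append(decide(p_h0))
    # Exclusion error: simulated true fathers (H0) that were excluded.
    h0 = by_truth.get("H0", [])
    exclusion = h0.count("exclude") / len(h0) if h0 else 0.0
    # Inclusion error per alternative, then an unweighted average over them.
    per_alt = [d.count("include") / len(d) for t, d in by_truth.items() if t != "H0"]
    inclusion = mean(per_alt) if per_alt else 0.0
    return {"inclusion": inclusion, "exclusion": exclusion}


sim = [("H0", 0.99999), ("H0", 0.00001), ("H1a", 0.0), ("H1a", 0.2),
       ("H1b", 0.99995), ("H1b", 0.5)]
print(error_rates(sim))  # prints {'inclusion': 0.25, 'exclusion': 0.5}
```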

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but it was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still apply a maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the interpretation is not totally correct, than not to employ one at all (Table 1).
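
The difference between a fixed limit on inconsistencies and a probabilistic model can be illustrated with a single-locus paternity index (PI) for a trio, multiplied over all loci. The mutation model below is a deliberately simplified, hypothetical one (a mutation replaces the transmitted allele with one drawn from the population frequencies), and the rate mu = 0.002 is an assumed value; neither is claimed to be the model used in Paper I. The sketch also assumes that the child's paternal allele can be identified unambiguously from the mother and child, and the helper names are illustrative.

```python
# Illustrative sketch; the mutation model, mu and helper names are assumptions.
import math
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]


def paternity_index(af: Genotype, paternal_allele: str,
                    freqs: Dict[str, float], mu: float = 0.002) -> float:
    """Single-locus PI when the child's paternal (obligate) allele is known.

    Simplified model: with probability 1 - mu the AF transmits one of his two
    alleles at random; with probability mu a mutation occurs and the
    transmitted allele is drawn from the population allele frequencies.
    """
    p = freqs[paternal_allele]
    copies = af.count(paternal_allele)        # 0, 1 or 2 copies in the AF
    prob_if_father = (1 - mu) * copies / 2 + mu * p
    return prob_if_father / p                 # versus a random unrelated man


def combined_lr(loci: List[Tuple[Genotype, str, Dict[str, float]]],
                mu: float = 0.002) -> float:
    """Total LR over all loci, inconsistent ones included (no exclusion limit)."""
    return math.prod(paternity_index(af, allele, f, mu) for af, allele, f in loci)


case = [
    (("12", "14"), "12", {"12": 0.10}),   # consistent locus
    (("8", "9"), "10", {"10": 0.20}),     # inconsistent: the AF lacks allele 10
]
print(combined_lr(case))
```

Under this simplified model an inconsistent locus contributes a factor equal to mu rather than forcing an exclusion, so the remaining loci still carry weight in the total LR.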

Furthermore, we proposed and tested a five-hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that utilisation of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.

The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change (%) in the error rate in comparison with the standard case, given as total error (inclusion error, exclusion error).

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  DNA profiles from 20 markers: -68 (-29, -89)
  DNA profiles from 25 markers: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calc.: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calc.: 320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele freq. for data generation, Swedish allele freq. for LR calc.: 19 (190, -76)
  Somali allele freq. for data generation, Swedish allele freq. for LR calc.: 2 (106, -55)
  Iran allele freq. for data generation, Swedish allele freq. for LR calc.: -13 (25, -34)
Prior information
  Five-hypotheses model for LR calc.: -24 (-8, -31)

The standard case was based on DNA profiles from 15 markers, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers it is particularly relevant to set up and study regional frequency databases, because of an increased risk of local frequency variations (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (the Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (comprising HVS-I, HVS-II and HVS-III) spans over 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved enumeration of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
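
The two descriptive statistics above can be computed from haplotype counts as sketched below, using the standard unbiased estimator of haplotype (gene) diversity, n/(n-1)·(1 - Σp²) (cf. Nei 1987), and the sum of squared haplotype frequencies as a simple match probability. The function name and haplotype labels are illustrative, and Paper II may have used slightly different estimators.

```python
# Illustrative sketch; function name and sample labels are hypothetical.
from collections import Counter
from typing import Iterable, Tuple


def diversity_and_match_probability(haplotypes: Iterable[str]) -> Tuple[float, float]:
    """Return (haplotype diversity, match probability) for a haploid sample."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1 - sum_sq)   # unbiased diversity estimator
    return diversity, sum_sq                 # sum_sq: P(two random haplotypes match)


sample = ["hapA", "hapA", "hapB", "hapC"]    # made-up haplotype labels
print(diversity_and_match_probability(sample))
```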

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found for Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
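
As a rough illustration of how regional frequency differences can be summarised, the sketch computes Nei's G_ST = (H_T - H_S)/H_T from haplotype labels sampled in several regions, with each region weighted equally. This is a simple descriptive stand-in, not the F_ST/ΦST estimator or the significance test used in Paper II; significance would normally be assessed by permuting individuals among regions. The function name and data are hypothetical.

```python
# Illustrative sketch; a simple G_ST stand-in, not the estimator used in Paper II.
from collections import Counter
from typing import Dict, List


def nei_gst(regions: Dict[str, List[str]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T from haplotype labels per region."""
    freqs = []
    for sample in regions.values():
        n = len(sample)
        freqs.append({h: c / n for h, c in Counter(sample).items()})
    k = len(freqs)
    # H_S: mean within-region diversity (1 - sum of squared frequencies).
    h_s = sum(1 - sum(p * p for p in f.values()) for f in freqs) / k
    # H_T: diversity of the pooled frequencies, each region weighted equally.
    all_haps = set().union(*freqs)
    pooled = {h: sum(f.get(h, 0.0) for f in freqs) / k for h in all_haps}
    h_t = 1 - sum(p * p for p in pooled.values())
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


regions = {  # made-up haplotype labels for three hypothetical regions
    "region_1": ["H", "H", "U5", "J", "T2"],
    "region_2": ["H", "U5", "U5", "K", "H"],
    "region_3": ["H", "H", "J", "K", "V"],
}
print(round(nei_gst(regions), 4))
```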

Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives like EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
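
The value of a large database is clearest for a haplotype never observed among n database sequences: the counting estimate is zero, while a one-sided 95% upper bound (the Clopper-Pearson bound for zero observations, 1 - 0.05^(1/n)) shrinks as n grows. The sketch below is a generic illustration with hypothetical helper names and is not claimed to be the estimator used by EMPOP or in Paper II.

```python
# Illustrative sketch; helper names are hypothetical.
def counting_estimate(count: int, n: int) -> float:
    """Plain counting estimate of a haplotype frequency (zero if unseen)."""
    return count / n


def unseen_upper_bound(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper bound for a haplotype seen 0 times in n:
    the frequency p that solves (1 - p) ** n == alpha (Clopper-Pearson)."""
    return 1 - alpha ** (1 / n)


for n in (296, 1000, 5000):
    print(n, counting_estimate(0, n), round(unseen_upper_bound(n), 5))
```

For n = 296 (the number of Swedish samples in Paper II), 1000 and 5000, the bound is roughly 1.0%, 0.30% and 0.06%, respectively.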

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X-chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

In total, 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were considered in order to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
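
Because males carry a single X-chromosome, their X-STR haplotypes are observed directly, so the association between two loci can be tested on male haplotypes without phase estimation. The sketch below uses a Pearson chi-square statistic with a permutation (Monte Carlo) p-value; the choice of statistic, the number of permutations, the function names and the allele labels are assumptions, and Paper III may have used a different exact test.

```python
# Illustrative sketch; statistic, permutation scheme and names are assumptions.
import random
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]  # (allele at locus A, allele at locus B) in one male


def chi_square_stat(pairs: List[Pair]) -> float:
    """Pearson chi-square statistic for association between the two loci."""
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            expected = n_a * n_b / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs: List[Pair], n_perm: int = 2000, seed: int = 1) -> float:
    """Monte Carlo p-value for H0: the alleles at the two loci are independent."""
    rng = random.Random(seed)
    observed = chi_square_stat(pairs)
    a_alleles = [a for a, _ in pairs]
    b_alleles = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b_alleles)  # shuffling one locus destroys any association
        if chi_square_stat(list(zip(a_alleles, b_alleles))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


linked = [("13", "10")] * 30 + [("14", "11")] * 30   # made-up, fully associated
print(ld_permutation_test(linked))
```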

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency for linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
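
With phase-known, informative meioses, the simplest recombination estimate is the counting estimate θ = R/N. The sketch adds an approximate 95% Wilson score interval to show how much uncertainty remains with family data of this size; the function name is hypothetical, and this is not the estimation model presented in Paper III.

```python
# Illustrative sketch; not the estimation model presented in Paper III.
import math
from typing import Tuple


def recombination_estimate(recombinants: int, meioses: int,
                           z: float = 1.96) -> Tuple[float, float, float]:
    """Counting estimate theta = R / N with an approximate 95% Wilson score
    interval (assumes phase-known, fully informative meioses)."""
    theta = recombinants / meioses
    denom = 1 + z * z / meioses
    centre = (theta + z * z / (2 * meioses)) / denom
    half_width = (z / denom) * math.sqrt(
        theta * (1 - theta) / meioses + z * z / (4 * meioses ** 2)
    )
    return theta, max(0.0, centre - half_width), min(1.0, centre + half_width)


print(recombination_estimate(0, 100))
```

For example, 0 recombinants among 100 meioses gives θ = 0 with an upper limit of about 3.7%.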

Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.

Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for how to consider both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers, since its computational cost grows linearly with the number of markers (although exponentially with the size of the pedigree), but it assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present here a model for the complete consideration of linkage and linkage disequilibrium, and we study the impact of taking or not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
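
The core of the Lander-Green algorithm is a hidden Markov model that walks along the chromosome, with inheritance vectors as hidden states and recombination fractions as transition probabilities. The toy sketch below covers a single meiosis (a mother with known phase and her son), so there are only two states per locus; a full autosomal implementation tracks up to 2^(2n) inheritance vectors for n non-founders (fewer for the X-chromosome, since sons inherit only a maternal X), and the Paper IV extension that groups loci in LD into haplotypes is not reproduced here. The function name and the error allowance eps are assumptions.

```python
# Illustrative toy sketch of a Lander-Green style forward pass (one meiosis).
from typing import Sequence


def lander_green_single_meiosis(maternal_haps: Sequence[Sequence[str]],
                                son_alleles: Sequence[str],
                                thetas: Sequence[float],
                                eps: float = 1e-3) -> float:
    """Forward algorithm for one mother-to-son X-chromosome meiosis.

    Hidden state per locus: which maternal haplotype (0 or 1) the son received.
    thetas[i] is the recombination fraction between locus i and locus i + 1.
    eps is an assumed small probability of a mismatch (mutation or error).
    """

    def emit(state: int, locus: int) -> float:
        match = maternal_haps[state][locus] == son_alleles[locus]
        return 1 - eps if match else eps

    # Both maternal haplotypes are equally likely at the first locus.
    forward = [0.5 * emit(s, 0) for s in (0, 1)]
    for locus in range(1, len(son_alleles)):
        theta = thetas[locus - 1]
        # Stay on the same haplotype with prob. 1 - theta, switch with prob. theta.
        forward = [
            (forward[s] * (1 - theta) + forward[1 - s] * theta) * emit(s, locus)
            for s in (0, 1)
        ]
    return sum(forward)


mother = (["A1", "B1", "C1"], ["A2", "B2", "C2"])  # phase-known haplotypes
son = ["A1", "B1", "C2"]                           # one switch between B and C
print(lander_green_single_meiosis(mother, son, thetas=[0.01, 0.3]))
print(lander_green_single_meiosis(mother, son, thetas=[0.01, 0.01]))
```

Setting all recombination fractions to 0.5 makes the likelihood factorise into a product over loci, which is what treating the markers as unlinked amounts to.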

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (namely questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases involved DNA profiles available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). Simulations were performed for each of the six cases with the tested relationship either true or not true. From these, LR distributions were studied, and LRs calculated using our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10⁶), in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10²-10³). Furthermore, LRs above 1 of varying magnitude were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
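
Why ignoring LD can shift the LR is easiest to see for a single two-locus X haplotype: its frequency can be estimated directly from male haplotype counts, or as the product of the single-locus allele frequencies, as the product rule assumes. The sketch below compares the two on made-up data; the function name is hypothetical, and in a simple comparison where the LR scales with 1/frequency, the ratio of the two estimates is the factor by which the product rule shifts the LR.

```python
# Illustrative sketch; function name and data are hypothetical.
from collections import Counter
from typing import List, Tuple


def haplotype_vs_product(male_haps: List[Tuple[str, str]],
                         target: Tuple[str, str]) -> Tuple[float, float]:
    """Frequency of a two-locus X haplotype estimated (i) directly from male
    haplotype counts (LD accounted for) and (ii) as the product of the
    single-locus allele frequencies (loci treated as independent)."""
    n = len(male_haps)
    hap_freq = Counter(male_haps)[target] / n
    freq_a = sum(1 for a, _ in male_haps if a == target[0]) / n
    freq_b = sum(1 for _, b in male_haps if b == target[1]) / n
    return hap_freq, freq_a * freq_b


males = ([("13", "10")] * 40 + [("14", "11")] * 40
         + [("13", "11")] * 10 + [("14", "10")] * 10)
print(haplotype_vs_product(males, ("13", "10")))  # prints (0.4, 0.25)
```

A haplotype never seen in the sample gets a direct estimate of zero, which is why the handling of unseen haplotypes matters so much for the resulting LR.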

In summary, we demonstrated that in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Values are given as median [95% credible interval] (min/max).

Rel
  log10 LR: 2.0 [-1, 6.2] (-1/10.0)
  Difference LR(m_2)/LR(m_1): 1.2 [0.4, 7.3] (0001418)
  Difference LR(m_3)/LR(m_1): 0.4 [0.036, 9.3] (00001438)
NoRel
  log10 LR: -1 [-1, 0.5] (-1/4.0)
  Difference LR(m_2)/LR(m_1): 1.0 [0.9, 1.3] (000263)
  Difference LR(m_3)/LR(m_1): 0.2 [0.0036, 9.1] (0026433)

Rel means simulations in which the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10 000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.

Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the rapid development of computing power.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics in order to meet these demands.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can today be solved with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA profile databases, which consist of a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than over the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.
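
For pairwise relationships such as full versus half siblings, a standard single-locus approach writes the likelihood ratio against unrelatedness in terms of the IBD-sharing probabilities (k0, k1, k2), e.g. (1/4, 1/2, 1/4) for full siblings and (1/2, 1/2, 0) for half siblings. The sketch below assumes Hardy-Weinberg proportions and no mutation; multiplying such values across loci would additionally assume unlinked loci in linkage equilibrium, which is exactly the assumption that dense marker panels violate. Function names, allele labels and frequencies are hypothetical.

```python
# Illustrative sketch; names, alleles and frequencies are hypothetical.
from typing import Dict, Tuple

Genotype = Tuple[str, str]

# IBD-sharing probabilities (k0, k1, k2) for non-inbred relationships.
RELATIONSHIPS: Dict[str, Tuple[float, float, float]] = {
    "full_siblings": (0.25, 0.5, 0.25),
    "half_siblings": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_prob(g: Genotype, p: Dict[str, float]) -> float:
    """Hardy-Weinberg genotype probability."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def prob_g2_given_one_ibd(g1: Genotype, g2: Genotype, p: Dict[str, float]) -> float:
    """P(g2 | g1, exactly one allele IBD): one allele of g1 (each with
    probability 1/2) is shared, and g2's other allele is drawn at random."""
    x, y = g2
    total = 0.0
    for shared in g1:
        if x == y == shared:
            total += 0.5 * p[shared]
        elif shared == x:
            total += 0.5 * p[y]
        elif shared == y:
            total += 0.5 * p[x]
    return total


def kinship_lr(g1: Genotype, g2: Genotype, p: Dict[str, float], relationship: str) -> float:
    """Single-locus LR for `relationship` versus unrelated."""
    k0, k1, k2 = RELATIONSHIPS[relationship]
    one_ibd = prob_g2_given_one_ibd(g1, g2, p) / genotype_prob(g2, p)
    two_ibd = 1.0 / genotype_prob(g1, p) if sorted(g1) == sorted(g2) else 0.0
    return k0 + k1 * one_ibd + k2 * two_ibd


freqs = {"a": 0.1, "b": 0.2}
full = kinship_lr(("a", "b"), ("a", "b"), freqs, "full_siblings")
half = kinship_lr(("a", "b"), ("a", "b"), freqs, "half_siblings")
print(full, half, full / half)
```

For this shared heterozygous genotype the LR is about 8.4 for full siblings and 2.4 for half siblings, a ratio of only about 3.5 between the two relationships at this locus.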

When it comes to continued work on the linked X-STRs studied in this thesis, mainly two things remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that there are regions that remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).

Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik for always being such good friends.

Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life.

My relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed.

My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA Georges A Fitzsimmons NN and Robertson J (2010) DNA detec-tive a review of molecular approaches to wildlife forensics Forensic Sci Med Pathol in press

Ayres KL (2000) Relatedness testing in subdivided populations Forensic Sci Int 114107-115 Balding DJ and Nichols RA (1994) DNA profile match probability calculation how to allow for population stratification relatedness database selection and single bands Forensic Sci Int 64125-140 Becker D Rodig H Augustin C Edelmann J Gotz F Hering S Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit Fo-rensic Sci Int Genet 269-74

Blankholm HP (2008) Southern Scandinavia In Mesolithic Europe (eds) Bai-ley G and Spikins P Cambridge University Press New York

Borsting C Rockenbauer E and Morling N (2009) Validation of a single nu-cleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard Foren-sic Sci Int Genet 434-42

Botstein D White RL Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms Am J Hum Genet 32314-331


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York.


Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). The weight of the DNA result can be assessed, and a decision can be made on whether the AF is included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, either as an exclusion error or as an inclusion error (i.e. falsely excluding the AF as the TF, or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio (LR) and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with both the standard two hypotheses model and the five hypotheses model for comparison. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
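The Bayesian step described above can be illustrated with a minimal sketch. It is not the implementation used in Paper I: it assumes the likelihood of the DNA data under each hypothesis has already been computed, gives every hypothesis the same prior, and applies a hypothetical binary decision rule (include the AF only if P(H0 | data) reaches the 99.99% cut-off). The likelihood values and the labels H1a-H1d are invented placeholders.

```python
from typing import Dict, Optional


def posterior_probabilities(likelihoods: Dict[str, float],
                            priors: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return P(H | data) for each hypothesis H via Bayes' theorem.

    `likelihoods` maps a hypothesis label to P(data | H). If no priors are
    given, every hypothesis gets the same prior probability.
    """
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


def decide(p_h0: float, cutoff: float = 0.9999) -> str:
    """Hypothetical binary rule: include the AF only if P(H0 | data) >= cutoff."""
    return "inclusion" if p_h0 >= cutoff else "exclusion"


# Two hypotheses: H0 (AF is the father) vs H1 (AF is unrelated).
# The likelihood values are invented; their ratio (the LR) is 10 000.
two = posterior_probabilities({"H0": 1e-4, "H1": 1e-8})

# Five hypotheses: H0 plus four alternatives, labelled H1a-H1d only as
# placeholders for different AF/TF relationships; values are invented.
five = posterior_probabilities({"H0": 1e-4, "H1a": 1e-8, "H1b": 2e-6,
                                "H1c": 5e-7, "H1d": 1e-7})

for name, post in (("two", two), ("five", five)):
    print(name, round(post["H0"], 6), decide(post["H0"]))
```

With identical evidence for H0, the extra alternatives that fit the data reasonably well (here H1b) pull P(H0) below the cut-off; this is the mechanism by which a richer set of hypotheses can reduce false inclusions of a relative of the TF.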


Figure 3: The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is because we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or represent true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the model is not entirely correct, than not to employ one at all (Table 1).
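To show what a probabilistic treatment of a single inconsistency looks like, the sketch below computes a single-locus paternity index under a simple, assumed one-step mutation model: each paternal transmission mutates with probability mu, and a mutated allele moves one repeat unit up or down with equal probability. This is not the mutation model used in Paper I. It also assumes that the child's obligate paternal allele has already been identified from the mother's genotype, and the default mu = 0.002 is a hypothetical value.

```python
from typing import Tuple


def transmission_prob(father: Tuple[int, int], allele: int, mu: float) -> float:
    """P(father transmits `allele`) under a simple one-step mutation model.

    Each of the father's two alleles is passed on with probability 1/2.
    It is transmitted unchanged with probability 1 - mu; otherwise it
    mutates one repeat unit up or down, each with probability mu / 2.
    """
    prob = 0.0
    for parental in father:
        if parental == allele:
            prob += 0.5 * (1.0 - mu)
        elif abs(parental - allele) == 1:
            prob += 0.5 * (mu / 2.0)
    return prob


def paternity_index(alleged_father: Tuple[int, int], paternal_allele: int,
                    allele_freq: float, mu: float = 0.002) -> float:
    """PI = P(allele | AF is the father) / P(allele | a random unrelated man)."""
    return transmission_prob(alleged_father, paternal_allele, mu) / allele_freq


# Consistent locus: the child's obligate paternal allele (12) is carried by the AF.
pi_match = paternity_index((12, 14), 12, allele_freq=0.1)
# Inconsistent locus: paternal allele 13 is one step away from both AF alleles.
pi_mismatch = paternity_index((12, 14), 13, allele_freq=0.1)
print(pi_match, pi_mismatch)
```

The inconsistent locus contributes a small but non-zero factor to the combined LR, so the remaining loci still decide the case; a fixed limit on the number of inconsistencies instead makes an all-or-nothing call at that locus.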

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that utilisation of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1: Error rates. Each value is the change in the error rate in comparison with the standard case (%), given as total error (inclusion error, exclusion error).

Consanguinity
Mother and father simulated as first cousins: 3 (10, -1)

Additional information
DNA profiles from 20 markers: -68 (-29, -89)
DNA profiles from 25 markers: -83 (-56, -98)
2 children: -88 (-73, -96)

Mutation model
Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)

Inappropriate allele frequency
Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)

Prior information
Five hypotheses model for LR calculation: -24 (-8, -31)

The standard case was defined as DNA profiles from 15 markers, a mutation model for handling inconsistencies, and an unweighted average inclusion error for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: the AF is the father of the child; H1: the AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is highly relevant to set up and study regional frequency databases, because of an increased risk of local frequency variation (Richards et al 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (including HVS-I, HVS-II and HVS-III) spans more than 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated from the DNA sequence variation and interpreted. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al 1999b). Comparing mtDNA haplogroup frequencies with corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
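The two efficiency parameters quoted above can be computed directly from haplotype counts. The sketch below uses the standard formulas, assumed here rather than taken from Paper II: haplotype (gene) diversity GD = n/(n-1) (1 - Σ p_i²) and random match probability PM = Σ p_i², where p_i is the observed frequency of haplotype i among n sequences. The haplotype labels are invented.

```python
from collections import Counter
from typing import Iterable, Tuple


def diversity_and_match_probability(haplotypes: Iterable[str]) -> Tuple[float, float]:
    """Return (GD, PM) from a list of observed haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    sum_sq = sum((c / n) ** 2 for c in counts.values())  # PM = sum of p_i^2
    gd = n / (n - 1) * (1.0 - sum_sq)                     # diversity with n/(n-1) correction
    return gd, sum_sq


gd, pm = diversity_and_match_probability(["H1", "H1", "H2", "H3"])
print(f"GD = {gd:.3f}, PM = {pm:.3f}")
```

A database in which almost every sequence is unique, as in the Swedish sample, drives Σ p_i² towards 1/n and GD towards 1.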

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). mtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4: Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).
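The idea behind an FST-type statistic for frequency variation among subpopulations can be sketched with Nei's G_ST = (H_T - H_S)/H_T, where H_S is the mean within-subpopulation diversity and H_T the diversity of the pooled frequencies. This is a simple analogue chosen for illustration; it omits sample-size corrections and is not necessarily the estimator behind the FST and ΦST values in Figure 4. Region names and haplotypes are invented.

```python
from collections import Counter
from typing import Dict, List


def haplotype_frequencies(haplotypes: List[str]) -> Dict[str, float]:
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {h: c / n for h, c in counts.items()}


def nei_gst(subpops: Dict[str, List[str]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T, giving every subpopulation equal weight.

    Undefined (division by zero) if the pooled sample has a single haplotype.
    """
    freqs = [haplotype_frequencies(h) for h in subpops.values()]
    # H_S: mean within-subpopulation diversity, 1 - sum(p^2).
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
    # H_T: diversity computed from the averaged (pooled) frequencies.
    all_haps = set().union(*freqs)
    pooled = [sum(f.get(h, 0.0) for f in freqs) / len(freqs) for h in all_haps]
    h_t = 1.0 - sum(p * p for p in pooled)
    return (h_t - h_s) / h_t


same = nei_gst({"region_1": ["A", "B", "C", "C"], "region_2": ["C", "A", "C", "B"]})
split = nei_gst({"region_1": ["A", "A"], "region_2": ["B", "B"]})
print(same, split)
```

A value near 0 means the regions share the same haplotype frequencies (supporting a combined database), while a value near 1 means each region has its own haplotypes.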

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium (LD). Thus, prior to the application of X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). From these data, haplotype frequencies were established, the LD test was performed and forensic efficiency parameters were estimated. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
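Why LD matters can be seen by comparing the observed frequency of a two-locus haplotype with the product of its allele frequencies, and by computing the disequilibrium coefficient D = p_AB - p_A p_B. The sketch below uses invented male haplotypes (males carry a single X, so their haplotypes are observed directly). It only measures association for one allele pair; it is not a significance test like the one reported in Paper III.

```python
from typing import List, Tuple


def product_rule_vs_haplotype(haplotypes: List[Tuple[int, int]],
                              allele_1: int, allele_2: int) -> Tuple[float, float, float]:
    """Compare an observed two-locus haplotype frequency with the product rule.

    Returns (p_haplotype, p_allele_1 * p_allele_2, D), where
    D = p_haplotype - p_allele_1 * p_allele_2 is the disequilibrium coefficient.
    """
    n = len(haplotypes)
    p_hap = sum(h == (allele_1, allele_2) for h in haplotypes) / n
    p_1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p_2 = sum(h[1] == allele_2 for h in haplotypes) / n
    product = p_1 * p_2
    return p_hap, product, p_hap - product


# Invented male haplotypes (allele at locus 1, allele at locus 2) for two linked X-STRs.
males = [(15, 10)] * 4 + [(16, 11)] * 4 + [(15, 11), (16, 10)]
p_hap, p_product, d = product_rule_vs_haplotype(males, 15, 10)
print(p_hap, p_product, d)
```

In this toy sample the product rule gives 0.25 for the haplotype 15-10, against an observed frequency of 0.4; an LR built from the product would therefore misstate the rarity of that haplotype.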

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the probability of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
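Because only about a hundred informative meioses were available per marker pair, a small observed count leaves considerable uncertainty about the true recombination fraction θ. The sketch below shows the simple counting estimate θ̂ = R/N together with an exact one-sided binomial (Clopper-Pearson type) upper confidence bound, found by bisection. It is an illustration only and not the estimation model presented in Paper III; the counts used are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def theta_upper_bound(recombinants: int, meioses: int,
                      alpha: float = 0.05, tol: float = 1e-10) -> float:
    """One-sided exact (Clopper-Pearson) upper confidence bound for theta.

    Finds theta such that P(X <= recombinants | theta) = alpha by bisection;
    the binomial CDF is decreasing in theta, so bisection is valid.
    """
    if recombinants >= meioses:
        return 1.0
    lo, hi = recombinants / meioses, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_cdf(recombinants, meioses, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical counts within the range of informative meioses reported above.
for r, n in ((0, 100), (1, 100)):
    print(r, n, r / n, round(theta_upper_bound(r, n), 4))
```

With no recombinants in 100 informative meioses, the 95% upper bound is still roughly 3%, which is why a point estimate below 1% should not be read as proof of complete linkage.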


Figure 5: Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when producing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al 2002) could be used for computing the LR for questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987) works well for thousands of linked markers, but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking or not taking linkage and LD into account by means of simulations on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
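The core of the Lander-Green approach is a hidden Markov model along the chromosome. The hidden states are inheritance vectors (one bit per meiosis recording which parental copy was passed on), transitions between adjacent loci depend on the recombination fraction θ, and each locus contributes the probability of the observed genotypes given the vector. The sketch below implements only the forward recursion, under simplifying assumptions: the per-locus emission probabilities are supplied precomputed, all meioses are treated alike (on the X chromosome, meioses from fathers carry no recombination information, which a real implementation must handle), and the function names are illustrative. Feeding in an LD block as a single multi-allelic "locus" with haplotype-based emissions would ignore recombination within the block, unlike the extended inheritance vectors of Paper IV.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[int, ...]


def transition(v: Vector, w: Vector, theta: float) -> float:
    """P(w at the next locus | v at this locus); meioses recombine independently."""
    k = sum(a != b for a, b in zip(v, w))  # number of meioses that switch
    return theta ** k * (1.0 - theta) ** (len(v) - k)


def pedigree_likelihood(emissions: List[Dict[Vector, float]],
                        thetas: Sequence[float], n_meioses: int) -> float:
    """Forward algorithm: P(data) summed over all inheritance-vector paths.

    emissions[l][v] = P(observed genotypes at locus l | inheritance vector v);
    thetas[l] is the recombination fraction between loci l and l + 1.
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    prior = 1.0 / len(vectors)  # uniform prior over inheritance vectors
    alpha = {v: prior * emissions[0][v] for v in vectors}
    for locus in range(1, len(emissions)):
        theta = thetas[locus - 1]
        alpha = {w: emissions[locus][w] * sum(alpha[v] * transition(v, w, theta)
                                              for v in vectors)
                 for w in vectors}
    return sum(alpha.values())


# One meiosis, two loci; the emission values are invented.
emissions = [{(0,): 0.2, (1,): 0.4}, {(0,): 0.3, (1,): 0.1}]
print(pedigree_likelihood(emissions, [0.5], 1), pedigree_likelihood(emissions, [0.0], 1))
```

With θ = 0.5 the result equals the product of the per-locus averages (unlinked loci), while θ = 0 forces the same inheritance vector at both loci; the cost grows linearly with the number of loci but exponentially with the number of meioses.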

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three additional cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LR calculated with our proposed model was compared with the LR calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6), in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1 were obtained to varying degrees when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
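How strongly the frequency estimate for a rare haplotype can drive the LR is easy to see numerically. The sketch below contrasts the raw counting estimate x/n, which is zero for an unseen haplotype, with an add-one estimate (x + 1)/(n + 1). The add-one rule is one common convention assumed here for illustration; which estimator Paper IV used is not stated in this summary. The LR is taken to be of the simple form 1/p, and n = 718 (the number of Swedish males typed in Paper III) is used purely as an example.

```python
from fractions import Fraction
from typing import Optional


def haplotype_frequency(count: int, n: int, add_one: bool = True) -> Fraction:
    """Counting estimate x/n, or the assumed add-one estimate (x + 1)/(n + 1)."""
    if add_one:
        return Fraction(count + 1, n + 1)
    return Fraction(count, n)


def lr_one_over_p(freq: Fraction) -> Optional[float]:
    """LR of the simple form 1/p; None when the frequency estimate is zero."""
    return None if freq == 0 else float(1 / freq)


n_males = 718  # sample size borrowed from Paper III purely as an example
for count in (0, 1, 5):
    raw = lr_one_over_p(haplotype_frequency(count, n_males, add_one=False))
    adjusted = lr_one_over_p(haplotype_frequency(count, n_males))
    print(count, raw, adjusted)
```

For a haplotype seen once, the two estimates give LRs of 718 and 359.5, and for an unseen haplotype the raw estimate gives no finite LR at all; the choice of estimator matters most exactly where the data are sparsest.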

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2: Statistics for the simulation of a maternity case involving two brothers. Each entry gives the median [95% cred.] (min/max).

Rel
Log10 LR: 20 [-1 -62] (-1100)
Difference LR(m_2)/LR(m_1): 12 [04-73] (0001418)
Difference LR(m_3)/LR(m_1): 04 [0036-93] (00001438)

NoRel
Log10 LR: -1 [-1-05] (-140)
Difference LR(m_2)/LR(m_1): 10 [09-13] (000263)
Difference LR(m_3)/LR(m_1): 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers were simulated to have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV, we presented a model for the calculation of likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We demonstrated that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.

Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years, much of which is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by the rapid development of computing power.

If we go back to the time just before the above-mentioned findings and adopt a client's point of view, we can see that the chances of linking a suspect to the crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid pace of development. Once the discoveries were made, an enormous amount of money and scientific hours was invested in forensic genetics in order to meet the demand.

So, is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved with existing methods today, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which consist of a given number of predefined DNA markers (e.g., CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases which, like CODIS, consist of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of an even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly solved with a sufficiently high probability and certainty; thus, the community will not continue to invest the same level of time and money in research. However, relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what becomes important are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
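
To make the database-size point concrete, the sketch below assumes one common convention, not necessarily the one intended in this thesis: the frequency of a haplotype that has not been observed among n sampled haplotypes is bounded by the exact one-sided binomial upper confidence limit 1 - α^(1/n). The function name and the database sizes are hypothetical.

```python
def upper_bound_unobserved(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper confidence limit for the population
    frequency of a haplotype observed 0 times among n sampled haplotypes.

    Solves (1 - p) ** n = alpha for p, i.e. the largest frequency that is
    still compatible (at level alpha) with seeing zero copies in the sample.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical database sizes, chosen only to show the trend.
    for n in (100, 1_000, 10_000):
        print(f"n = {n:>6}: 95% upper bound = {upper_bound_unobserved(n):.5f}")
```

For alpha = 0.05 the bound is close to 3/n (the "rule of three"), so each tenfold increase in database size lowers it roughly tenfold. For haplotype blocks of many linked markers, where most haplotypes are rare or unseen, this slow decrease is one way to see why much larger databases are needed.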

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that there are regions that were rather isolated until the early 1900s, and we are interested in whether this is traceable in the genetic data and could thus have an impact when producing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with the so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would allow access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).

Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. Especially, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise within forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher, you need to discuss ideas big or small, and as a PhD student it is always nice to have a "bollplank" (sounding board) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends. Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law, the Tillmar family: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring; thanks for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy, and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331

Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden--a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies--the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Investigations

Paper I - DNA-testing for immigration cases: the risk of erroneous conclusions

The standard paternity case includes a child, the mother of the child and an alleged father (AF). An assessment of the weight of the DNA result can be performed, and a decision can be made as to whether the AF can be included or excluded as the true father (TF) of the child. This decision can, however, be incorrect, due to either an exclusion error or an inclusion error (i.e., falsely excluding the AF as the TF or falsely including the AF as the TF, respectively). In this paper, we studied the risk of erroneous decisions in relationship testing in immigration casework. These cases can involve uncertainties concerning appropriate allele frequencies, different degrees of consanguinity, a close relationship between the AF and the TF, and complex pedigrees.

Materials and methods

A simulation approach was used to study the impact of the different parameters on the computed likelihood ratio and error rates. Two mutually exclusive hypotheses are normally used in paternity testing. We introduced a five hypotheses model in order to account for the alternative of a close relationship between the TF and the AF (Figure 3).

Family data were generated, and in the standard case the individuals' DNA profiles were based on 15 autosomal STR markers with published allele frequencies.

When calculating the weight of evidence, expressed as posterior probabilities, we used a Bayesian framework with the standard two hypotheses and, for comparison, the five hypotheses model. The error rates were studied by comparing the outcome of the test with the simulated relationship, using a decision rule for inclusion and exclusion.
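
The Bayesian step described above can be sketched as follows, assuming the likelihood of the DNA data under each hypothesis has already been computed from the pedigree (that computation is not shown). Each posterior is the prior times the likelihood, normalised over all hypotheses, so a model with more hypotheses simply adds entries to the set being normalised. The hypothesis labels, likelihood values, the 0.9999 cut-off and the three-way decision rule are illustrative assumptions, not the exact implementation used in Paper I.

```python
from typing import Dict, Optional


def posterior_probabilities(
    likelihoods: Dict[str, float],
    priors: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Bayes' theorem over a finite set of mutually exclusive hypotheses.

    likelihoods: P(DNA data | H) for each hypothesis H (any common scale).
    priors: P(H) for each hypothesis; equal priors are used if omitted.
    """
    if priors is None:
        priors = {h: 1.0 / len(likelihoods) for h in likelihoods}
    joint = {h: priors[h] * lik for h, lik in likelihoods.items()}
    total = sum(joint.values())
    if total <= 0.0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: value / total for h, value in joint.items()}


def decide(posteriors: Dict[str, float], paternity: str, cutoff: float = 0.9999) -> str:
    """Illustrative decision rule (an assumption, not the rule from Paper I):
    include if P(paternity | data) >= cutoff, exclude if it is <= 1 - cutoff,
    otherwise report the result as inconclusive."""
    p = posteriors[paternity]
    if p >= cutoff:
        return "inclusion"
    if p <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical likelihoods on a relative scale where P(data | AF is TF) = 1,
    # i.e. a paternity index (LR) of 20,000 against an unrelated man.
    two = {"AF_is_father": 1.0, "AF_unrelated": 1.0 / 20_000}
    post = posterior_probabilities(two)
    print(round(post["AF_is_father"], 6), decide(post, "AF_is_father"))

    # Adding a hypothetical alternative in which a brother of the AF is the
    # true father (relative likelihood 0.05) pulls the posterior below the cutoff.
    extended = dict(two, brother_of_AF_is_father=0.05)
    post = posterior_probabilities(extended)
    print(round(post["AF_is_father"], 6), decide(post, "AF_is_father"))
```

With equal priors the first case is classified as an inclusion (posterior about 0.99995), while adding the brother hypothesis lowers the posterior for paternity to about 0.95, so the same DNA result becomes inconclusive. This illustrates, in miniature, why alternative close relationships matter in immigration casework.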

Figure 3. The alternative hypotheses used for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used equal prior probabilities for the alternative hypotheses, i.e., the same number of cases was simulated for each of hypotheses H1a, H1b, H1c and H1d. We demonstrated that when more information was added to the case, the error rate decreased, especially the exclusion error (Table 1).

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but was shown to have a considerable impact on individual LRs.

When dealing with cases where there is an expected risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some labs still use a limit on the maximum number of inconsistencies for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even if the interpretation is not totally correct, than not to employ one at all (Table 1).

Furthermore, we proposed and tested a five hypotheses model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that utilisation of such a model significantly decreased the error rates, although the magnitude of the decrease was minor.

The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and the evaluation of the statistical methods used is important. In this paper, we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1. Error rates: change (%) in the error rate in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of a mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of a mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequencies
  Rwandan allele frequencies for data generation, Swedish for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish for LR calculation: -13 (25, -34)
Prior information
  Five hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion errors for H1a-d. Posterior probabilities were calculated based on the two hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers, it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variations (Richards et al. 2000). In this study, we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e., Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment (including HVS-I, HVS-II and HVS-III) spans over 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed when calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
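
The two summary statistics quoted above can be computed from haplotype counts alone. The sketch below assumes Nei's (1987) unbiased estimator for haplotype diversity and the sum of squared sample frequencies for the random match probability; Paper II may have used a variant of either, and the function name and toy sample are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def diversity_and_match_probability(haplotypes: Iterable[str]) -> Tuple[float, float]:
    """Return (haplotype diversity, random match probability) for a sample.

    Diversity uses Nei's unbiased estimator h = n/(n-1) * (1 - sum(p_i^2));
    the match probability is estimated as sum(p_i^2), the chance that two
    haplotypes drawn at random (with replacement) from the sample are identical.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1.0 - sum_sq)
    return diversity, sum_sq


if __name__ == "__main__":
    # Toy sample of hypothetical haplotype labels (not data from Paper II).
    sample = ["H1", "H1", "H2", "H3", "H4", "H4", "H5", "H6"]
    gd, pm = diversity_and_match_probability(sample)
    print(f"GD = {gd:.3f}, PM = {pm:.3f}")
```

Because of the n/(n-1) factor, the diversity and 1 - PM are not exactly equal; the difference is appreciable only for small samples.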

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). MtDNA haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found for Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This can most probably be explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.

Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows the inclusion of Swedish mtDNA data in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important to study not only population frequencies, substructure and efficiency parameters, but also marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children) were used to estimate recombination frequencies; these families provided 84 to 116 informative meioses. Furthermore, a model for estimating such recombination rates was presented.
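For phase-known informative meioses, a recombination fraction can be estimated by simple counting. The sketch below shows that maximum-likelihood estimate together with the classical LOD score for linkage (Ott 1999). It is a textbook illustration, not the estimation model presented in Paper III, and the counts in the example are hypothetical.

```python
import math

def recombination_estimate(r, n):
    """Maximum-likelihood recombination fraction r/n from n phase-known
    informative meioses with r recombinants, capped at 0.5."""
    if n == 0 or not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n and n > 0")
    return min(r / n, 0.5)

def lod_score(r, n, theta):
    """log10 of L(theta) / L(0.5) for r recombinants out of n meioses."""
    def term(count, prob):
        # count * log10(prob), treating 0 * log10(0) as 0
        return 0.0 if count == 0 else count * math.log10(prob)
    return term(r, theta) + term(n - r, 1 - theta) - n * math.log10(0.5)

# Hypothetical example: 2 recombinants among 100 informative meioses
theta_hat = recombination_estimate(2, 100)
print(theta_hat, round(lod_score(2, 100, theta_hat), 2))
```

A LOD score above 3 is the conventional threshold for declaring linkage. With roughly 100 informative meioses, even a few recombinants leave strong evidence of linkage.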

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. For the Swedish population, the loci within each group should therefore be treated as haplotypes rather than as single markers. Simulations showed that disregarding LD, rather than taking it into account, changed the calculated LR only slightly on average. In some individual cases, however, the difference was considerable.
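Males carry a single X chromosome, so their two-locus haplotypes can be read directly, which makes a simple association test possible. The sketch below is a permutation test using a Pearson chi-square statistic on the allele-by-allele contingency table. It illustrates the idea only and is not necessarily the LD test used in Paper III. The haplotype data, allele values and number of permutations are hypothetical.

```python
import random
from collections import Counter

def chi_square(pairs):
    """Pearson chi-square statistic for association between alleles at two loci."""
    n = len(pairs)
    rows = Counter(a for a, _ in pairs)
    cols = Counter(b for _, b in pairs)
    cells = Counter(pairs)
    stat = 0.0
    for a, ra in rows.items():
        for b, cb in cols.items():
            expected = ra * cb / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat

def ld_permutation_test(pairs, n_perm=1000, seed=1):
    """Permutation p-value: shuffling the locus-2 alleles breaks any association."""
    rng = random.Random(seed)
    observed = chi_square(pairs)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if chi_square(list(zip(first, second))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical male haplotypes for two linked loci (alleles as repeat numbers)
haplotypes = [(17, 10)] * 30 + [(18, 11)] * 25 + [(17, 11)] * 5 + [(18, 10)] * 4
stat, p = ld_permutation_test(haplotypes)
print(f"chi-square = {stat:.1f}, permutation p = {p:.4f}")
```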

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%). There was also a tendency for linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the locus positions in Mb (NCBI build 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set and for the Swedish and Somali data separately (Swedish/Somali).

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to study and develop a mathematical model that takes both linkage and linkage disequilibrium into account when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971, Abecasis et al 2002) could be used to compute the LR for questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for linkage and linkage disequilibrium. We then use simulations on typical pedigrees, with X-chromosomal data for the markers studied in Paper III, to study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented based on the Lander-Green algorithm. It extends that algorithm by expanding the inheritance vectors to consider all loci of a haplotype.
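The core of the Lander-Green algorithm is a hidden Markov model. Its hidden states are inheritance vectors, with one bit per meiosis, and its transitions between adjacent loci are governed by the recombination fractions. The sketch below implements only that standard forward recursion. It assumes that the per-locus probabilities of the observed genotypes given each inheritance vector have already been computed; the `emissions` argument is a hypothetical input. It does not include the haplotype-level extension for LD described for Paper IV.

```python
from itertools import product

def transition(v, w, theta):
    """P(inheritance vector w at the next locus | v at this locus).

    Each meiosis recombines independently with probability theta, so the
    probability depends only on the number of differing bits.
    """
    d = sum(a != b for a, b in zip(v, w))
    return theta**d * (1 - theta) ** (len(v) - d)

def pedigree_likelihood(n_meioses, emissions, thetas):
    """Forward algorithm over inheritance vectors.

    emissions[k][v] -- P(observed data at locus k | inheritance vector v)
    thetas[k]       -- recombination fraction between locus k and k + 1
    """
    states = list(product((0, 1), repeat=n_meioses))
    # Uniform prior over inheritance vectors at the first locus
    alpha = {v: emissions[0][v] / len(states) for v in states}
    for k in range(1, len(emissions)):
        alpha = {
            w: emissions[k][w]
            * sum(alpha[v] * transition(v, w, thetas[k - 1]) for v in states)
            for w in states
        }
    return sum(alpha.values())

# Hypothetical example: one meiosis, two loci; the data at both loci are
# consistent only with inheritance vector (0,), so L = 0.5 * (1 - 0.1)
emissions = [{(0,): 1.0, (1,): 0.0}, {(0,): 1.0, (1,): 0.0}]
print(pedigree_likelihood(1, emissions, thetas=[0.1]))
```

The state space grows as 2^(number of meioses), which is why Lander-Green suits many markers but small pedigrees.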

To test the model, we studied six cases representing pedigrees for which X-chromosomal analysis would be valuable. In three of these, DNA profiles were available for both the children and the founders: paternity for a trio, paternity for half-sisters and paternity for full-sisters. In the other three, DNA profiles were available only for the children: paternity for half-sisters, paternity for full-siblings and maternity for brothers. For each case, simulations were performed both with the tested relationship true and with it not true. From these simulations, we studied the LR distributions. We also compared the LR calculated with our proposed model against LRs from simpler models that consider linkage and LD only partially or not at all. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
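For a single X-STR, the paternity question for a mother-daughter-alleged father trio has a simple closed form, because a father transmits his only X chromosome to every daughter. The sketch below computes that single-locus LR and simulates its distribution both when the alleged father is the true father and when he is unrelated. The allele frequencies are hypothetical and mutations are ignored. Multiplying such LRs across loci would assume independence, which Paper III showed does not hold within the Argus X-8 linkage groups.

```python
import random
import statistics

def transmit(mother, allele):
    """Probability that a mother with genotype (a, b) transmits the given allele."""
    return mother.count(allele) / 2

def daughter_prob(mother, daughter, paternal):
    """P(daughter genotype | mother); paternal(allele) is the probability that
    the paternal X carries that allele."""
    a, b = daughter
    if a == b:
        return transmit(mother, a) * paternal(a)
    return transmit(mother, a) * paternal(b) + transmit(mother, b) * paternal(a)

def x_trio_lr(mother, daughter, af_allele, freqs):
    """LR for 'AF is the father' vs 'an unrelated man is the father' at one X-STR."""
    h1 = daughter_prob(mother, daughter, lambda x: 1.0 if x == af_allele else 0.0)
    h2 = daughter_prob(mother, daughter, lambda x: freqs[x])
    return h1 / h2

def simulate(freqs, related, n_sim=10000, seed=1):
    """Single-locus LRs for simulated trios; related=True means AF is the true father."""
    rng = random.Random(seed)
    alleles, weights = list(freqs), list(freqs.values())

    def draw():
        return rng.choices(alleles, weights)[0]

    lrs = []
    for _ in range(n_sim):
        mother = (draw(), draw())
        father = draw()
        daughter = (rng.choice(mother), father)
        af = father if related else draw()
        lrs.append(x_trio_lr(mother, daughter, af, freqs))
    return lrs

freqs = {12: 0.3, 13: 0.3, 14: 0.1, 15: 0.3}  # hypothetical allele frequencies
print(x_trio_lr((12, 13), (12, 14), 14, freqs))
for related in (True, False):
    lrs = simulate(freqs, related)
    print(related, statistics.median(lrs), sum(lr > 1 for lr in lrs) / len(lrs))
```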

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed a high median LR (~10^6) for the three cases where DNA profiles were available for the founders. The median LR was considerably lower (~10^2-10^3) for the cases where genotype data for the founders were not available. Furthermore, LRs greater than 1, of various magnitudes, were obtained when the questioned relationship was simulated not to be true. This occurred only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
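The effect of ignoring LD within a linkage group can be illustrated with a simple comparison. One value is the haplotype frequency counted directly in male data. The other is the product of the corresponding single-locus allele frequencies, which is in effect what a model that disregards LD uses. The LR often scales with the reciprocal of such a frequency, so a large ratio between the two values translates into a correspondingly large change in the LR. The data below are hypothetical. The counting estimate is zero for haplotypes absent from the database, which is exactly where the choice of frequency estimator becomes influential.

```python
from collections import Counter

def haplotype_vs_product(haplotypes, target):
    """Return (counted haplotype frequency, product of marginal allele frequencies)."""
    n = len(haplotypes)
    hap_freq = Counter(haplotypes)[target] / n
    product = 1.0
    for i, allele in enumerate(target):
        product *= sum(1 for h in haplotypes if h[i] == allele) / n
    return hap_freq, product

# Hypothetical male haplotypes at two loci within one linkage group
haps = [(17, 10)] * 30 + [(18, 11)] * 25 + [(17, 11)] * 5 + [(18, 10)] * 4
for target in [(17, 10), (17, 11)]:
    observed, expected = haplotype_vs_product(haps, target)
    print(target, round(observed, 3), round(expected, 3), round(expected / observed, 2))
```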

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

         log10 LR                        Difference LR(m_2)/LR(m_1)      Difference LR(m_3)/LR(m_1)
         Median [95% cred.] (min/max)    Median [95% cred.] (min/max)    Median [95% cred.] (min/max)
Rel      2.0 [-1, 6.2] (-1100)           1.2 [0.4, 7.3] (0001418)        0.4 [0.036, 9.3] (00001438)
NoRel    -1 [-1, 0.5] (-140)             1.0 [0.9, 1.3] (000263)         0.2 [0.0036, 9.1] (0026433)

Rel denotes simulations where the brothers have the same mother, and NoRel simulations where the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis. The aim of the thesis was to study relevant population genetic properties, and models for taking them into account, when calculating likelihood ratios in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by several parameters: the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity between the mtDNA variation in Sweden and that in other European populations makes it possible to enlarge the relevant reference population. This increases the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. In the Swedish population, the markers within each of the four linkage groups were in linkage disequilibrium. This, together with the linkage within and between the linkage groups, highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on X-chromosomal DNA profiles. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, our proposed model for properly considering both linkage and linkage disequilibrium is efficient. Disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses about the future of forensic genetics. Looking back, we can see enormous progress in this field over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help from developments in computing power.

Consider the situation, from a client's point of view, just before these discoveries. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and accurately, were rather small. The demand from clients to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics to meet these demands.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can now be solved with existing methods. In addition, forensic laboratories have already invested heavily in instrumentation and in large national DNA profile databases built on a predefined set of DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. For the use of DNA in criminal cases, perhaps the most important areas are three: methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al 2008), and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

DNA in relationship testing is the main topic of this thesis. Here, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods for handling data from a larger number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Some relationship questions nevertheless remain that cannot be resolved satisfactorily with existing methods. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009, Ge et al 2010). These studies share the simultaneous use of a large number of markers. In such cases, the key requirement is statistical methods that accurately estimate the evidential value of the DNA profiles while taking properties such as linkage and linkage disequilibrium into account.
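Separating full siblings from half-siblings is hard, and a pairwise kinship likelihood based on IBD (identity-by-descent) probabilities shows why. The sketch below uses the standard approach for one autosomal locus. It assumes IBD probabilities k = (k0, k1, k2) of (1/4, 1/2, 1/4) for full siblings and (1/2, 1/2, 0) for half siblings, no mutation, and hypothetical allele frequencies. It is not the method of any of the cited studies. Multiplying per-locus values would additionally assume unlinked markers in linkage equilibrium, which is exactly what dense SNP panels violate.

```python
def genotype_prob(g, p):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]

def prob_given_one_ibd(g1, g2, p):
    """P(g2 | g1, exactly one allele IBD): the shared allele is drawn from g1,
    the other allele of g2 from the population."""
    c, d = g2
    total = 0.0
    for s in g1:  # each allele of g1 is the IBD allele with probability 1/2
        if c == d:
            total += 0.5 * (p[c] if s == c else 0.0)
        else:
            total += 0.5 * ((p[d] if s == c else 0.0) + (p[c] if s == d else 0.0))
    return total

def pair_likelihood(g1, g2, p, k):
    """P(g1, g2) for a pair of individuals with IBD probabilities k = (k0, k1, k2)."""
    p1 = genotype_prob(g1, p)
    ibd0 = p1 * genotype_prob(g2, p)
    ibd1 = p1 * prob_given_one_ibd(g1, g2, p)
    ibd2 = p1 * (1.0 if sorted(g1) == sorted(g2) else 0.0)
    return k[0] * ibd0 + k[1] * ibd1 + k[2] * ibd2

FULL_SIB = (0.25, 0.5, 0.25)
HALF_SIB = (0.5, 0.5, 0.0)
UNRELATED = (1.0, 0.0, 0.0)

p = {10: 0.1, 11: 0.1, 12: 0.8}  # hypothetical allele frequencies
g1, g2 = (10, 11), (10, 11)
lr = pair_likelihood(g1, g2, p, FULL_SIB) / pair_likelihood(g1, g2, p, HALF_SIB)
print(round(lr, 2))
```

Two siblings can share one allele IBD under both hypotheses (k1 = 1/2 in each case). A single locus therefore only discriminates through the IBD0 and IBD2 terms, so many markers are needed.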

Two main issues remain to be investigated for the linked X-STRs studied in this thesis. The first is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The second, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions remained rather isolated until the early 1900s. We are interested in whether this isolation is traceable in the genetic data and could thus have an impact on the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. Although the methods are technically capable of producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff of the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all the other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends.

Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, my mother Inga-Lill, my sister Frida and my brother Martin, for your endless support and for believing in me throughout my life.

My relatives Ingvar & Eivor and my grandfather Olle, for always being around when help is needed.

My family-in-law, the Tillmars: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and caring, and thank you for your patience when I worked outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings while I was writing this thesis.

Thanks also to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. Proceedings for The International Symposium on Human Identification 1989, Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis - validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: Festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Figure 3 The different alternative hypotheses for simulation and calculation of the true relationship between the alleged father (AF), the child (C) and the mother (M).

Results and discussion

Simulation of a standard paternity case yielded an unweighted total error rate of approximately 0.8% (for a 99.99% cut-off). This might appear fairly high, but it is due to the fact that we used an equal prior probability for each of the alternative hypotheses, i.e. the same number of cases was simulated for hypothesis H1a as for H1b, H1c and H1d, respectively. We demonstrated that when more information was added to the case the error decreased, especially the exclusion error (Table 1).
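To make the decision step concrete, the sketch below turns a combined LR into a posterior probability for H0 and compares it with a 99.99% cut-off, assuming a prior of 0.5 for H0 versus a single alternative. The symmetric exclusion threshold (posterior at most 0.01%) and all function names are hypothetical illustrations, not necessarily the decision rule used in Paper I.

```python
def posterior_probability(lr: float, prior: float = 0.5) -> float:
    """Posterior probability of H0 (AF is the father) given the combined LR."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def decide(lr: float, cutoff: float = 0.9999, prior: float = 0.5) -> str:
    """Classify a case; the symmetric exclusion threshold is an assumption."""
    w = posterior_probability(lr, prior)
    if w >= cutoff:
        return "inclusion"
    if w <= 1.0 - cutoff:
        return "exclusion"
    return "inconclusive"


def inclusion_error_rate(lrs_when_af_not_father, cutoff=0.9999, prior=0.5):
    """Fraction of simulated non-paternity cases wrongly classified as inclusions."""
    decisions = [decide(lr, cutoff, prior) for lr in lrs_when_af_not_father]
    return decisions.count("inclusion") / len(decisions)


print(decide(20000.0))  # inclusion: posterior 20000/20001 is about 0.99995
print(decide(5000.0))   # inconclusive: posterior about 0.9998
print(inclusion_error_rate([0.001, 0.5, 30000.0, 2.0]))  # 0.25
```

In a simulation study of this kind, the LRs passed to `inclusion_error_rate` would come from cases simulated under each alternative hypothesis; averaging the resulting rates without weights corresponds to the equal-prior assumption discussed above.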

The use of an inappropriate allele frequency database had only a minor influence on the total error rate, but it was shown to have a considerable impact on individual LRs.

When dealing with cases where there is a known risk that the AF is a relative of the TF, it is essential to include a computational model for treating inconsistencies. When there is only a limited number of inconsistencies between the AF and the child, the question arises whether these are due to mutations or are true exclusions. The recommended way of handling such cases is to include all loci in the calculation of the total LR (Gjertson et al. 2007), although some laboratories still use a maximum number of inconsistencies as the limit for inclusion/exclusion (Hallenberg & Morling 2009). However, we demonstrated that it is better to use a probabilistic model, even an imperfect one, than not to employ one at all (Table 1).
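The influence of both the allele frequency database and a probabilistic treatment of inconsistencies can be seen at the level of a single locus. The sketch below computes a trio paternity index under a deliberately crude mutation model, assumed here for illustration and not the model used in Paper I: with probability `mu` the AF transmits an allele drawn at random from the population frequencies. Under this assumption an inconsistent locus contributes an index equal to `mu` instead of forcing the total LR to zero. Allele labels, frequencies and the mutation rate are hypothetical, and the frequency dictionary must contain every allele the child carries.

```python
import math


def transmission(genotype, mu=0.0, freqs=None):
    """Probabilities of the allele a parent transmits.

    With probability 1 - mu each of the parent's two alleles is passed on with
    probability 1/2; with probability mu the transmitted allele is replaced by
    a draw from the population frequencies (crude mutation model).
    """
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5 * (1.0 - mu)
    if mu > 0.0:
        for allele, p in freqs.items():
            probs[allele] = probs.get(allele, 0.0) + mu * p
    return probs


def child_prob(child, maternal, paternal):
    """P(child genotype) given maternal and paternal transmission probabilities."""
    target = sorted(child)
    return sum(pm * pf
               for m, pm in maternal.items()
               for f, pf in paternal.items()
               if sorted((m, f)) == target)


def paternity_index(child, mother, af, freqs, mu=0.0):
    """LR for 'AF is the father' versus 'an unrelated random man is the father'."""
    maternal = transmission(mother)
    return (child_prob(child, maternal, transmission(af, mu, freqs))
            / child_prob(child, maternal, freqs))


# Hypothetical allele frequencies from two different reference databases.
freqs_a = {12: 0.20, 14: 0.10, 15: 0.10, 16: 0.05, 17: 0.15}
freqs_b = {12: 0.20, 14: 0.10, 15: 0.20, 16: 0.05, 17: 0.15}

pi_consistent_a = paternity_index((12, 15), (12, 14), (15, 17), freqs_a)
pi_consistent_b = paternity_index((12, 15), (12, 14), (15, 17), freqs_b)
# The child's paternal allele 16 is absent from the AF: an inconsistency.
pi_inconsistent = paternity_index((12, 16), (12, 14), (15, 17), freqs_a, mu=0.002)
# Unlinked loci: the total LR is the product of the locus-wise indices.
total_lr = math.prod([pi_consistent_a, pi_inconsistent])
print(pi_consistent_a, pi_consistent_b, pi_inconsistent, total_lr)
```

Doubling the frequency of the obligate paternal allele 15 halves the index (5.0 versus 2.5), and the inconsistent locus contributes 0.002 rather than zero, so the remaining loci can still carry weight.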

Furthermore, we proposed and tested a five-hypothesis model in order to reduce the risk of falsely including a relative of the TF as the biological father. The simulations revealed that using such a model significantly decreased the error rates, although the magnitude of the decrease was small.
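The effect of adding alternative hypotheses can be illustrated with Bayes' theorem over several hypotheses: each posterior is prior times likelihood, normalised over all hypotheses considered. In the sketch below the hypothesis names, the equal priors and the likelihood values (scaled relative to an unrelated AF) are all hypothetical and do not reproduce the specific alternatives of Figure 3.

```python
def posteriors(likelihoods, priors):
    """Posterior probability of each hypothesis: prior x likelihood, normalised."""
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Hypothetical likelihoods, scaled so that 'unrelated' has likelihood 1.
likelihoods = {
    "H0": 2000.0,          # AF is the father
    "unrelated": 1.0,      # AF unrelated to the child
    "relative_1": 500.0,   # AF a close relative of the true father (hypothetical)
    "relative_2": 300.0,
    "relative_3": 100.0,
}

two_hyp = posteriors({h: likelihoods[h] for h in ("H0", "unrelated")},
                     {"H0": 0.5, "unrelated": 0.5})
five_hyp = posteriors(likelihoods, {h: 0.2 for h in likelihoods})
print(round(two_hyp["H0"], 4), round(five_hyp["H0"], 4))  # 0.9995 0.6894
```

With two hypotheses the case would pass a high posterior threshold only if the LR is large enough; once relatives of the true father are explicitly considered, part of the probability mass moves to them and the posterior for H0 drops.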


The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluating the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change in the error rate (%) in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwanda allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iran allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five-hypothesis model for LR calculation: -24 (-8, -31)

A standard case was considered with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypothesis model (H0: AF is the father of the child; H1: AF is unrelated to the child).


Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, owing to an increased risk of local frequency variation (Richards et al. 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). The control region, which contains the hypervariable segments HVS-I, HVS-II and HVS-III, spans more than 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved calculation of forensic efficiency parameters, as well as comparison of the genetic variation found in the Swedish regions and between Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as those for other European populations (Budowle et al. 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
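Both efficiency measures can be computed directly from haplotype counts. A minimal sketch, assuming Nei's unbiased estimator for haplotype diversity and the sum of squared sample haplotype frequencies as the random match probability; this is one common convention, and Paper II may have used slightly different estimators. The haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity_and_match_probability(haplotypes):
    """Return (haplotype diversity, random match probability) for a sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Random match probability: sum of squared sample haplotype frequencies.
    match_probability = sum((c / n) ** 2 for c in counts.values())
    # Nei's unbiased haplotype (gene) diversity.
    diversity = n / (n - 1) * (1.0 - match_probability)
    return diversity, match_probability


sample = ["H1", "H1", "H2", "H3"]  # hypothetical control-region haplotypes
print(haplotype_diversity_and_match_probability(sample))
```

A sample in which nearly every individual has a unique haplotype, as in the Swedish data, drives the match probability towards 1/n and the diversity towards 1.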

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al. 2006). Haplotype frequencies from the eight different Swedish regions were compared, and only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region of Västerbotten and the rest of Sweden was not observed in the mtDNA data. This is most probably explained by demographic history; however, the impact of the relatively small sample sizes should not be ignored.


Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).
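FST-type statistics summarise how much of the total haplotype diversity is due to differences between subpopulations. The sketch below uses Nei's GST = (HT - HS)/HT with equal weighting of subpopulations; this is a simpler statistic swapped in for illustration, not the FST or ΦST estimates reported in Paper II, and the data are hypothetical.

```python
from collections import Counter


def frequencies(haplotypes):
    """Sample haplotype frequencies."""
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def gene_diversity(freqs):
    """1 minus the sum of squared frequencies (no small-sample correction)."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(subpopulations):
    """Nei's G_ST for haplotype data, weighting subpopulations equally."""
    subpop_freqs = [frequencies(s) for s in subpopulations]
    h_s = sum(gene_diversity(f) for f in subpop_freqs) / len(subpop_freqs)
    pooled = {}
    for f in subpop_freqs:
        for h, p in f.items():
            pooled[h] = pooled.get(h, 0.0) + p / len(subpop_freqs)
    h_t = gene_diversity(pooled)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


print(nei_gst([["A", "A", "B", "B"], ["A", "A", "B", "B"]]))  # identical: 0.0
print(nei_gst([["A", "A", "A", "B"], ["A", "B", "B", "B"]]))  # partly differentiated
```

Values near zero, as for the Swedish regions apart from the Saami, support pooling the regional samples into one frequency database.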

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
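Database size directly limits how precisely the rarity of a haplotype can be stated. As an illustration, two simple conventions often discussed for haploid markers are sketched below: a counting estimate after adding the questioned haplotype to the database, (x + 1)/(N + 1), and, for a haplotype not observed among N entries, the one-sided 95% upper confidence bound 1 - 0.05^(1/N). Both are assumptions for illustration, not the method recommended in Paper II or by EMPOP.

```python
def add_one_frequency(x: int, n: int) -> float:
    """Counting estimate after adding the questioned haplotype to the database."""
    return (x + 1) / (n + 1)


def upper_bound_unseen(n: int, alpha: float = 0.05) -> float:
    """Upper (1 - alpha) confidence bound for a haplotype seen 0 times in n.

    Solves (1 - f) ** n = alpha, i.e. the largest frequency f for which
    observing zero copies in n entries still has probability alpha.
    """
    return 1.0 - alpha ** (1.0 / n)


for n in (296, 1000, 10000):
    print(n, round(add_one_frequency(0, n), 5), round(upper_bound_unseen(n), 5))
```

With the 296 Swedish samples alone, an unseen haplotype can only be bounded at about 1%; enlarging the reference population to several thousand sequences lowers this bound by an order of magnitude.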


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study not only population frequencies, substructure and efficiency parameters but also marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test resulted in significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
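Because males carry a single X chromosome, their X-STR haplotypes are observed directly, which makes a test of association between two loci straightforward. The sketch below is a generic permutation test built on the Pearson chi-square statistic; it is an assumption for illustration, not necessarily the LD test used in Paper III, and the allele data are hypothetical.

```python
import random
from collections import Counter


def chi_square_statistic(pairs):
    """Pearson chi-square for association between allele pairs (a, b)."""
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            expected = n_a * n_b / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_permutations=2000, seed=1):
    """Permutation p-value: shuffling locus-B alleles breaks any association."""
    rng = random.Random(seed)
    observed_stat = chi_square_statistic(pairs)
    a_alleles = [a for a, _ in pairs]
    b_alleles = [b for _, b in pairs]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(b_alleles)
        if chi_square_statistic(list(zip(a_alleles, b_alleles))) >= observed_stat - 1e-12:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical male haplotypes for two loci in one linkage group.
associated = [(11, 17)] * 20 + [(12, 18)] * 20
independent = [(11, 17), (11, 18), (12, 17), (12, 18)] * 10
print(ld_permutation_test(associated), ld_permutation_test(independent))
```

When a pair of loci shows significant association, the pair is better handled as a haplotype whose frequency is estimated directly, rather than as the product of the two allele frequencies.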

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
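The simplest way to turn such family data into a recombination fraction is to count recombinant meioses among informative, phase-known meioses. The sketch below does that and adds an approximate 95% Wilson score interval; it is an illustration under those assumptions, not the estimation model presented in Paper III, and the counts are hypothetical.

```python
import math


def recombination_estimate(recombinants: int, informative_meioses: int, z: float = 1.96):
    """Counting estimate of theta with an approximate 95% Wilson score interval."""
    n = informative_meioses
    theta = recombinants / n
    denom = 1.0 + z * z / n
    centre = (theta + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(theta * (1 - theta) / n + z * z / (4 * n * n)) / denom
    return theta, max(0.0, centre - half_width), min(1.0, centre + half_width)


print(recombination_estimate(0, 100))  # no recombinants in 100 meioses
print(recombination_estimate(2, 100))
```

Even with zero observed recombinants in 100 informative meioses, the upper limit is still close to 4%, which illustrates why family data sets of this size give only coarse estimates of small recombination fractions.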


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the locations in Mb (NCBI build 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III prompted us to develop a mathematical model for taking both linkage and linkage disequilibrium into account when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart algorithm (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LRs involved in questions about a relationship. However, this algorithm is not efficient enough for dealing with data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for linkage and linkage disequilibrium, and study the impact of taking and not taking linkage and LD into account by simulating typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented based on the Lander-Green algorithm, but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
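The core of the Lander-Green algorithm is a hidden Markov model that moves along the chromosome with the inheritance vector as the hidden state: one bit per meiosis, recording which parental haplotype was transmitted. A minimal sketch of the classic forward recursion is shown below, assuming phase-known parental haplotypes, a uniform prior on the inheritance vector at the first locus and a user-supplied emission function. It does not include the haplotype-based extension for LD proposed in Paper IV, and all names and allele data are hypothetical.

```python
from itertools import product


def transition(v, w, theta):
    """P(inheritance vector w at the next locus | v), meioses independent."""
    changes = sum(a != b for a, b in zip(v, w))
    return theta ** changes * (1.0 - theta) ** (len(v) - changes)


def lander_green_likelihood(n_meioses, thetas, emission):
    """Forward algorithm over loci.

    thetas[i] is the recombination fraction between locus i and locus i + 1;
    emission(locus, v) returns P(observed data at locus | inheritance vector v).
    """
    states = list(product((0, 1), repeat=n_meioses))
    alpha = {v: emission(0, v) / len(states) for v in states}
    for locus, theta in enumerate(thetas, start=1):
        alpha = {w: emission(locus, w) * sum(alpha[v] * transition(v, w, theta)
                                             for v in states)
                 for w in states}
    return sum(alpha.values())


# One maternal meiosis: mother's phased X haplotypes at three linked loci
# and the alleles observed in her son (hypothetical data, no mutation).
maternal_haplotypes = ([12, 8, 15], [13, 9, 16])
son = [12, 8, 16]


def son_emission(locus, v):
    # v[0] selects which maternal haplotype the son received at this locus.
    return 1.0 if maternal_haplotypes[v[0]][locus] == son[locus] else 0.0


print(lander_green_likelihood(1, [0.01, 0.5], son_emission))
```

Here the son matches the first maternal haplotype at loci 1 and 2 and the second at locus 3, so the likelihood is 0.5 x 0.99 x 0.5: a recombination is required only across the loosely linked interval. The cost grows linearly with the number of loci but exponentially with the number of meioses, which is why the algorithm suits many markers in small pedigrees.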

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three involved cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). Simulations were performed for each of the six cases with the tested relationship either true or not true. From these, LR distributions were studied, and LRs calculated with our proposed model were compared with LRs calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2 to 10^3). Furthermore, LRs favouring the questioned relationship (LR > 1) of varying size were obtained when that relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Columns (left to right): log10 LR; difference LR(m_2)/LR(m_1); difference LR(m_3)/LR(m_1). Each column is given as median [95% cred.] (min/max).

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups for the Swedish population highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out quite as we imagine. Nevertheless, I will attempt to make some informed guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by ever faster computers.

If we go back to the time just before the above-mentioned discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major factor behind the rapid development. Once the discoveries were made, enormous amounts of money and research hours were invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in large national DNA-profile databases, which consist of profiles for a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases which, like CODIS, consist of over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from a larger number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to resolve similar cases, as well as more distant relationships, by means of SNP array data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.
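Why full and half siblings are hard to separate can be seen from the standard IBD-based formula for the likelihood of a pair of genotypes, P(G1, G2) = k0·P0 + k1·P1 + k2·P2, where k0, k1 and k2 are the probabilities that the pair shares 0, 1 or 2 alleles identical by descent. The sketch below assumes a single autosomal locus, Hardy-Weinberg proportions and no mutation; a multi-locus LR would multiply such terms only for unlinked loci, which is precisely the simplification that breaks down for the dense, linked marker sets discussed above. The allele frequencies are hypothetical.

```python
FULL_SIBLINGS = (0.25, 0.5, 0.25)  # (k0, k1, k2)
HALF_SIBLINGS = (0.5, 0.5, 0.0)
UNRELATED = (1.0, 0.0, 0.0)


def genotype_prob(g, freqs):
    """Hardy-Weinberg genotype probability."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]


def prob_with_shared_allele(g, shared, freqs):
    """P(genotype g) when one allele is 'shared' and the other is random."""
    c, d = g
    if c == d:
        return freqs[c] if shared == c else 0.0
    return (freqs[d] if shared == c else 0.0) + (freqs[c] if shared == d else 0.0)


def pair_prob(g1, g2, freqs, k):
    """P(G1, G2) for a relationship with IBD probabilities k = (k0, k1, k2)."""
    k0, k1, k2 = k
    p_g1 = genotype_prob(g1, freqs)
    p0 = p_g1 * genotype_prob(g2, freqs)
    # IBD = 1: each of g1's alleles is the shared one with probability 1/2.
    p1 = p_g1 * 0.5 * sum(prob_with_shared_allele(g2, s, freqs) for s in g1)
    p2 = p_g1 if sorted(g1) == sorted(g2) else 0.0
    return k0 * p0 + k1 * p1 + k2 * p2


freqs = {"a": 0.1, "b": 0.3, "c": 0.6}  # hypothetical allele frequencies
g1, g2 = ("a", "a"), ("a", "a")
lr_full_vs_half = (pair_prob(g1, g2, freqs, FULL_SIBLINGS)
                   / pair_prob(g1, g2, freqs, HALF_SIBLINGS))
print(round(lr_full_vs_half, 3))  # 5.5
```

Even a shared rare homozygote favours full over half siblings only moderately, and a pair sharing no alleles gives an LR of 0.5 at that locus, so many loci, and hence a proper treatment of linkage, are needed to reach a decisive result.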

When it comes to continued work on the linked X-STRs studied in this thesis, two main things remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetic data and could thus have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with the so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis. Thanks also to everyone not mentioned here who has helped in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium: a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: Festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Investigations

The use of DNA analysis to clarify relationships for the purpose of family reunification is increasing, and evaluating the statistical methods used is therefore important. In this paper we demonstrated that improvements are still necessary in order to reduce the risk of erroneous conclusions in immigration casework.

Table 1 Error rates

Change (%) in the error rate in comparison with the standard case, given as total error (inclusion error, exclusion error)

Consanguinity
  Mother and father simulated as first cousins: 3 (10, -1)
Additional information
  20-marker DNA profiles: -68 (-29, -89)
  25-marker DNA profiles: -83 (-56, -98)
  2 children: -88 (-73, -96)
Mutation model
  Limit of 1 inconsistency instead of mutation model for LR calculation: 16 (217, -95)
  Limit of 2 inconsistencies instead of mutation model for LR calculation: 320 (1079, -100)
Inappropriate allele frequency
  Rwandan allele frequencies for data generation, Swedish allele frequencies for LR calculation: 19 (190, -76)
  Somali allele frequencies for data generation, Swedish allele frequencies for LR calculation: 2 (106, -55)
  Iranian allele frequencies for data generation, Swedish allele frequencies for LR calculation: -13 (25, -34)
Prior information
  Five-hypotheses model for LR calculation: -24 (-8, -31)

A standard case was considered, with data from 15-marker DNA profiles, a mutation model for handling inconsistencies and an unweighted average of the inclusion error for H1a-d. Posterior probabilities were calculated based on the two-hypotheses model (H0: AF is the father of the child; H1: AF is unrelated to the child).
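The posterior probabilities mentioned in the note follow from Bayes' theorem: the prior of each hypothesis is multiplied by the likelihood of the DNA evidence under it, and the products are normalised. A minimal sketch, assuming a uniform prior unless one is supplied (the priors actually used in Paper I are not restated here) and LR = P(E | H0) / P(E | H1) with H0 and H1 as in the note; the function names are hypothetical and the likelihood values are made up:

```python
def posteriors(likelihoods, priors=None):
    """P(H_i | E) = prior_i * P(E | H_i) / sum_j prior_j * P(E | H_j)."""
    n = len(likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # uniform prior over the hypotheses
    if len(priors) != n:
        raise ValueError("one prior per hypothesis is required")
    weighted = [p * lik for p, lik in zip(priors, likelihoods)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    return [w / total for w in weighted]


def posterior_from_lr(lr, prior_h0=0.5):
    """Two-hypotheses case with LR = P(E | H0) / P(E | H1)."""
    return posteriors([lr, 1.0], [prior_h0, 1.0 - prior_h0])[0]


print(posterior_from_lr(1000.0))               # H0: AF is the father; H1: AF is unrelated
print(posteriors([4.0, 1.0, 1.0, 1.0, 1.0]))   # hypothetical likelihoods for five hypotheses
```

With two hypotheses and equal priors the posterior for H0 is simply LR / (LR + 1). The same normalisation extends to more hypotheses, such as the five-hypotheses model in Table 1, by passing one likelihood (and one prior) per hypothesis.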

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present or when a maternal relationship is questioned. For haploid DNA markers, which are inherited as a unit without recombination, it is particularly important to set up and study regional frequency databases, because the risk of local frequency variation is increased (Richards et al 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically different regions were typed for the complete mtDNA control region, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami) (Figure 4). This hypervariable segment, which includes HVS-I, HVS-II and HVS-III, spans more than 1,100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved the calculation of forensic efficiency parameters, as well as comparisons of the genetic variation among the Swedish regions and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This corresponds to a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al 1999b). Comparing mtDNA haplogroup frequencies with the corresponding frequencies for 20 worldwide populations grouped the Swedes with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
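Haplotype diversity and match probability can both be computed from the haplotype counts in a database. A minimal sketch, assuming the usual definitions (match probability PM as the sum of squared haplotype frequencies, and Nei's unbiased diversity GD = n/(n-1)(1 - Σp²); Nei 1987), which may differ in detail from the estimators used in Paper II; the function name and the example haplotype labels are hypothetical:

```python
from collections import Counter


def haplotype_statistics(haplotypes):
    """Return (n, number of distinct haplotypes, GD, PM) for a list of observed haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two sampled haplotypes are needed")
    # PM: probability that two haplotypes drawn at random (with replacement) are identical
    pm = sum((c / n) ** 2 for c in counts.values())
    # GD: Nei's unbiased gene (haplotype) diversity
    gd = n / (n - 1) * (1.0 - pm)
    return n, len(counts), gd, pm


# Hypothetical control-region haplotypes, written as differences from a reference sequence
sample = ["16069T 16126C", "16069T 16126C", "16311C", "73G 263G"]
print(haplotype_statistics(sample))
```

Under these definitions the two statistics are tied together, GD = n/(n-1)(1 - PM), so a high haplotype diversity necessarily implies a low match probability.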

The mtDNA sequences were further analysed in order to study potential substructure within Sweden, as indicated by an earlier study of Swedish Y-chromosomal variation (Karlsson et al 2006). When mtDNA haplotype frequencies from the eight Swedish regions were compared, only the Saami population differed significantly from the rest. The difference found in the Y-chromosomal data between the northern region Västerbotten and the rest of Sweden was not observed in the mtDNA data. This is most probably explained by demographic events, although the impact of the relatively small sample sizes should not be ignored.

Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST is the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST values represent the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing, i.e. cases where a key relative such as the alleged father is not available (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci within each group should be treated as haplotypes rather than as single markers. By means of simulations we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
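Because males carry a single X chromosome, their two-locus haplotypes are observed directly, so allelic association within a linkage group can be tested on male data without haplotype phasing. A minimal sketch of one generic way to do this, a permutation test on the Pearson chi-square statistic of the two-locus contingency table; this is not necessarily the LD test used in Paper III, and the function names and allele data are hypothetical:

```python
import random
from collections import Counter


def chi_square_association(pairs):
    """Pearson chi-square statistic for the two-locus table of (allele1, allele2) pairs."""
    n = len(pairs)
    counts1 = Counter(a for a, _ in pairs)
    counts2 = Counter(b for _, b in pairs)
    joint = Counter(pairs)
    stat = 0.0
    for a, n_a in counts1.items():
        for b, n_b in counts2.items():
            expected = n_a * n_b / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_perm=10000, seed=1):
    """Return (statistic, permutation p-value) for H0: no association between the loci."""
    rng = random.Random(seed)
    observed = chi_square_association(pairs)
    locus1 = [a for a, _ in pairs]
    locus2 = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(locus2)  # destroys any association but keeps both allele distributions
        if chi_square_association(list(zip(locus1, locus2))) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical male X haplotypes: (allele at locus 1, allele at locus 2)
males = [(11, 30), (11, 30), (11, 30), (12, 31), (12, 31), (13, 31), (11, 31), (13, 30)]
print(ld_permutation_test(males, n_perm=2000))
```

Permuting the alleles at one locus across individuals keeps both allele frequency distributions fixed, so the p-value does not rely on the chi-square approximation, which is poor for the sparse tables typical of multi-allelic STRs.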

Recombination frequencies for the loci were estimated from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
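In the simplest setting, where each informative meiosis can be scored unambiguously as recombinant or not, the recombination fraction θ between two loci is estimated by the proportion of recombinant meioses, and an exact binomial bound shows how precisely a limited number of meioses pins it down. A minimal sketch under that assumption (phase-known, independent meioses); the model presented in Paper III handles the family data more fully, and the function names here are hypothetical:

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def recombination_estimate(recombinants, meioses, alpha=0.05):
    """Point estimate and exact one-sided upper (1 - alpha) bound for theta."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    theta_hat = recombinants / meioses
    # Clopper-Pearson upper bound: the theta at which P(X <= k | theta) = alpha,
    # found by bisection because the binomial CDF decreases as theta increases.
    lo, hi = theta_hat, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binom_cdf(recombinants, meioses, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return theta_hat, hi


print(recombination_estimate(0, 84))
print(recombination_estimate(0, 116))
```

If no recombinants are observed, the point estimate is 0 but the upper 95% bound is 1 - 0.05^(1/n): about 3.5% for 84 informative meioses and about 2.5% for 116. Family samples of this size therefore constrain the recombination fraction of tightly linked loci only loosely.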

Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI build 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the Swedish data/Somali data separately.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results in this paper indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.

Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III led us to study and develop a mathematical model for considering both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al 2002) could be used to compute the LR for questions about a relationship. However, its computational cost grows exponentially with the number of loci, so it is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, handles thousands of linked markers efficiently but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully considers linkage and linkage disequilibrium, and we study the impact of taking and not taking linkage and LD into account by simulation on typical pedigrees, using X-chromosomal data for the markers studied in Paper III.
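The Lander-Green idea can be sketched as a hidden Markov model along the chromosome. A minimal sketch for the simplest X-chromosomal case considered in Paper IV, two brothers with an untyped mother, assuming no mutation, linkage equilibrium (the restriction noted above) and known recombination fractions between adjacent loci. The hidden state is the inheritance vector: for each brother, which of the mother's two X chromosomes he received at a locus. Function names and allele frequencies are hypothetical, and this is plain Lander-Green, not the extended model of Paper IV:

```python
from itertools import product

# Inheritance vector (s1, s2): which maternal X copy (0 or 1) each brother received at a locus
STATES = list(product((0, 1), repeat=2))


def emission(state, allele1, allele2, freqs):
    """P(brothers' alleles | inheritance vector) at one locus, mother untyped, no mutation."""
    s1, s2 = state
    if s1 == s2:  # same maternal copy: the brothers must carry the same allele
        return freqs[allele1] if allele1 == allele2 else 0.0
    return freqs[allele1] * freqs[allele2]  # different copies: two independent founder alleles


def transition(prev, nxt, theta):
    """Each brother's inheritance switches between adjacent loci with probability theta."""
    p = 1.0
    for a, b in zip(prev, nxt):
        p *= theta if a != b else 1.0 - theta
    return p


def brothers_lr(alleles1, alleles2, freqs, thetas):
    """LR (same mother vs unrelated mothers) over ordered linked X loci in linkage equilibrium.

    alleles1/alleles2: the brothers' alleles per locus; freqs: one allele-frequency dict
    per locus; thetas: recombination fractions between adjacent loci.
    """
    # Forward algorithm: alpha[v] = P(data at loci 0..j, inheritance vector v at locus j)
    alpha = {v: 0.25 * emission(v, alleles1[0], alleles2[0], freqs[0]) for v in STATES}
    for j in range(1, len(freqs)):
        alpha = {
            w: emission(w, alleles1[j], alleles2[j], freqs[j])
            * sum(alpha[v] * transition(v, w, thetas[j - 1]) for v in STATES)
            for w in STATES
        }
    same_mother = sum(alpha.values())
    unrelated = 1.0
    for j, f in enumerate(freqs):
        unrelated *= f[alleles1[j]] * f[alleles2[j]]
    return same_mother / unrelated


# Made-up allele frequencies; alleles not needed here are omitted
freqs = [{"A": 0.1, "B": 0.2}, {"C": 0.1, "D": 0.3}]
print(brothers_lr(["A", "C"], ["A", "C"], freqs, [0.5]))  # unlinked: product of locus LRs
print(brothers_lr(["A", "C"], ["A", "C"], freqs, [0.0]))  # completely linked loci
```

The cost grows linearly with the number of loci but exponentially with the number of meioses in the inheritance vector, which is why the approach suits many linked markers on small pedigrees. With the made-up values above, θ = 0.5 gives the product of the single-locus LRs (about 30.25), whereas θ = 0 lets the shared alleles reinforce each other (about 50.5).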

Materials and methods

A computational model was presented that is based on the Lander-Green algorithm but extended by expanding the inheritance vectors to cover all loci of a haplotype.

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three involved cases where DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship both true and false. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1, of varying magnitude, were obtained when the questioned relationship was simulated to be false, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of previously unobserved haplotypes.
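The effect of LD and linkage on a single linkage group can be seen in a brute-force calculation that enumerates every possible pair of maternal haplotypes, rather than using the inheritance-vector approach of Paper IV. A minimal sketch, assuming two brothers, an untyped mother, no mutation, maternal haplotypes drawn independently from population haplotype frequencies, and a known recombination fraction θ within the group; the function names and frequencies are hypothetical. Passing frequencies implied by the allele frequencies alone ignores LD, and additionally setting θ = 0.5 ignores linkage, roughly analogous to the models m_2 and m_3 in Table 2:

```python
from itertools import product


def son_given_mother(h, m1, m2, theta):
    """P(son's two-locus X haplotype h | mother's haplotypes m1, m2, recombination theta)."""
    p = 0.0
    for first, second in ((m1, m2), (m2, m1)):
        if h == first:
            p += 0.5 * (1.0 - theta)  # non-recombinant copy of 'first'
        if h == (first[0], second[1]):
            p += 0.5 * theta  # recombinant: locus 1 from 'first', locus 2 from 'second'
    return p


def lr_same_mother(h1, h2, hap_freqs, theta):
    """LR for 'same untyped mother' vs 'unrelated mothers' for one two-locus linkage group."""
    if h1 not in hap_freqs or h2 not in hap_freqs:
        raise ValueError("haplotype frequency missing: it must be estimated first")
    numerator = 0.0
    for m1, m2 in product(hap_freqs, repeat=2):
        prior = hap_freqs[m1] * hap_freqs[m2]  # mother's two X haplotypes, drawn independently
        numerator += prior * son_given_mother(h1, m1, m2, theta) * son_given_mother(h2, m1, m2, theta)
    return numerator / (hap_freqs[h1] * hap_freqs[h2])


def linkage_equilibrium_freqs(hap_freqs):
    """Haplotype frequencies implied by the allele frequencies alone, i.e. ignoring LD."""
    f1, f2 = {}, {}
    for (a, b), f in hap_freqs.items():
        f1[a] = f1.get(a, 0.0) + f
        f2[b] = f2.get(b, 0.0) + f
    return {(a, b): f1[a] * f2[b] for a in f1 for b in f2}


hap = {("A", "C"): 0.45, ("A", "D"): 0.05, ("B", "C"): 0.05, ("B", "D"): 0.45}  # made-up, strong LD
le = linkage_equilibrium_freqs(hap)
for label, freqs, theta in (("linkage + LD", hap, 0.0), ("linkage, no LD", le, 0.0), ("neither", le, 0.5)):
    print(label, lr_same_mother(("A", "C"), ("A", "C"), freqs, theta))
```

With these made-up frequencies and θ = 0 as an illustrative extreme, the LR for two brothers sharing haplotype (A, C) is about 1.61 with the haplotype frequencies, 2.5 when LD is ignored and 2.25 when both linkage and LD are ignored. The function refuses a haplotype missing from the frequency table, which is exactly where the estimation of unobserved haplotype frequencies enters.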

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model for accomplishing this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Columns:
1. log10 LR: median [95% cred.] (min/max)
2. Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max)
3. Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max)

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel means simulations where the brothers have the same mother, and NoRel that the brothers were simulated to have different mothers. 10,000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihood ratios in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that in other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlights the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.

Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help provided by rapidly increasing computing power.

If we go back to the time just before these findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and large national DNA profile databases, which are built on a given number of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al 2009), resolving mixtures (Homer et al 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from a larger number of DNA markers. However, I think that these developments will generally be smaller in scale than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009; Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.

As regards continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. Because the number of possible haplotypes grows multiplicatively with each marker added to a haplotype block, much larger population databases are required in order to estimate the frequency of a block with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested in whether this is traceable in the genetic data and could therefore have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with the so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).

Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and caring, and thanks for your patience when I worked outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331

Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York

Investigations

Paper II - Homogeneity in mitochondrial DNA control region sequences in Swedish subpopulations

In forensics, mitochondrial DNA is mainly used in casework where only a limited amount of nuclear DNA is present, or when a maternal relationship is questioned. For haploid DNA markers it is particularly important to set up and study regional frequency databases, because there is an increased risk of local frequency variation (Richards et al 2000). In this study we analysed mtDNA sequence variation in a Swedish population sample in order to facilitate forensic mtDNA testing in Sweden.

Materials and methods

Blood samples from 296 Swedish individuals from seven geographically distinct regions were typed, together with 39 samples from a Swedish Saami population (i.e. Jokkmokk Saami), for the complete mtDNA control region (Figure 4). This hypervariable segment, which includes HVS-I, HVS-II and HVS-III, spans more than 1100 nucleotides.

Haplotype and haplogroup frequencies were calculated and interpreted from the DNA sequence variation. The statistical evaluation involved the calculation of forensic efficiency parameters. It also involved comparisons of the genetic variation among the Swedish regions, and between the Swedish, other European and non-European populations.

Results and discussion

Two hundred and forty-seven different haplotypes were found among the typed Swedes. This represents a haplotype diversity of 0.996 and a random match probability of 0.5%, which are of the same magnitude as for other European populations (Budowle et al 1999b). When mtDNA haplogroup frequencies were compared with the corresponding frequencies for 20 world-wide populations, the Swedes grouped with other western European populations. This was further confirmed by calculating pairwise ΦST values for a limited number of geographically close populations (Figure 4).
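The two efficiency parameters quoted above can be computed directly from haplotype counts. The sketch below is illustrative only and assumes the usual definitions rather than the exact software used in Paper II. The match probability PM is taken as the sum of squared sample haplotype frequencies. The haplotype (gene) diversity GD is taken as Nei's unbiased estimator n/(n-1)·(1-Σp_i²). The function name `haplotype_stats` and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (gene diversity, match probability) for a list of haplotypes.

    Assumes Nei's unbiased diversity estimator and a match probability
    computed as the sum of squared sample haplotype frequencies.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    match_prob = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1.0 - match_prob)
    return diversity, match_prob


# Toy sample: six individuals carrying four distinct (hypothetical) haplotypes
sample = ["H1", "H1", "H2", "H3", "H4", "H4"]
gd, pm = haplotype_stats(sample)
print(f"GD = {gd:.3f}, PM = {pm:.3f}")  # GD = 0.867, PM = 0.278
```

For the Swedish sample, the same formulas would be applied to the observed counts of the 247 haplotypes among 296 individuals. When almost every haplotype is unique, GD approaches 1 and PM approaches 1/n.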

The mtDNA sequences were further analysed to study potential substructure within Sweden, as an earlier study of Swedish Y-chromosomal variation had indicated (Karlsson et al 2006). The mtDNA haplotype frequencies from the eight Swedish regions were compared, and only the Saami population differed significantly from the rest. The Y-chromosomal data had shown a difference between the northern region of Västerbotten and the rest of Sweden, but this difference was not observed in the mtDNA data. The discrepancy is most probably explained by demographic events; however, the impact of the relatively small sample sizes should not be ignored.
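One simple way to quantify the kind of regional differentiation discussed above is Nei's G_ST. It compares the mean within-region haplotype diversity H_S with the diversity H_T of the averaged (pooled) frequencies: G_ST = (H_T - H_S)/H_T. This is only an illustrative sketch and is not the method used in Paper II. That paper reports F_ST and ΦST values, which are normally estimated with AMOVA-type methods that also correct for sample size and use sequence distances, so numbers from this sketch are not comparable with Figure 4. The function name `gst` and the region data are hypothetical.

```python
from collections import Counter


def gst(regions):
    """Nei's G_ST from a dict mapping region name -> list of haplotypes.

    Uses uncorrected (sample) diversities and weights every region equally.
    """
    freqs = []
    for haps in regions.values():
        n = len(haps)
        freqs.append({h: c / n for h, c in Counter(haps).items()})
    k = len(freqs)
    # Mean within-region diversity H_S
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / k
    # Diversity H_T of the frequencies averaged over regions
    all_haps = set().union(*freqs)
    pooled = {h: sum(f.get(h, 0.0) for f in freqs) / k for h in all_haps}
    h_t = 1.0 - sum(p * p for p in pooled.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


regions = {
    "North": ["H1", "H1", "H2", "H3"],
    "South": ["H1", "H2", "H2", "H4"],
}
print(round(gst(regions), 3))  # 0.091
```

G_ST is 0 when all regions have identical frequencies and 1 when they share no haplotypes. With a few hundred individuals split over eight regions, random sampling alone produces non-zero values. This is one reason why formal tests, rather than raw values, are needed to claim substructure.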

Figure 4 Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. The ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were as follows: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. Because the Swedish population is highly similar to other western European populations, Swedish mtDNA data can be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007). This provides a more accurate estimate of the rarity of an mtDNA sequence.
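The dependence on database size can be made concrete. A common conservative practice for lineage markers is to report a one-sided 95% upper confidence bound on a haplotype's frequency; this practice is assumed here and is not described in Paper II. For a haplotype seen x times among n database entries, the exact (Clopper-Pearson) bound is the p at which P(X ≤ x) = 0.05 under a Binomial(n, p) model. For a haplotype never seen before (x = 0), this reduces to 1 - 0.05^(1/n), which shrinks as the database grows. The function names are hypothetical. The sketch uses plain bisection so that it needs only the standard library.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def upper_bound(x, n, alpha=0.05, tol=1e-12):
    """One-sided exact upper confidence bound on a haplotype frequency.

    Finds p with P(X <= x | n, p) = alpha by bisection; the CDF is
    decreasing in p, so the root is bracketed by [x/n, 1].
    """
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Upper bound for an unseen haplotype at increasing database sizes
for n in (296, 3000, 30000):
    print(n, round(upper_bound(0, n), 5))
```

For an unseen haplotype, the bound is roughly 1% with 296 entries, 0.1% with 3000 and 0.01% with 30000. Pooling comparable populations, as EMPOP allows, therefore directly tightens the frequency estimate.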

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful in deficiency cases of relationship testing, where a key relative is not available for testing (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females. Combining several X-chromosomal DNA markers therefore requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study population frequencies, substructure and efficiency parameters. It is equally important to explore marker-specific recombination rates and allelic association among the loci. In this study we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Recombination frequencies were estimated from family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), giving 84 to 116 informative meioses. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci within each group should be treated as haplotypes rather than as independent single markers. Our simulations showed that, on average, disregarding LD rather than taking it into account made only a small difference to the calculated LR. In some individual cases, however, the difference was considerable.
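Because males carry a single X chromosome, their two-locus X-STR haplotypes are observed directly, so association between two loci can be tested without phasing. The sketch below uses a generic permutation test with a Pearson chi-square statistic. It is an assumed stand-in for illustration and not necessarily the LD test used in Paper III. The function names and allele labels are hypothetical.

```python
import random
from collections import Counter


def chi2_stat(pairs):
    """Pearson chi-square statistic for independence of locus A and locus B."""
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    ab_counts = Counter(pairs)
    stat = 0.0
    for a, ca in a_counts.items():
        for b, cb in b_counts.items():
            expected = ca * cb / n
            observed = ab_counts.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_perm=2000, seed=1):
    """Permutation p-value for association between two X loci in males.

    Shuffling the locus-B alleles across individuals breaks any association
    while keeping the allele frequencies at both loci fixed.
    """
    rng = random.Random(seed)
    observed = chi2_stat(pairs)
    a_alleles = [a for a, _ in pairs]
    b_alleles = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b_alleles)
        if chi2_stat(list(zip(a_alleles, b_alleles))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical male haplotypes (allele at locus A, allele at locus B)
linked = [("15", "30")] * 40 + [("16", "31")] * 40 + [("15", "31")] * 5 + [("16", "30")] * 5
print(ld_permutation_test(linked))
```

A permutation test is used here because many rare haplotypes make the chi-square approximation unreliable. The "+1" in the p-value keeps it strictly positive, as the observed data are themselves one possible arrangement.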

Recombination frequencies for the loci were estimated from the family data (Figure 5). The estimates indicated that the chance of observing a recombination within each linkage group was small (< 1%). They also indicated a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
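For phase-known informative meioses, the maximum-likelihood estimate of the recombination fraction θ between two loci is the number of observed recombinants k divided by the number of informative meioses n. Support for linkage is usually summarised by a LOD score against free recombination (θ = 0.5). The sketch below shows only this textbook calculation; it is not the estimation model presented in Paper III, which is not reproduced here. The counts and function names are hypothetical.

```python
from math import log10


def theta_hat(k, n):
    """Maximum-likelihood recombination fraction, constrained to <= 0.5."""
    return min(k / n, 0.5)


def lod(k, n, theta):
    """log10 likelihood ratio of recombination fraction theta versus 0.5."""
    ll = 0.0
    if k > 0:
        ll += k * log10(theta)
    if n - k > 0:
        ll += (n - k) * log10(1 - theta)
    return ll - n * log10(0.5)


# Hypothetical counts: 1 recombinant among 100 informative meioses
k, n = 1, 100
t = theta_hat(k, n)
print(f"theta = {t:.3f}, LOD = {lod(k, n, t):.1f}")  # theta = 0.010, LOD = 27.7
```

With only about 100 informative meioses, a true θ below 1% will often give zero observed recombinants. This is why within-group estimates of this size carry considerable uncertainty.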

Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the locations in Mb (NCBI build 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been studied extensively by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results in this paper indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.

Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III led us to develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosomal data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971, Abecasis et al 2002) could be used for computing the LR in questions about a relationship. However, this model is not efficient for data from multiple linked loci, since its computational cost grows exponentially with the number of loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, handles thousands of linked markers well, because its cost instead grows exponentially with the number of meioses in the pedigree. It does, however, assume that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), but we found this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for both linkage and linkage disequilibrium. Using simulations on typical pedigrees with X-chromosomal data for the markers studied in Paper III, we also study the impact of taking or not taking linkage and LD into account.

Materials and methods

A computational model was presented. It was based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype’s loci.
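The Lander-Green algorithm treats the inheritance vector as the hidden state of a Markov chain that runs along the chromosome. The vector has one bit per meiosis, recording whether the grandpaternal or the grandmaternal allele was transmitted. The sketch below shows only that skeleton: the forward recursion over loci. The transition probability between two vectors at adjacent loci is θ^d·(1-θ)^(m-d), where d is the number of meioses whose bits differ. The per-locus emission likelihoods are taken as given inputs, and all names are hypothetical. For X-chromosomal data, only maternal meioses need bits, since a father passes on his single X intact. The extension in Paper IV, where inheritance vectors cover all loci of a haplotype so that LD within a linkage group is respected, is not implemented here.

```python
from itertools import product


def transition(v, w, theta):
    """P(inheritance vector w at the next locus | vector v at this locus)."""
    d = sum(a != b for a, b in zip(v, w))
    return theta**d * (1 - theta) ** (len(v) - d)


def forward_likelihood(emissions, thetas, m):
    """Pedigree likelihood via the Lander-Green forward recursion.

    emissions: one dict per locus, mapping each inheritance vector
               (tuple of 0/1 of length m) to P(genotypes at locus | vector).
    thetas:    recombination fractions between adjacent loci (loci - 1 values).
    m:         number of meioses represented in the inheritance vector.
    """
    states = list(product((0, 1), repeat=m))
    prior = 1.0 / len(states)  # all vectors equally likely at the first locus
    alpha = {v: prior * emissions[0][v] for v in states}
    for locus in range(1, len(emissions)):
        theta = thetas[locus - 1]
        alpha = {
            w: emissions[locus][w]
            * sum(alpha[v] * transition(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())


# Sanity check with uninformative emissions: the likelihood should be 1
m = 2
flat = {v: 1.0 for v in product((0, 1), repeat=m)}
print(forward_likelihood([flat, flat, flat], [0.01, 0.05], m))
```

An LR would then be the ratio of two such likelihoods, with emissions computed under each pedigree hypothesis. The cost is linear in the number of loci but grows as 4^m in the number of meioses. This is why the algorithm suits many linked markers in small pedigrees.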

To test the model, we studied six cases representing pedigrees for which X-chromosomal analysis would be valuable. In three of these, DNA profiles were available for both the children and the founders (i.e. questions regarding paternity for a trio, paternity for half-sisters and paternity for full-sisters). In the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these simulations, LR distributions were studied. The LR calculated with our proposed model was also compared with the LR calculated by simpler models with no, or only partial, consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. In the three cases where DNA profiles were available for the founders, the simulations gave a high median LR (~10^6). In the cases where genotype data for the founders were not available, the median LR was considerably lower (~10^2-10^3). Furthermore, LRs above 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true. This happened only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimate of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.

In summary, we demonstrated that linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, in order to reduce the risk of incorrect decisions. We also proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers. Each cell gives the median, [95% credible interval] and (min/max).

         Log10 LR                  Difference LR(m_2)/LR(m_1)     Difference LR(m_3)/LR(m_1)
Rel      2.0 [-1, 6.2] (-1/10.0)   1.2 [0.4, 7.3] (0.0014/18)     0.4 [0.036, 9.3] (0.00014/38)
NoRel    -1 [-1, 0.5] (-1/4.0)     1.0 [0.9, 1.3] (0.0026/3)      0.2 [0.0036, 9.1] (0.026/433)

Rel means simulations in which the brothers have the same mother, and NoRel means the brothers were simulated to have different mothers. 10 000 simulations were performed for each situation. m_1 indicates the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.
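The simplest model, m_3, can be made concrete. Suppose each X-chromosomal marker is treated as unlinked and in equilibrium. Two brothers with the same mother inherit the same maternal X allele with probability 1/2. For brother alleles a and b at one locus, the LR for "same mother" versus "different mothers" is therefore (1 + p_a)/(2p_a) when a = b, and 1/2 when a ≠ b. Under m_3, these per-locus LRs are simply multiplied across loci. The sketch below computes this and simulates log10 LR under both hypotheses, using hypothetical, equifrequent allele frequencies. It deliberately ignores linkage and LD, which Paper IV shows can be misleading, so it is not the m_1 model and its numbers are not comparable with Table 2.

```python
import random
from math import log10
from statistics import median


def locus_lr(a, b, freqs):
    """LR(same mother : different mothers) for two brothers at one X locus."""
    if a == b:
        return (1 + freqs[a]) / (2 * freqs[a])
    return 0.5


def simulate_log10_lr(loci, related, n_sim=10_000, seed=1):
    """Simulate log10 LR under the product rule (no linkage, no LD)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sim):
        total = 0.0
        for freqs in loci:
            alleles = list(freqs)
            weights = list(freqs.values())
            if related:
                # Mother's two X alleles; each brother receives one at random
                mother = rng.choices(alleles, weights, k=2)
                a, b = rng.choice(mother), rng.choice(mother)
            else:
                # Different mothers: independent draws from the population
                a, b = rng.choices(alleles, weights, k=2)
            total += log10(locus_lr(a, b, freqs))
        results.append(total)
    return results


# Hypothetical frequencies: four unlinked markers with five equally common alleles
loci = [{allele: 0.2 for allele in "ABCDE"} for _ in range(4)]
for label, rel in (("Rel", True), ("NoRel", False)):
    sims = simulate_log10_lr(loci, rel)
    print(label, round(median(sims), 2))
```

Because a mismatch only lowers the LR to 1/2 rather than excluding the relationship, the Rel and NoRel distributions overlap considerably when founders are missing. This matches the pattern described above for cases without founder profiles.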

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common theme of the four papers included in this thesis. The aim of the thesis was to study relevant population genetic properties, and models for considering them when calculating likelihood ratios in relationship testing.

• In Paper I we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by several parameters. These included the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high, and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, because the mtDNA variation found in Sweden resembles that of other European populations, the relevant reference population can be enlarged. This increases the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium in the Swedish population. Together with the linkage observed within and between the groups, this highlighted the need to consider such parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient. We also showed that disregarding linkage and LD can have a considerable impact on the computed likelihood ratio.

Future perspectives

It is always difficult to predict the future, as things rarely turn out as we imagine. Nevertheless, I will attempt some educated guesses concerning the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, and to the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project, or the assistance provided by ever faster computers.

Consider the time just before these findings from a client’s point of view. The chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods. In addition, forensic laboratories have already invested heavily in instrumentation and in large national DNA profile databases, which are built on a fixed set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For a national database such as CODIS, which contains over 8 000 000 profiles, it would be very difficult to replace the existing markers with a different set.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. For the use of DNA in criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al 2008) and predicting an individual’s physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

In relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I believe there will be a greater focus on statistical methods that can handle data from an increased number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community is unlikely to continue investing the same level of time and money in research. Some relationship questions, however, still cannot be resolved satisfactorily with existing methods. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009, Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the key is the statistical methods needed to estimate the evidential value of the DNA profiles accurately, taking properties such as linkage and linkage disequilibrium into account.

Continued work on the linked X-STRs studied in this thesis mainly concerns two things. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this isolation is traceable in the genetics and could thus have an impact when assessing the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board (Swedish "bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends. Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and caring, and thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dür A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Investigations

Figure 4. Descriptive statistics for the Swedish mtDNA haplotype database (Saami excluded). The values in parentheses are for the Saami population. GD is the gene diversity (or haplotype diversity), PM is the match probability and FST the frequency variation among the seven Swedish subpopulations. The FST distance for the Saami population represents the genetic distance between the Saami and the seven Swedish regions combined. ΦST represents the genetic variability between the seven Swedish regions combined and the German, Finnish and Norwegian populations, respectively. The Swedish regions studied were: Västerbotten (1), Värmland (2), Uppsala (3), Skaraborg (4), Östergötland/Jönköping (5), Gotland (6), Blekinge/Kristianstad (7) and Jokkmokk Saami (8).

When estimating the population frequency of a given mtDNA haplotype, it is crucial to have a large, representative reference database. The high similarity between the Swedish and other western European populations allows Swedish mtDNA data to be included in initiatives such as EMPOP (the European DNA Profiling Group (EDNAP) mtDNA population database) (Parson & Dür 2007), thus providing a more accurate estimate of the rarity of an mtDNA sequence.
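The gene diversity and match probability in Figure 4 are usually computed from haplotype counts with the textbook formulas GD = n/(n - 1) (1 - Σ p_i²) (Nei 1987) and PM = Σ p_i², where p_i is the frequency of haplotype i among n samples. The Python sketch below assumes these standard definitions, which may differ in detail from any corrections applied in Paper II; the haplotype labels are hypothetical.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Nei's unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum p_i^2)."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two samples")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def match_probability(haplotypes):
    """Probability that two random samples share a haplotype: sum of p_i^2."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


sample = ["H1", "H1", "H2", "H3"]  # hypothetical mtDNA haplotype labels
print(gene_diversity(sample), match_probability(sample))
```

A high GD and a low PM go together: the more evenly the sample is spread over many haplotypes, the less likely a chance match becomes.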


Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful for deficiency relationship testing, that is, cases in which a key individual such as the alleged father is not available for testing (Szibor et al. 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, before X-STRs are applied in casework, it is important to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), giving 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.

Results and discussion

Diversity measurements and efficiency parameters revealed that the "first" linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
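Because males carry a single X chromosome, their X-STR haplotypes are observed directly, without phase ambiguity, which makes testing for allelic association between two loci straightforward. The section does not state which LD test was used; the Python sketch below is a generic permutation test with a chi-square-type statistic, offered as an assumption, and its function names are hypothetical. Shuffling the alleles at the second locus destroys any association while keeping the allele frequencies at both loci fixed.

```python
import random
from collections import Counter


def association_statistic(pairs):
    """Chi-square-type statistic: observed two-locus haplotype counts versus
    the counts expected if alleles at the two loci were independent."""
    n = len(pairs)
    counts = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for a, na in left.items():
        for b, nb in right.items():
            expected = na * nb / n
            stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat


def ld_permutation_test(pairs, n_perm=1000, seed=1):
    """Permutation p-value for association between two X-linked loci.

    Each element of `pairs` is (allele at locus 1, allele at locus 2) for one
    male, i.e. a directly observed haplotype.
    """
    rng = random.Random(seed)
    observed = association_statistic(pairs)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)  # break any association, keep allele frequencies
        if association_statistic(list(zip(first, second))) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: alleles at the two loci always travel together.
linked = [(1, 1)] * 20 + [(2, 2)] * 20
print(ld_permutation_test(linked))
```

Counting the observed arrangement among the permutations, via (hits + 1)/(n_perm + 1), keeps the p-value from ever being exactly zero.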

Recombination frequencies for the loci were established from the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
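In the simplest setting, the recombination fraction between two loci is estimated by counting recombinant meioses among the informative ones. The Python sketch below shows only this basic counting estimator with an exact (Clopper-Pearson) one-sided upper bound; it is not the model presented in the paper, and the function names and counts are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def recombination_estimate(recombinants, meioses, alpha=0.05):
    """Point estimate k/n and a one-sided exact upper confidence bound.

    The bound is the recombination fraction at which observing `recombinants`
    or fewer has probability alpha, found by bisection (the CDF decreases in p).
    """
    theta_hat = recombinants / meioses
    lo, hi = theta_hat, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(recombinants, meioses, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return theta_hat, hi


# Hypothetical: no recombinants among 116 informative meioses.
print(recombination_estimate(0, 116))
```

With zero recombinants in 116 informative meioses, the point estimate is 0 but the 95% upper bound is roughly 0.025, which shows how family size limits the precision with which low recombination rates can be pinned down.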


Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8 kit. The values below the line give the location in Mb (NCBI build 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set and for the separate Swedish/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. Our results indicated that such features cannot be ignored when producing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing with X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al. 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, handles thousands of linked markers well, since its cost grows linearly with the number of markers (though exponentially with pedigree size), but it assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we present a model for the complete consideration of linkage and linkage disequilibrium, and we study the impact of taking or not taking linkage and LD into account by simulating typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to cover all of a haplotype's loci.
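In the Lander-Green algorithm, the unobserved inheritance vector, which records which parental allele was transmitted in each meiosis, is treated as a hidden Markov chain along the chromosome, so the likelihood is computed locus by locus with a forward recursion. The Python sketch below shows only this standard version, which assumes linkage equilibrium between loci; it is not the extended model of Paper IV, and the per-locus emission probabilities are hypothetical precomputed inputs. For X-chromosomal data a father transmits his X chromosome intact, so in practice only maternal meioses would need bits in the vector.

```python
from itertools import product


def transition_matrix(n_meioses, theta):
    """Transition probabilities between inheritance vectors at adjacent loci.

    Vectors are integers with one bit per meiosis. Moving from v to w needs a
    recombination in every meiosis whose bit differs, so
    P(w | v) = theta**d * (1 - theta)**(m - d), with d the Hamming distance.
    """
    size = 2**n_meioses
    matrix = [[0.0] * size for _ in range(size)]
    for v, w in product(range(size), repeat=2):
        d = bin(v ^ w).count("1")
        matrix[v][w] = theta**d * (1 - theta) ** (n_meioses - d)
    return matrix


def pedigree_likelihood(emissions, thetas, n_meioses):
    """Forward algorithm over loci ordered along the chromosome.

    emissions[j][v]: P(genotype data at locus j | inheritance vector v)
    thetas[j]: recombination fraction between locus j and locus j + 1
    """
    size = 2**n_meioses
    alpha = [emissions[0][v] / size for v in range(size)]  # uniform prior on vectors
    for j in range(1, len(emissions)):
        t = transition_matrix(n_meioses, thetas[j - 1])
        alpha = [
            emissions[j][w] * sum(alpha[v] * t[v][w] for v in range(size))
            for w in range(size)
        ]
    return sum(alpha)


# Hypothetical: one meiosis, data at locus 1 imply vector 0, at locus 2 vector 1,
# so the likelihood equals 1/2 * theta.
print(pedigree_likelihood([[1, 0], [0, 1]], [0.1], n_meioses=1))
```

The state space has 2^m inheritance vectors for m meioses, so the cost is linear in the number of loci but exponential in pedigree size, which is the trade-off described above.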

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied to test the model. Three of these were cases where DNA profiles were available for both the children and the founders (e.g. paternity for a trio, paternity for half-sisters and paternity for full-sisters), and three were cases where DNA profiles were available only for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LR calculated with our proposed model was compared with the LR calculated by simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
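A stripped-down version of such a simulation can be sketched for the simplest X-chromosomal case: paternity of a daughter when her paternal haplotype can be identified. A father passes his single X chromosome to a daughter without recombination, so, ignoring mutation, the LR is 1/f(h) when the alleged father's haplotype matches and 0 otherwise. The Python code below is illustrative only: the function name and the three toy haplotype frequencies are hypothetical, and with so few haplotypes chance matches under the unrelated scenario are far more common than with real eight-marker haplotypes.

```python
import random
import statistics
from math import log10


def simulate_paternity_lrs(haplotype_freqs, n_sim=10_000, related=True, seed=7):
    """Simulate log10 LRs for X-haplotype paternity of a daughter.

    Assumes the daughter's paternal haplotype is known unambiguously and that
    fathers transmit their X intact: LR = 1 / f(h) on a match, 0 otherwise.
    No mutation is modelled.
    """
    rng = random.Random(seed)
    haplotypes = list(haplotype_freqs)
    weights = [haplotype_freqs[h] for h in haplotypes]
    log_lrs = []
    for _ in range(n_sim):
        paternal = rng.choices(haplotypes, weights)[0]
        alleged = paternal if related else rng.choices(haplotypes, weights)[0]
        if alleged == paternal:
            log_lrs.append(log10(1 / haplotype_freqs[paternal]))
        else:
            log_lrs.append(float("-inf"))  # exclusion: LR = 0
    return log_lrs


freqs = {"h1": 0.5, "h2": 0.3, "h3": 0.2}  # hypothetical haplotype frequencies
rel = simulate_paternity_lrs(freqs, related=True)
unrel = simulate_paternity_lrs(freqs, related=False)
print("median log10 LR, true father:", statistics.median(rel))
print("share of LR > 1, unrelated man:", sum(x > 0 for x in unrel) / len(unrel))
```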

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10⁶), compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10²-10³). Furthermore, when the questioned relationship was simulated not to be true, LRs above 1 (i.e. supporting the false relationship) were obtained to varying degrees, although only in the cases where founder profiles were not available.

We then compared the LRs obtained with our proposed model with LRs from two simpler models. The difference was small on average, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
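The role of haplotype frequency estimates, and of LD, is easiest to see in the simplest X-chromosomal case, paternity of a daughter whose paternal haplotype h can be identified: since a father transmits his X intact, the LR equals 1/f(h), ignoring mutation. Ignoring LD amounts to replacing f(h) with the product of single-locus allele frequencies. In the hypothetical two-locus example below, the alleles are positively associated, so the product rule understates f(h) and overstates the LR (about 200 instead of 50); the function names and numbers are invented for illustration.

```python
from math import prod


def lr_haplotype(haplotype_freq):
    """LR = 1 / f(h) when the daughter's paternal haplotype matches the alleged father."""
    return 1.0 / haplotype_freq


def lr_product_rule(allele_freqs):
    """Same LR, but f(h) approximated by multiplying single-locus allele
    frequencies, i.e. assuming linkage equilibrium within the block."""
    return 1.0 / prod(allele_freqs)


# Hypothetical two-locus block with positively associated alleles.
allele_freqs = [0.1, 0.05]  # single-locus allele frequencies
haplotype_freq = 0.02       # frequency of the combined haplotype in the database
print("LR using haplotype frequency:", lr_haplotype(haplotype_freq))
print("LR using product rule:", lr_product_rule(allele_freqs))
```

With negative association the error goes the other way, which is why the direction of the bias cannot be assumed in advance.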

In summary, we demonstrated that, to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers. Each cell gives the median, [95% credible interval] and (min; max).

Columns: (1) log10 LR; (2) difference LR(m_2)/LR(m_1); (3) difference LR(m_3)/LR(m_1)

Rel: (1) 20 [-1 -62] (-1100); (2) 12 [04-73] (0001418); (3) 04 [0036-93] (00001438)

NoRel: (1) -1 [-1-05] (-140); (2) 10 [09-13] (000263); (3) 02 [00036-91] (0026433)

Rel denotes simulations where the brothers have the same mother, and NoRel simulations where the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, whose aim was to study relevant population genetic properties and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity of the mtDNA variation in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when producing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the help provided by ever faster computers.

If we go back to the time just before these discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The clients' demand to resolve these issues was therefore a major driver of the rapid development. Once the discoveries were made, an enormous amount of money and scientific effort was invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I feel that there will be a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account, become important.

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various sets of DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
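A minimal sketch of why larger databases are needed, assuming simple counting estimates from a random sample (the text does not specify an estimator): the number of possible haplotypes in a block is the product of the per-locus allele counts, so each added marker multiplies it and individual haplotypes become correspondingly rarer. For a counting estimate of a frequency p from N haplotypes, the relative standard error is sqrt((1 - p)/(pN)), so keeping it below a target r requires N ≥ (1 - p)/(p r²). The function names and example numbers are hypothetical.

```python
import math

# Hypothetical helpers illustrating database-size requirements for haplotype blocks.
# Assumes frequencies are estimated by simple counting in a random sample of N haplotypes.

def n_possible_haplotypes(allele_counts):
    """Distinct haplotypes a block can take: product of the per-locus allele counts."""
    return math.prod(allele_counts)

def relative_standard_error(p, n):
    """Binomial standard error of a counting estimate, divided by the true frequency p."""
    return math.sqrt(p * (1 - p) / n) / p

def required_database_size(p, max_rel_error):
    """Smallest N for which the relative standard error does not exceed max_rel_error."""
    return math.ceil((1 - p) / (p * max_rel_error ** 2))

# One locus with 12 alleles versus a two-locus block with 12 x 15 possible haplotypes.
print(n_possible_haplotypes([12]), n_possible_haplotypes([12, 15]))
# Database size needed for a 20% relative error at frequencies 0.06 and 0.003.
print(required_database_size(0.06, 0.2), required_database_size(0.003, 0.2))
```

Because the required N scales roughly as 1/p, every marker that makes typical haplotypes rarer pushes the database requirement up in proportion.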

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact when producing the weight of evidence.
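One standard way in which population isolation can enter the weight of evidence is a θ (F_ST) correction of the kind proposed by Balding and Nichols (1994). The sketch below implements the widely used conditional match probabilities for an autosomal genotype. It is an illustration under stated assumptions: the allele frequencies and θ values are hypothetical, and the thesis does not say that this correction is how regional isolation would be handled.

```python
# Conditional match probabilities with a theta (F_ST) correction in the form popularised
# by Balding and Nichols (1994). Frequencies and theta values below are illustrative only.

def match_probability(p_a, p_b=None, theta=0.0):
    """Probability that a second person from the same subpopulation shares genotype a/b.

    Pass p_b=None for a homozygote a/a.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom

for theta in (0.0, 0.01, 0.03):
    het = match_probability(0.1, 0.2, theta)
    hom = match_probability(0.1, theta=theta)
    print(f"theta={theta:.2f}  het={het:.4f}  hom={hom:.4f}")
```

With θ = 0 the expressions reduce to the product-rule values 2p_ap_b and p_a²; any θ > 0 raises the match probability and so lowers the evidential weight.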

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis--Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.



Investigations

Paper III - Analysis of linkage and linkage disequilibrium for eight X-STR markers

X-chromosomal markers are useful in deficiency cases of relationship testing, where a key individual is not available for analysis (Szibor et al 2003). The X chromosome occurs in one copy in males and in two copies in females, and the combined use of several X-chromosomal DNA markers requires consideration of linkage and linkage disequilibrium. Thus, prior to the application of X-STRs in casework, it is important not only to study population frequencies, substructure and efficiency parameters, but also to explore marker-specific recombination rates and allelic association among the loci. In this study, we focused on eight X-STRs located in four linkage groups and their usability in relationship testing.

Materials and methods

A total of 718 males and 106 females from a Swedish population were analysed for the eight X-chromosomal STR markers included in the Argus X-8 kit (Biotype) (Figure 5). These data were used to establish haplotype frequencies, to perform the LD test and to estimate forensic efficiency parameters. Family data from 16 Swedish families (3-7 children) and 16 Somali families (2-9 children), resulting in 84 to 116 informative meioses, were used to estimate recombination frequencies. Furthermore, a model for estimating such recombination rates was presented.
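Because a male carries a single X chromosome, his alleles within a linkage group are observed directly as a haplotype, with no phase uncertainty, which makes male samples convenient for estimating haplotype frequencies. The hedged sketch below counts haplotypes and computes Nei's (1987) unbiased gene diversity, h = n/(n - 1)(1 - Σp_i²), one standard diversity measure. The allele designations are hypothetical, and the full set of efficiency parameters reported in Paper III is not reproduced.

```python
from collections import Counter

# Toy, hypothetical male X-STR data for one linkage group (e.g. DXS10135-DXS8378).
# A male carries one X chromosome, so each tuple is an observed, phase-known haplotype.
males = [(14, 10), (14, 10), (15, 11), (16, 10)]

def frequencies(observations):
    """Relative frequencies by simple counting."""
    n = len(observations)
    return {value: count / n for value, count in Counter(observations).items()}

def gene_diversity(observations):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(observations)
    sum_sq = sum(p * p for p in frequencies(observations).values())
    return n / (n - 1) * (1 - sum_sq)

print(frequencies(males))                       # haplotype frequencies
print(gene_diversity(males))                    # haplotype diversity
print(gene_diversity([h[0] for h in males]))    # single-locus diversity, first locus
print(gene_diversity([h[1] for h in males]))    # single-locus diversity, second locus
```

The same function works on whole haplotypes or on single loci, which makes it easy to compare the informativeness of a linkage group with that of its individual markers.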

Results and discussion

Diversity measurements and efficiency parameters revealed that the “first” linkage group (DXS10135-DXS8378) was the most informative, although only minor differences were seen among the linkage groups. The linkage disequilibrium test gave significant p-values for the pair of loci within each of the four linkage groups. Thus, for the Swedish population, the loci should be treated as haplotypes rather than as single markers. By means of simulations, we demonstrated that the average difference in calculated LR between disregarding LD and taking it into account was small, although in some individual cases it was considerable.
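The text does not state which LD test was used. As a hedged illustration only, the sketch below implements a generic permutation test for allelic association between two X-linked loci, using phase-known male haplotypes. The statistic is the Pearson chi-square of the two-locus haplotype table. Alleles at the second locus are shuffled across males to build the null distribution of no association. The function names, seed and number of permutations are assumptions.

```python
import random
from collections import Counter

# Generic permutation test for allelic association (LD) between two X-linked loci,
# using phase-known male haplotypes. Hypothetical sketch; not necessarily the Paper III test.

def association_statistic(pairs):
    """Pearson chi-square statistic of the two-locus haplotype contingency table."""
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    hap_counts = Counter(pairs)
    stat = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            expected = n_a * n_b / n
            stat += (hap_counts[(a, b)] - expected) ** 2 / expected
    return stat

def ld_permutation_test(pairs, n_perm=2000, seed=1):
    """P-value: share of permuted data sets at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = association_statistic(pairs)
    locus_a = [a for a, _ in pairs]
    locus_b = [b for _, b in pairs]
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(locus_b)  # breaks any association while keeping allele counts fixed
        permuted = association_statistic(list(zip(locus_a, locus_b)))
        if permuted >= observed - 1e-9:  # small tolerance for floating-point ties
            n_extreme += 1
    return (n_extreme + 1) / (n_perm + 1)

linked = [(14, 10)] * 30 + [(15, 11)] * 30
unlinked = [(14, 10), (14, 11), (15, 10), (15, 11)] * 15
print(ld_permutation_test(linked), ld_permutation_test(unlinked))
```

A permutation null avoids relying on chi-square asymptotics, which are unreliable for highly polymorphic STRs where many haplotype cells have small expected counts.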

Recombination frequencies for the loci were established based on the family data (Figure 5). These indicated that the chance of observing a recombination within each linkage group was small (< 1%), and that there is also a tendency towards linkage between groups three (HPRTB-DXS10101) and four (DXS10134-DXS7423).
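The estimation model presented in Paper III is not reproduced here. As a much simpler, hedged illustration of why small recombination fractions are hard to pin down with about 100 informative meioses, the sketch below uses the direct count estimate R/N from phase-known informative meioses and an exact one-sided 95% upper confidence bound. Only maternal meioses are informative for X-linked loci, because a father transmits his single X intact. With R = 0 the bound reduces to 1 - 0.05^(1/N). The counts in the example are hypothetical.

```python
import math

# Direct-count estimate of a recombination fraction from phase-known informative (female)
# meioses, with an exact one-sided upper confidence bound found by bisection.
# A simplified illustration; it is not the estimation model presented in Paper III.

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def recombination_estimate(recombinants, meioses, alpha=0.05):
    """Return (point estimate, exact one-sided (1 - alpha) upper bound)."""
    estimate = recombinants / meioses
    if recombinants == meioses:
        return estimate, 1.0
    low, high = estimate, 1.0
    for _ in range(60):  # bisection: binomial_cdf decreases as p increases
        mid = (low + high) / 2
        if binomial_cdf(recombinants, meioses, mid) > alpha:
            low = mid
        else:
            high = mid
    return estimate, high

# Hypothetical counts: no recombinants, and two recombinants, among 100 informative meioses.
print(recombination_estimate(0, 100))
print(recombination_estimate(2, 100))
```

Even with no recombinants observed, the data remain compatible with a recombination fraction of roughly 3%, which is one reason dedicated models and pooled data sets are useful.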


Figure 5 Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line are the locations given in Mb (NCBI 36). The values above the loci represent the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when producing the evidential weight of X-chromosomal profiles in relationship testing.


Paper IV - Using X-chromosomal markers in relationship testing: How to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model for how to consider both linkage and linkage disequilibrium when calculating the likelihood ratio in relationship testing for X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al 2002) could be used for computing the LR involved in questions about a relationship. However, this model is not efficient enough for dealing with data from multiple linked loci, as its computational cost grows exponentially with the number of loci. On the other hand, the Lander-Green algorithm (Lander & Green 1987), whose cost grows linearly with the number of loci but exponentially with the number of meioses in the pedigree, works well for thousands of linked markers, but assumes that the loci are in linkage equilibrium. Efforts have, however, been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. Therefore, we here present a model for the complete consideration of linkage and linkage disequilibrium, and study the impact of taking or not taking linkage and LD into account by means of a simulation approach on typical pedigrees and X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype’s loci.
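For orientation, the Lander-Green algorithm treats the inheritance vector (one bit per meiosis, recording which parental copy was transmitted) as the hidden state of a hidden Markov model running along the chromosome. The sketch below is a minimal, generic forward pass under standard assumptions: a uniform prior over vectors, and independent meioses with a given recombination fraction between adjacent loci. The per-locus emission probabilities P(genotypes | vector) are taken as inputs. Computing them, and the haplotype-level extension of Paper IV, is not shown, so this is not the Paper IV model.

```python
from itertools import product

# Minimal Lander-Green style forward algorithm (a hidden Markov model along the chromosome).
# Hidden state: an inheritance vector with one bit per meiosis. For X-linked data only
# maternal meioses need a bit, since a father transmits his single X chromosome intact.
# The per-locus emission probabilities P(observed genotypes | vector) are assumed given;
# computing them (and the haplotype-level extension used in Paper IV) is not shown here.

def transition_prob(v, w, rec):
    """P(vector w at the next locus | vector v), meioses independent, recombination fraction rec."""
    changed = sum(a != b for a, b in zip(v, w))
    return rec ** changed * (1 - rec) ** (len(v) - changed)

def lander_green_likelihood(emissions, recs, n_meioses):
    """emissions: one dict per locus mapping vector -> P(data at locus | vector).
    recs: recombination fractions between consecutive loci (length = number of loci - 1).
    """
    vectors = list(product((0, 1), repeat=n_meioses))
    prior = 1 / len(vectors)  # all inheritance vectors equally likely a priori
    alpha = {v: prior * emissions[0][v] for v in vectors}
    for locus in range(1, len(emissions)):
        rec = recs[locus - 1]
        alpha = {
            w: sum(alpha[v] * transition_prob(v, w, rec) for v in vectors) * emissions[locus][w]
            for w in vectors
        }
    return sum(alpha.values())

# Two loci, one informative meiosis, hypothetical emission probabilities.
emissions = [{(0,): 0.2, (1,): 0.6}, {(0,): 0.5, (1,): 0.1}]
print(lander_green_likelihood(emissions, [0.5], 1))  # unlinked loci
print(lander_green_likelihood(emissions, [0.0], 1))  # completely linked loci
```

With a recombination fraction of 0.5 the likelihood factorises into a product over loci, which is the independence assumption behind the product rule; with 0 the same inheritance vector is forced across loci.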

Six different cases, representing pedigrees for which X-chromosomal analysis would be valuable, were studied in order to test the model. Three of these involved cases where DNA profiles were available for both the children and the founders (e.g. questions regarding paternity for a trio, paternity for half-sisters and paternity for full sisters), and three additional cases where DNA profiles were only available for the children (paternity for half-sisters, paternity for full siblings and maternity for brothers). Simulations were performed for each of the six cases, with the tested relationship simulated as either true or not true. From these, LR distributions were studied, together with comparisons between the LR calculated using our proposed model and the LR calculated by means of simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
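As a hedged illustration of the simplest building block behind such LR calculations, the sketch below computes a single-marker paternity index for an X-STR trio (mother, daughter, alleged father), ignoring mutation, linkage and LD. Because a father transmits his only X chromosome, under H1 the daughter's paternal allele must equal the alleged father's allele; under H2 it is drawn from the population allele frequencies. When the mother does not carry the paternal allele, this reduces to 1/p. The function name and frequencies are illustrative assumptions.

```python
from itertools import permutations

# Single-marker paternity index for an X-STR trio (mother, daughter, alleged father),
# ignoring mutation, linkage and LD. Illustrative sketch; allele frequencies are hypothetical.

def x_trio_paternity_index(daughter, mother, af_allele, freqs):
    """daughter, mother: 2-tuples of alleles; af_allele: the alleged father's single X allele."""

    def maternal_transmission(allele):
        return mother.count(allele) / 2

    # Ordered (paternal, maternal) splits of the daughter's genotype; the set removes the
    # duplicate split that arises when the daughter is homozygous.
    splits = set(permutations(daughter, 2))
    # H1: the alleged father is the father, so the paternal allele must equal af_allele.
    p_h1 = sum(maternal_transmission(mat) for pat, mat in splits if pat == af_allele)
    # H2: the father is an unrelated man; his allele is drawn from the population.
    p_h2 = sum(freqs[pat] * maternal_transmission(mat) for pat, mat in splits)
    if p_h2 == 0:
        raise ValueError("the daughter's genotype is incompatible with the stated mother")
    return p_h1 / p_h2

freqs = {12: 0.25, 13: 0.10, 14: 0.30}
print(x_trio_paternity_index((12, 13), (13, 14), 12, freqs))  # equals 1 / freqs[12]
print(x_trio_paternity_index((12, 13), (13, 14), 14, freqs))  # exclusion
```

For markers within one linkage group, the per-locus indices cannot simply be multiplied; the haplotype-based approach discussed above replaces single-allele frequencies with haplotype frequencies.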

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6) in comparison with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2-10^3). Furthermore, LRs above 1, of varying magnitude, were obtained when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which linkage and LD were not taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
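The text does not state how Paper IV assigned frequencies to unobserved haplotypes. As a hedged sketch, two simple conventions for counting-based estimates are compared below. One adds the questioned haplotype to the database, (x + 1)/(N + 1). The other takes the posterior mean under a uniform Beta(1, 1) prior, (x + 1)/(N + 2). Simple LRs such as a single-marker paternity index involve the reciprocal of such a frequency. So for an unseen haplotype (x = 0), the result mostly reflects the database size and the chosen convention. The database sizes are hypothetical.

```python
# Two simple conventions for the frequency of a haplotype observed `count` times among
# `n` database haplotypes; for an unseen haplotype (count = 0) both are driven by n alone.
# Illustrative only: the text does not state which estimator Paper IV used.

def add_questioned_haplotype(count, n):
    """Count the questioned haplotype itself: (count + 1) / (n + 1)."""
    return (count + 1) / (n + 1)

def uniform_prior_posterior_mean(count, n):
    """Posterior mean under a uniform Beta(1, 1) prior: (count + 1) / (n + 2)."""
    return (count + 1) / (n + 2)

for n in (100, 700, 5000):
    f1 = add_questioned_haplotype(0, n)
    f2 = uniform_prior_posterior_mean(0, n)
    print(f"n={n:5d}  (x+1)/(n+1)={f1:.5f}  1/f={1 / f1:7.0f}  (x+1)/(n+2)={f2:.5f}")
```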

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2 Statistics for the simulation of a maternity case involving two brothers

Each cell: median [95% cred.] (min/max)

        | log10 LR             | Difference LR(m_2)/LR(m_1) | Difference LR(m_3)/LR(m_1)
Rel     | 20 [-1 -62] (-1100)  | 12 [04-73] (0001418)       | 04 [0036-93] (00001438)
NoRel   | -1 [-1-05] (-140)    | 10 [09-13] (000263)        | 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties, and models for considering them, when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium, and that linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule for combining the eight X-STRs is not valid.


• In Paper IV, we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
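The product-rule point can be made concrete for two loci in one linkage group. The product rule assigns a male haplotype a-b the frequency p_a·p_b. Under LD, the actual haplotype frequency h_ab differs from this by the disequilibrium coefficient D = h_ab - p_a·p_b. The hedged sketch below computes both from phase-known male haplotypes; the data are hypothetical.

```python
from collections import Counter

# Compare the product-rule estimate p_a * p_b with the directly counted haplotype frequency
# h_ab for two loci in one linkage group, using phase-known male haplotypes.
# D = h_ab - p_a * p_b is the linkage disequilibrium coefficient. Data are hypothetical.

def product_rule_vs_haplotype(male_haplotypes, haplotype):
    n = len(male_haplotypes)
    a, b = haplotype
    p_a = sum(1 for x, _ in male_haplotypes if x == a) / n
    p_b = sum(1 for _, y in male_haplotypes if y == b) / n
    h_ab = Counter(male_haplotypes)[haplotype] / n
    return {"product_rule": p_a * p_b, "haplotype": h_ab, "D": h_ab - p_a * p_b}

males = [(14, 10)] * 3 + [(15, 11)]
print(product_rule_vs_haplotype(males, (14, 10)))
print(product_rule_vs_haplotype(males, (14, 11)))  # never observed, yet product rule gives 0.1875
```

In this toy example the product rule underestimates the common haplotype and assigns a substantial frequency to one that never occurs. That is the kind of distortion that makes the product rule unsuitable when loci are in LD.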


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some intelligent guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years, much of which is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, making DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the contribution of ever faster computers.

If we go back to the time just before the above-mentioned findings and adopt a client’s point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major factor behind the rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that today many cases can be solved with existing methods, and major investments have already been made in forensic laboratories regarding instrumentation and the large national DNA-profile databases, which consist of a given number of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). In the case of national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally I believe that future development will become more diversified which means less money and time spent on each topic With regard to the use of DNA in criminal cases perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Bu-dowle et al 2009) resolving mixtures (Homer et al 2008) and predicting an individualrsquos physical appearance such as eye and skin colour (Kayser amp Schnei-der 2009) To this should be added the constant upgrading of analysis instru-ments which will make DNA-profiling faster and of an even higher quality

If we turn to the use of DNA in relationship testing which is the main topic of this thesis methods and markers have traditionally been adapted from de-velopments in the much larger field of DNA in criminal cases In terms of fu-

| 45

Future perspectives

ture developments I feel that there will be a greater focus on statistical meth-ods to handle data from an increased amount of DNA markers However I think that these developments will in general be less compared to the last 25 years Take for example paternity testing which constitutes the vast majority of all relationship testing cases Today such cases are mostly solved with a suf-ficiently high probability and certainty thus the community will not continue to invest the same level of time and money in research However relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods for example separating full-siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al 2010) Some attempts have been made to solve similar cases as well as more distant relationships by means of array SNP data (Skare et al 2009 Ge et al 2010) The similarity between these studies is the simultaneous use of a large number of markers What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles taking proper-ties such as linkage and linkage disequilibrium into account

Regarding continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to transform the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.
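The database-size problem can be illustrated with two back-of-the-envelope calculations, sketched below in Python. The sketch assumes haplotypes are sampled independently, and the allele counts and frequencies are hypothetical, not data from Papers III or IV. First, the number of possible haplotypes grows multiplicatively with each added marker. Second, a haplotype of true frequency f is absent from a database of n haplotypes with probability (1 - f)^n.

```python
import math

def n_possible_haplotypes(allele_counts):
    """Number of distinct haplotypes a block can form: the product of per-locus allele counts."""
    return math.prod(allele_counts)

def prob_unseen(hap_freq, n):
    """Probability that a haplotype with true frequency hap_freq is absent from n sampled haplotypes."""
    return (1 - hap_freq) ** n

def database_size_needed(hap_freq, max_prob_unseen):
    """Smallest n for which the haplotype is missing with probability at most max_prob_unseen."""
    return math.ceil(math.log(max_prob_unseen) / math.log(1 - hap_freq))

# Hypothetical four-locus block with 10, 8, 12 and 15 alleles per locus.
print(n_possible_haplotypes([10, 8, 12, 15]))  # 14400 possible haplotypes
print(round(prob_unseen(0.001, 1000), 2))      # a 1-in-1000 haplotype is missing ~37% of the time
print(database_size_needed(0.001, 0.05))
```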

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and could thus have an impact on the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This would give access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence in me and your wide expertise in forensic genetics. Gunilla, who introduced me to the field of forensic genetics, for your positive spirit. Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends; Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala; my father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for always believing in me; my relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed; and my in-laws, the Tillmar family: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and our daughter Signe. Ida, for your constant support, love and care, and thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings during the writing of this thesis. Thanks also to everyone I have not mentioned who has helped in different ways. This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden: a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Investigations

Figure 5. Data for the eight X-chromosomal markers included in the Argus X-8. The values below the line give the locations in Mb (NCBI 36). The values above the loci are the estimated recombination frequencies for the combined Swedish and Somali data set, and for the separate Swedish data/Somali data.

The eight X-STRs investigated in this work have previously been widely studied by other groups, mostly with regard to population allele frequencies. For relationship testing, however, it is also important to study genetically relevant properties such as linkage and linkage disequilibrium. In this paper, our results indicated that such features cannot be ignored when assessing the evidential weight of X-chromosomal profiles in relationship testing.
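Because males carry a single X chromosome, X-STR haplotypes can be read directly from male samples, which makes linkage disequilibrium straightforward to quantify. The Python sketch below computes the standard pairwise LD coefficient D = p_ij - p_i q_j for two loci and contrasts each observed haplotype frequency with its product-rule expectation. The haplotype data are hypothetical, and this is an illustration rather than the LD test used in Paper III.

```python
from collections import Counter

def pairwise_ld(haplotypes):
    """For two-locus haplotypes (allele_locus1, allele_locus2), return, per observed
    allele pair, (observed frequency, product-rule expectation, D)."""
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    locus1 = Counter(h[0] for h in haplotypes)
    locus2 = Counter(h[1] for h in haplotypes)
    result = {}
    for (a, b), count in pair_counts.items():
        observed = count / n
        expected = (locus1[a] / n) * (locus2[b] / n)  # frequency under linkage equilibrium
        result[(a, b)] = (observed, expected, observed - expected)
    return result

# Hypothetical sample of 10 male X haplotypes at two tightly linked STRs.
sample = [("10", "14")] * 4 + [("11", "15")] * 4 + [("10", "15"), ("11", "14")]
for pair, (obs, exp, d) in pairwise_ld(sample).items():
    print(pair, obs, exp, round(d, 3))
```

A non-zero D means the product of allele frequencies misestimates the haplotype frequency, which is why the product rule cannot simply be applied to linked X-STRs.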


Paper IV - Using X-chromosomal markers in relationship testing: how to calculate likelihood ratios taking linkage and linkage disequilibrium into account

The findings in Paper III obliged us to study and develop a mathematical model that considers both linkage and linkage disequilibrium when calculating the likelihood ratio (LR) in relationship testing with X-chromosome data. Traditionally, the Elston-Stewart model (Elston & Stewart 1971; Abecasis et al 2002) could be used to compute the LR for questions about a relationship. However, this model is not efficient enough for data from multiple linked loci. The Lander-Green algorithm (Lander & Green 1987), on the other hand, works well for thousands of linked markers but assumes that the loci are in linkage equilibrium. Efforts have been made to treat groups of loci in LD (Abecasis & Wigginton 2005), although we found that this approach was not fully satisfactory for our purpose. We therefore present a model that fully accounts for linkage and linkage disequilibrium, and study the impact of taking, or not taking, linkage and LD into account by simulating typical pedigrees with X-chromosomal data for the markers studied in Paper III.

Materials and methods

A computational model was presented, based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplotype's loci.
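For readers unfamiliar with the Lander-Green approach, the Python sketch below shows its core idea: a hidden Markov model along the chromosome whose hidden states are inheritance vectors (one bit per meiosis, recording which parental haplotype was transmitted). Transitions between adjacent loci are governed by the recombination fraction, assuming no crossover interference. This is the basic algorithm, which assumes linkage equilibrium because each locus contributes its own emission probability. It does not implement the haplotype-block extension of Paper IV. The `emissions` callables, which would return P(observed genotypes at a locus | inheritance vector), are hypothetical placeholders that a real pedigree program would derive from founder allele frequencies. On the X chromosome, a father passes his single X (outside the pseudoautosomal regions) unchanged to his daughters, so only maternal meioses need bits.

```python
import itertools

def transition(v, w, theta):
    """P(inheritance vector w at the next locus | vector v at this locus).

    Each meiosis recombines independently with probability theta, so the
    probability depends only on how many bits flip.
    """
    flips = sum(a != b for a, b in zip(v, w))
    return theta ** flips * (1 - theta) ** (len(v) - flips)

def lander_green_likelihood(n_meioses, emissions, thetas):
    """Forward algorithm over inheritance vectors; cost grows as 4**n_meioses per locus.

    emissions: one callable per locus, vector -> P(data at locus | vector).
    thetas: recombination fractions between adjacent loci (len(emissions) - 1 values).
    """
    states = list(itertools.product((0, 1), repeat=n_meioses))
    # Uniform prior on inheritance vectors at the first locus (Mendelian segregation).
    alpha = {v: emissions[0](v) / len(states) for v in states}
    for emit, theta in zip(emissions[1:], thetas):
        alpha = {
            w: emit(w) * sum(alpha[v] * transition(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())

def only_zero(v):
    """Toy emission: data compatible only with the grandpaternal haplotype (bit 0)."""
    return 1.0 if v == (0,) else 0.0

# One maternal meiosis observed at two loci 10% recombination apart.
print(lander_green_likelihood(1, [only_zero, only_zero], [0.1]))
```

The linear cost in the number of loci is what makes the approach attractive for many linked markers; the exponential cost in meioses is why it suits small pedigrees such as those in Paper IV.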

Six different cases representing pedigrees for which X-chromosomal analysis would be valuable were studied to test the model. In three of these, DNA profiles were available for both the children and the founders (paternity for a trio, paternity for half-sisters and paternity for full-sisters); in the other three, DNA profiles were available only for the children (paternity for half-sisters, paternity for full-siblings and maternity for brothers). For each of the six cases, simulations were performed with the tested relationship either true or not true. From these, LR distributions were studied, and the LRs calculated with our proposed model were compared with LRs calculated using simpler models with no or only partial consideration of linkage and LD. Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III.
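One reason X-chromosomal markers help in the paternal half-sister case is that a father passes the same X chromosome to all his daughters. Paternal half-sisters therefore share exactly one X allele identical by descent (IBD) at every locus, the same single-locus pattern as a parent and child. The Python sketch below computes the resulting single-locus LR (one shared IBD allele versus unrelated), assuming Hardy-Weinberg equilibrium, no mutation and hypothetical allele frequencies. Multiplying these values across loci would be the product rule, which Paper III found invalid for the eight linked X-STRs, so this is only the single-locus building block.

```python
def lr_one_ibd_allele(g1, g2, freqs):
    """Single-locus LR for two diploid genotypes: one allele shared IBD vs. unrelated.

    Under the relationship hypothesis, one allele of g1 (each with probability 1/2)
    is shared with the other person, whose second allele is drawn from the population.
    """
    x, y = g2
    p_unrelated = freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]
    p_related = 0.0
    for shared in g1:
        if x == y:
            p_related += 0.5 * (freqs[x] if shared == x else 0.0)
        else:
            p_related += 0.5 * ((freqs[y] if shared == x else 0.0)
                                + (freqs[x] if shared == y else 0.0))
    return p_related / p_unrelated

freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical allele frequencies
print(lr_one_ibd_allele(("a", "b"), ("a", "b"), freqs))  # (pa + pb) / (4 pa pb)
print(lr_one_ibd_allele(("a", "b"), ("c", "d"), freqs))  # no shared allele: 0
```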

Results and discussion

The model for the likelihood calculation was adapted to the six different cases. The simulations showed that the median LR for the three cases where DNA profiles were available for the founders was high (~10^6), compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10^2 to 10^3). Furthermore, LRs favouring the tested relationship were obtained to varying degrees when the questioned relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated frequency of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
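How much the estimate for an unobserved haplotype matters can be seen with a small Python sketch comparing three simple estimators. The raw database count gives zero and hence an undefined LR. A count in which the queried haplotype is added to the database, (x + 1)/(N + 1), is a convention sometimes used in forensic practice. The product of single-locus allele frequencies ignores LD. These are illustrative choices, not necessarily the estimators evaluated in Paper IV, and all numbers are hypothetical.

```python
import math

def frequency_estimates(count, n, allele_freqs):
    """Three simple estimates of a haplotype's frequency from a database of n haplotypes
    in which it was observed `count` times."""
    return {
        "raw_count": count / n,
        "augmented_count": (count + 1) / (n + 1),  # queried haplotype added to the database
        "product_rule": math.prod(allele_freqs),   # assumes linkage equilibrium
    }

# Hypothetical: a three-locus haplotype never seen among 1000 haplotypes.
estimates = frequency_estimates(0, 1000, [0.1, 0.2, 0.05])
for name, f in estimates.items():
    # In a simple single-source haplotype match the LR is 1/f, so a zero estimate is unusable.
    print(name, f, "LR = 1/f:", (1 / f) if f > 0 else "undefined")
```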

In summary, we demonstrated that, to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers. Each row gives, in order, the median [95% cred.] (min/max) of log10 LR, of the difference LR(m_2)/LR(m_1), and of the difference LR(m_3)/LR(m_1).

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433)

Rel: simulations in which the brothers have the same mother; NoRel: simulations in which the brothers have different mothers. 10,000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.
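Summaries like those in Table 2 can be produced from simulated LRs along the lines of the Python sketch below, which reports the median, an empirical central 95% interval (2.5th and 97.5th percentiles), and the minimum and maximum. Whether the thesis computed its intervals exactly this way is an assumption, and the LR values in the usage line are hypothetical.

```python
import math
import statistics

def summarize(values):
    """Median, empirical central 95% interval and range of a list of numbers."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")  # 39 cut points at 2.5% steps
    return {
        "median": statistics.median(values),
        "lo95": cuts[0],   # 2.5th percentile
        "hi95": cuts[-1],  # 97.5th percentile
        "min": min(values),
        "max": max(values),
    }

def summarize_log10_lr(lrs):
    """Summary on the log10 scale, as in the first column group; LRs must be positive."""
    return summarize([math.log10(lr) for lr in lrs])

def summarize_model_ratio(lrs_alt, lrs_full):
    """Summary of LR(simpler model) / LR(full model), paired by simulation; needs LR(full) > 0."""
    return summarize([a / f for a, f in zip(lrs_alt, lrs_full)])

print(summarize_log10_lr([10.0 ** k for k in range(-1, 8)]))
```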


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider these parameters when assessing the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.

• In Paper IV, we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
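The Paper I observation that the number of markers matters can be illustrated with the classic power of exclusion (compare Ohno et al 1982 in the reference list). The Python sketch below computes, by brute-force enumeration rather than a closed-form formula, the probability that a random unrelated man is excluded at one codominant locus in a mother-child-alleged father trio. It then combines loci as 1 - prod(1 - PE_i). It assumes Hardy-Weinberg equilibrium, no mutation or silent alleles and independent loci, and it does not cover the alternative relatives of the true father considered in Paper I. Allele frequencies are hypothetical.

```python
import itertools
import math

def exclusion_probability(freqs):
    """Trio power of exclusion at one codominant locus (freqs: allele -> frequency)."""
    alleles = list(freqs)
    pe = 0.0
    for m1, m2 in itertools.product(alleles, repeat=2):  # mother's ordered genotype
        mother = {m1, m2}
        for maternal in (m1, m2):                        # allele the mother transmits
            for paternal in alleles:                     # allele the true father transmits
                weight = freqs[m1] * freqs[m2] * 0.5 * freqs[paternal]
                # Alleles the child could have received from its father, given the mother.
                possible = set()
                if maternal in mother:
                    possible.add(paternal)
                if paternal in mother:
                    possible.add(maternal)
                # A random man is excluded if he carries none of those alleles.
                p_none = 1 - sum(freqs[a] for a in possible)
                pe += weight * p_none ** 2
    return pe

def combined_exclusion(pes):
    """Combined power of exclusion over independent loci."""
    return 1 - math.prod(1 - pe for pe in pes)

snp = {"A": 0.5, "B": 0.5}
print(exclusion_probability(snp))  # a balanced SNP excludes 18.75% of random men
print(combined_exclusion([exclusion_probability(snp)] * 20))
```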


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some informed guesses about the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by developments in computing power.

If we go back to the time just before these discoveries and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was thus a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are based on a given set of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

Generally I believe that future development will become more diversified which means less money and time spent on each topic With regard to the use of DNA in criminal cases perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Bu-dowle et al 2009) resolving mixtures (Homer et al 2008) and predicting an individualrsquos physical appearance such as eye and skin colour (Kayser amp Schnei-der 2009) To this should be added the constant upgrading of analysis instru-ments which will make DNA-profiling faster and of an even higher quality

If we turn to the use of DNA in relationship testing which is the main topic of this thesis methods and markers have traditionally been adapted from de-velopments in the much larger field of DNA in criminal cases In terms of fu-

| 45

Future perspectives

ture developments I feel that there will be a greater focus on statistical meth-ods to handle data from an increased amount of DNA markers However I think that these developments will in general be less compared to the last 25 years Take for example paternity testing which constitutes the vast majority of all relationship testing cases Today such cases are mostly solved with a suf-ficiently high probability and certainty thus the community will not continue to invest the same level of time and money in research However relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods for example separating full-siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al 2010) Some attempts have been made to solve similar cases as well as more distant relationships by means of array SNP data (Skare et al 2009 Ge et al 2010) The similarity between these studies is the simultaneous use of a large number of markers What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles taking proper-ties such as linkage and linkage disequilibrium into account

When it comes to the continuum efforts regarding the linked X-STRs stud-ied in this thesis there are mainly two things that remain to be investigated One is to transform the model presented in Paper IV into user-friendly soft-ware which can be applied to various sets of DNA marker systems and differ-ent sets of pedigrees The other issue that requires extensive study is the estima-tion of haplotype frequencies As more markers are included in a haplotype block much larger population databases are required in order to obtain an es-timate of its frequency with a low level of uncertainty

Furthermore we will continue to explore the genetic diversity of the Swed-ish population with a special focus on northern Sweden Historical data have shown that there are regions that have been rather isolated until early 1900s and we are interested to see if this is traceable in the genetics thus could have an impact when producing the weight of evidence

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007) This involves post-mortem DNA analysis data management and methods for statistical interpretation

Finally one should not forget the incredible progress made with the so-called next generation sequencing methods (Metzker 2010) It is now possible to sequence a complete human genome for under $100000 This would allow access to a huge amount of information for predictions However although the methods are technically suitable for producing such large data sets the down-stream efforts for data management and statistical interference still require a great deal of work (Metzker 2010)

46 |

Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. Especially, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thanks for your guidance and encouragement, and for always having your doors open for discussions. Also, Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the “RG-team”, Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the “Jippo-gruppen” for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a “bollplank” (sounding board) for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring; thanks for your patience when I worked outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.


Page 36: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

Investigations

Paper IV - Using X-chromosomal markers in relationship testing How to calculate likelihood ratios taking linkage and linkage disequilibrium into account The findings in Paper III obliged us to study and develop a mathematical model for how to consider both linkage and linkage disequilibrium when calcu-lating the likelihood ratio in relationship testing for X-chromosome data Tradi-tionally the Elston-Stewart model (Elston amp Stewart 1971 Abecasis et al 2002) could be used for computing the LR involved in questions about a relationship However this model is not efficient enough for dealing with data from multiple linked loci On the other hand the Lander-Green algorithm (Lander amp Green 1987) works perfectly well for thousands of linked markers but assumes that the loci are in linkage equilibrium Efforts have however been made to treat groups of loci in LD (Abecasis amp Wigginton 2005) al-though we found that this approach was not fully satisfactory for our purpose Therefore we here present a model for the complete consideration of linkage and linkage disequilibrium and study the impact of taking and not taking link-age and LD into account by means of a simulation approach on typical pedi-grees and X-chromosomal data for the markers studied in Paper III

Materials and methods A computational model was presented based on the Lander-Green algorithm but extended by expanding the inheritance vectors to consider all of a haplo-typersquos loci

Six different cases representing pedigrees for which X-chromosomal analy-sis would be valuable were studied in order to test the model Three of these involved cases where DNA profiles were available for both the children and the founders (eg questions regarding paternity for a trio paternity for half-sisters and paternity for full-sisters) and three additional cases where DNA profiles were only available for the children (paternity for half-sisters paternity for full-siblings and maternity for brothers) Simulations were performed for each of the six cases that considered the tested relationship to be true or not true From these LR distributions were studied together with comparisons of the calcu-lated LR using our proposed model with LR calculated by means of simpler models with no or only partial consideration of linkage and LD Genotype (or haplotype) data were simulated from Swedish population frequencies for the eight STR markers studied in Paper III

Results and discussion The model for the likelihood calculation was adapted to the six different cases The simulations showed that the median LR for the three cases where DNA

40 |

Investigations

profiles were available for the founders was high (~106) in comparison with the considerably lower median LR for the cases where genotype data for the foun-ders were not available (~102-103) Furthermore various degrees of positive LRs were obtained when the questioned relationship was simulated not to be true although only in the cases where founder profiles were not available

We then compared the LRs obtained using our proposed model with LRs from two simpler models The difference was on average small although somewhat larger for the model in which linkage and LD were not taken into account (Table 2) In some of the tested cases the estimation of the rarity of a given haplotype had a strong impact on the calculated likelihood ratio This was especially true when estimating the haplotype frequency for haplotypes earlier not seen

In summary we demonstrated that in order to reduce the risk of incorrect decisions linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence and we proposed an efficient model to accomplish this

Table 2 Statistics for the simulation of a maternity case involving two brothers

LR

Log10

Median [95 cred] (minmax)

Difference

LR (m_2)LR(m_1)

Median [95 cred] (minmax)

Difference

LR (m_3)LR(m_1)

Median [95 cred] (minmax)

Rel 20 [-1 -62] (-1100) 12 [04-73] (0001418) 04 [0036-93] (00001438)

NoRel -1 [-1-05] (-140) 10 [09-13] (000263) 02 [00036-91] (0026433) Rel means simulation where the brothers have the same mother and noRel that the brothers were simulated to have dif-

ferent mothers 10 000 simulations were performed for each situation M_1 indicates the model in which both linkage and LD were considered m_2 the model in which linkage but not LD was considered and m_3 the model where linkage and LD were not taken into account

| 41

42 |

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation and each of them can have a considerable impact on the resulting figure This is a common feature of the four papers included in this thesis the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihood in relationship testing

bull In Paper I we showed how the risk of erroneous decisions in rela-

tionship testing in immigration casework was affected by parameters such as the number of markers tested utilisation of relevant popula-tion allele frequencies use of a probabilistic model for the treatment of single genetic inconsistencies and consideration of alternative close relationships between the alleged father and the true father In addition we proposed methods for reducing the risk of erroneous decisions

bull In Paper II we set up an mtDNA haplotype frequency database for

the Swedish population We demonstrated that the mtDNA varia-tion in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database Furthermore the resem-blance of the mtDNA variation found in Sweden compared with other European populations makes it possible to enlarge the rele-vant reference population thus increasing the reliability of the esti-mation of the rarity of a given mtDNA haplotype

bull In Paper III we studied eight X-chromosomal markers in terms of

their informativeness and usefulness in relationship testing We found that the markers located in each of the four linkage groups were in linkage disequilibrium and that the linkage within and be-tween the linkage groups for the Swedish population highlighted the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations Thus when con-sidering the Swedish population the commonly used product rule for employing the eight X-STRs is not valid

| 43

Concluding remarks

bull In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account and applied it on simulated cases based on DNA profiles with X-chromosomal data We revealed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relation-ship testing Furthermore we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient and that disregarding LD and linkage can have a consider-able impact on the computed likelihood ratio

44 |

Future perspectives

It is always difficult to predict the future as things never turn out as we imag-ine Nevertheless I will attempt to make some intelligent guesses concerning the future of forensic genetics If we look back at the developments in this field we can see enormous progress over the past 25 years much of which is thanks to the discovery of polymorphic genetic markers thermo stable poly-merases as well as the invention of PCR and capillary electrophoresis making DNA analysis faster cheaper and more accurate We should also not forget the valuable information provided by the HUGO-project and the assistant from the developments of fast computer power

If we go back to the time just before the above-mentioned findings and adopt a clientrsquos point of view we can see that the chances of linking a suspect to the crime scene or solving relationship issues quickly and with a high degree of accuracy were rather small Thus the demand from clients to resolve these issues was a major factor behind the high degree of development Once the discoveries were made an incredible amount of dollars and scientific hours were invested in forensic genetics in order to meet the demands

So is it likely that we will see similar trends over the next 25 years I do not really think so One reason is that today many cases can be solved with existing methods and major investments have already been made in forensic laborato-ries regarding instrumentation and the large national DNA-profile databases which consist of a given number of predefined DNA markers (eg CODIS (wwwfbigovhqlabcodisclickmaphtm)) In the case of the national data-bases which like CODIS consists of over 8000000 profiles it would be very difficult to replace the existing markers with a different set up of markers

Generally I believe that future development will become more diversified which means less money and time spent on each topic With regard to the use of DNA in criminal cases perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Bu-dowle et al 2009) resolving mixtures (Homer et al 2008) and predicting an individualrsquos physical appearance such as eye and skin colour (Kayser amp Schnei-der 2009) To this should be added the constant upgrading of analysis instru-ments which will make DNA-profiling faster and of an even higher quality

If we turn to the use of DNA in relationship testing which is the main topic of this thesis methods and markers have traditionally been adapted from de-velopments in the much larger field of DNA in criminal cases In terms of fu-

| 45

Future perspectives

ture developments I feel that there will be a greater focus on statistical meth-ods to handle data from an increased amount of DNA markers However I think that these developments will in general be less compared to the last 25 years Take for example paternity testing which constitutes the vast majority of all relationship testing cases Today such cases are mostly solved with a suf-ficiently high probability and certainty thus the community will not continue to invest the same level of time and money in research However relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods for example separating full-siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al 2010) Some attempts have been made to solve similar cases as well as more distant relationships by means of array SNP data (Skare et al 2009 Ge et al 2010) The similarity between these studies is the simultaneous use of a large number of markers What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles taking proper-ties such as linkage and linkage disequilibrium into account

When it comes to the continuum efforts regarding the linked X-STRs stud-ied in this thesis there are mainly two things that remain to be investigated One is to transform the model presented in Paper IV into user-friendly soft-ware which can be applied to various sets of DNA marker systems and differ-ent sets of pedigrees The other issue that requires extensive study is the estima-tion of haplotype frequencies As more markers are included in a haplotype block much larger population databases are required in order to obtain an es-timate of its frequency with a low level of uncertainty

Furthermore we will continue to explore the genetic diversity of the Swed-ish population with a special focus on northern Sweden Historical data have shown that there are regions that have been rather isolated until early 1900s and we are interested to see if this is traceable in the genetics thus could have an impact when producing the weight of evidence

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007) This involves post-mortem DNA analysis data management and methods for statistical interpretation

Finally one should not forget the incredible progress made with the so-called next generation sequencing methods (Metzker 2010) It is now possible to sequence a complete human genome for under $100000 This would allow access to a huge amount of information for predictions However although the methods are technically suitable for producing such large data sets the down-stream efforts for data management and statistical interference still require a great deal of work (Metzker 2010)

46 |

Acknowledgements

Tack till All studies performed in this thesis were conducted at National Board of Fo-rensic Medicine Department of Forensic Genetics and Forensic toxicology Scientific work is never a one-man show and I am grateful for all help support and encouragement I have received through the years Especially I would like to thank My tutors Professor Bertil Lindblom Associate professor Gunilla Holmlund and Associate professor Petter Mostad for sharing your deep knowledge within the field Thanks for your guidance encouragements and always having your doors open for discussions Also Bertil especially for your confidence and wide expertise within forensic genetics Gunilla who introduced me to the field of forensic genetics for your positive spirit Petter especially for all your support on mathematical and statistical issues The staff in the ldquoRG-teamrdquo Helena Kerstin Birgitta Ann-Britt Kersti Johnny Anna-Lena Ulla Irene Margareta Agnetha Pernilla and Louise for all support big and small and for making Artillerigatan 12 a fun place to work Johan Ahlner the head of the Department of Forensic Genetics and Forensic Toxicology for being an inspiring boss My fellows in the ldquoJippo-gruppenrdquo for fun moments and all other people at RMV-Linkoumlping who has been help-ful in different ways Thore Egeland Anders Goumltherstroumlm Thomas Wallerstroumlm Helena Jankovic Malmstroumlm Michael Coble and Jukka Palo for valuable scientific collaborations and discussions which I hope will continue As a researcher you need to discuss ideas big or small and as a PhD-student it is always nice to have a ldquobollplankrdquo for practical issues I thank Jenny Ve-lander Daniel Kling Gunnel Nilsson and Anna-Lena Zachrisson for in-spiring discussions and practical tips

| 47

Acknowledgements

As a scientist it is valuable to be able to think about other stuffs beside the work I would like to thank Magnus and Erik for always being such good friends Leonard Claes Ragnar Sara and Evelin for the years in Uppsala My father Jan-Aringke mother Inga-Lill sister Frida and brother Martin for your endless support and for believing in me during my life My relatives Ingvar amp Eivor and grandfather Olle for always being around when help is needed My family in-law Tillmar It is always nice to have such a welcoming place to go to Finally my beloved wife Ida and child Signe Ida for your constant support love and caring Thanks for your patient when I work at non-office hours Signe our wonderful daughter for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis Thanks to all others who I not mentioned but has been helpful in different ways This work was supported by grants from National Board of Forensic Medicine and Center of Forensic Science Linkoumlping University


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl 2:680-681.


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis: validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden: a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden: a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies: the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.


Investigations

profiles were available for the founders was high (~10⁶) compared with the considerably lower median LR for the cases where genotype data for the founders were not available (~10² to 10³). Furthermore, LRs of varying size favouring the questioned relationship (LR > 1) were obtained even when that relationship was simulated not to be true, although only in the cases where founder profiles were not available.

We then compared the LRs obtained using our proposed model with LRs from two simpler models. The difference was on average small, although somewhat larger for the model in which neither linkage nor LD was taken into account (Table 2). In some of the tested cases, the estimated rarity of a given haplotype had a strong impact on the calculated likelihood ratio. This was especially true when estimating the frequency of haplotypes not previously observed.
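The sensitivity to unseen haplotypes can be illustrated in the simplest possible setting: a lineage-marker comparison in which two maternal relatives share a haplotype. If mutation is ignored, the LR for this observation is 1/p, where p is the population frequency of the haplotype. Any choice of estimator for p therefore carries straight through to the LR.

The Python sketch below compares three ad hoc estimators for a haplotype seen x times among N database haplotypes. The estimators and the database size are illustrative assumptions and are not the frequency model used in Paper IV.

```python
import math


def counting_estimate(x, n):
    """Plain relative frequency x/n; zero for a haplotype never observed."""
    return x / n


def add_one_estimate(x, n):
    """(x + 1) / (n + 1): count the database as if it also held the observed haplotype."""
    return (x + 1) / (n + 1)


def upper_bound_unseen(n, alpha=0.05):
    """Exact one-sided (1 - alpha) upper confidence bound on p when x = 0 of n."""
    return 1 - alpha ** (1 / n)


def lineage_lr(p):
    """LR = 1/p for a shared haplotype, ignoring mutation; infinite if p == 0."""
    return math.inf if p == 0 else 1 / p


if __name__ == "__main__":
    n = 500  # hypothetical database size
    estimates = {
        "counting": counting_estimate(0, n),
        "add-one": add_one_estimate(0, n),
        "95% upper bound": upper_bound_unseen(n),
    }
    for name, p in estimates.items():
        print(f"{name:>16}: p = {p:.5f}, log10 LR = {math.log10(lineage_lr(p)):.2f}")
```

Take a database of 500 haplotypes. The plain count gives p = 0 and an infinite LR, which is unusable. The 95% upper bound gives an LR roughly three times smaller than the add-one estimate. That is a spread of about half an order of magnitude from the choice of estimator alone.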

In summary, we demonstrated that, in order to reduce the risk of incorrect decisions, linkage and linkage disequilibrium should be properly accounted for when calculating the weight of evidence, and we proposed an efficient model to accomplish this.

Table 2. Statistics for the simulation of a maternity case involving two brothers

Situation | log10 LR: median [95% cred.] (min/max) | Difference LR(m_2)/LR(m_1): median [95% cred.] (min/max) | Difference LR(m_3)/LR(m_1): median [95% cred.] (min/max)
Rel | 20 [-1 -62] (-1100) | 12 [04-73] (0001418) | 04 [0036-93] (00001438)
NoRel | -1 [-1-05] (-140) | 10 [09-13] (000263) | 02 [00036-91] (0026433)

Rel denotes simulations in which the brothers have the same mother, and NoRel simulations in which the brothers have different mothers. 10 000 simulations were performed for each situation. m_1 denotes the model in which both linkage and LD were considered, m_2 the model in which linkage but not LD was considered, and m_3 the model in which neither linkage nor LD was taken into account.
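A small calculation shows why linkage matters for the two-brother scenario in Table 2. Sons inherit their only X chromosome from the mother. Two brothers with the same mother therefore share the maternal allele identical by descent (IBD) at a single X-linked locus with probability 1/2.

Now take two loci with recombination fraction θ. The brothers are IBD at both loci when two things hold:
• they received the same maternal chromosome at the first locus (probability 1/2), and
• either both or neither of them has a recombination between the loci.

This gives ½[(1 − θ)² + θ²], which equals the independence value ¼ only for unlinked loci (θ = 0.5).

The Python sketch below computes the full joint IBD distribution. It ignores mutation and is only a building block, not the model of Paper IV.

```python
def joint_ibd(theta):
    """Joint IBD distribution at two X-linked loci for two brothers with the same mother.

    Returns a dict mapping (ibd_at_locus1, ibd_at_locus2) -> probability.
    Only the maternal X is considered (sons receive their X from the mother),
    mutation is ignored, and theta is the recombination fraction (0 <= theta <= 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    same = (1 - theta) ** 2 + theta ** 2  # both sons recombine, or neither does
    diff = 2 * theta * (1 - theta)        # exactly one son recombines
    return {
        (True, True): 0.5 * same,    # same maternal chromosome at locus 1, same status
        (True, False): 0.5 * diff,   # same at locus 1, exactly one recombination
        (False, True): 0.5 * diff,   # different at locus 1, exactly one recombination
        (False, False): 0.5 * same,  # different at locus 1, same status
    }


if __name__ == "__main__":
    for theta in (0.0, 0.1, 0.3, 0.5):
        both = joint_ibd(theta)[(True, True)]
        print(f"theta = {theta:.1f}: P(IBD at both loci) = {both:.3f} (independence: 0.250)")
```

Consider tightly linked markers with θ = 0.1 (a hypothetical value). The probability of IBD at both loci is then 0.41, whereas treating the loci as independent would give 0.25.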


Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common theme of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by several parameters:
  - the number of markers tested;
  - the use of relevant population allele frequencies;
  - the use of a probabilistic model for the treatment of single genetic inconsistencies;
  - the consideration of alternative close relationships between the alleged father and the true father.
  In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the similarity between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimation of the rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that, in the Swedish population, the markers located within each of the four linkage groups were in linkage disequilibrium. This, together with the linkage within and between the linkage groups, highlighted the need to consider such parameters when calculating the evidential weight of X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for combining the eight X-STRs.


• In Paper IV, we presented a model for calculating likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for properly considering both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
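The effect of LD on the product rule (Papers III and IV) can be illustrated with a toy database of two-locus X-chromosomal haplotypes. Males are hemizygous for the X chromosome, so their haplotypes are observed directly and both haplotype and allele frequencies can be estimated by simple counting.

The Python sketch below compares the counted haplotype frequency with the product of the allele frequencies. It also reports the LD coefficient D = p_AB − p_A·p_B. The allele labels and counts are invented for illustration.

```python
from collections import Counter


def haplotype_vs_product(haplotypes, target):
    """Counted two-locus haplotype frequency versus the product-rule estimate.

    haplotypes: list of (allele_locus1, allele_locus2) tuples, e.g. typed in males.
    target: the haplotype of interest, as an (allele_locus1, allele_locus2) tuple.
    Returns (haplotype_freq, product_rule_freq, D) with D = p_AB - p_A * p_B.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    locus1_counts = Counter(a for a, _ in haplotypes)
    locus2_counts = Counter(b for _, b in haplotypes)
    p_hap = hap_counts[target] / n
    p_product = (locus1_counts[target[0]] / n) * (locus2_counts[target[1]] / n)
    return p_hap, p_product, p_hap - p_product


if __name__ == "__main__":
    # Hypothetical database of 100 male haplotypes at two linked X-STRs.
    database = [(14, 9)] * 30 + [(14, 10)] * 10 + [(15, 9)] * 10 + [(15, 10)] * 50
    for hap in [(15, 10), (14, 10)]:
        p_hap, p_prod, d = haplotype_vs_product(database, hap)
        print(f"{hap}: counted {p_hap:.2f}, product rule {p_prod:.2f}, D = {d:+.2f}")
```

In this toy database the two haplotypes go in opposite directions:

Haplotype | Product rule | Counted
(15, 10) | 0.36 | 0.50
(14, 10) | 0.24 | 0.10

Ignoring LD can therefore push a frequency-based LR in either direction.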


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. If we look back at the developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by ever faster computers.

If we go back to the time just before these findings and adopt a client’s point of view, we can see that the chances of linking a suspect to a crime scene, or of solving relationship issues quickly and with a high degree of accuracy, were rather small. Thus, the demand from clients to resolve these issues was a major driver of this rapid development. Once the discoveries were made, an enormous amount of money and research time was invested in forensic genetics in order to meet the demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can already be solved with existing methods. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases. These databases are built on a given set of predefined DNA markers (e.g. CODIS, www.fbi.gov/hq/lab/codis/clickmap.htm). For national databases which, like CODIS, contain over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas are:
• methods for obtaining complete DNA profiles from difficult biological samples (Budowle & van Daal 2009);
• resolving mixtures (Homer et al. 2008);
• predicting an individual’s physical appearance, such as eye and skin colour (Kayser & Schneider 2009).

To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increasing number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years.

Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today, such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods. Examples are separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al. 2010).

Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009; Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the important part is the statistical method used to estimate the evidential value of the DNA profiles accurately, taking properties such as linkage and linkage disequilibrium into account.
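The standard formulation in terms of IBD coefficients shows why half-siblings are hard to separate from unrelated individuals. At an unlinked autosomal locus, a pair of individuals shares 0, 1 or 2 alleles IBD with probabilities (k0, k1, k2). These are:

Relationship | (k0, k1, k2)
Full siblings | (1/4, 1/2, 1/4)
Half-siblings | (1/2, 1/2, 0)
Unrelated | (1, 0, 0)

The Python sketch below computes the single-locus likelihood of a genotype pair under each relationship. It assumes Hardy-Weinberg equilibrium, no mutation and known allele frequencies; the allele frequencies in the example are invented. Half-siblings differ from unrelated pairs only through k1, so each locus contributes only a modest LR.

```python
# IBD coefficients (k0, k1, k2): probabilities of sharing 0, 1 or 2 alleles IBD.
IBD = {
    "full_sibs": (0.25, 0.5, 0.25),
    "half_sibs": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_prob(g, freq):
    """Hardy-Weinberg probability of the unordered genotype g = (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def prob_with_shared_allele(g2, shared, freq):
    """P(g2) when g2 carries `shared` plus one allele drawn at random from the population."""
    c, d = g2
    if c == d:
        return freq[c] if shared == c else 0.0
    return (freq[d] if shared == c else 0.0) + (freq[c] if shared == d else 0.0)


def pair_likelihood(g1, g2, freq, k):
    """P(g1, g2) for a pair with IBD coefficients k = (k0, k1, k2)."""
    k0, k1, k2 = k
    p1 = genotype_prob(g1, freq)
    ibd0 = p1 * genotype_prob(g2, freq)
    # One allele IBD: each of g1's two alleles is the shared one with probability 1/2.
    ibd1 = p1 * 0.5 * sum(prob_with_shared_allele(g2, s, freq) for s in g1)
    # Two alleles IBD: g2 must be identical to g1.
    ibd2 = p1 if sorted(g1) == sorted(g2) else 0.0
    return k0 * ibd0 + k1 * ibd1 + k2 * ibd2


def likelihood_ratio(g1, g2, freq, numerator, denominator):
    """Single-locus LR of one relationship versus another."""
    return pair_likelihood(g1, g2, freq, IBD[numerator]) / pair_likelihood(
        g1, g2, freq, IBD[denominator]
    )


if __name__ == "__main__":
    freq = {"a": 0.1, "b": 0.2, "c": 0.7}  # hypothetical allele frequencies
    g1 = g2 = ("a", "b")
    print("half-sibs vs unrelated:", likelihood_ratio(g1, g2, freq, "half_sibs", "unrelated"))
    print("full-sibs vs half-sibs:", likelihood_ratio(g1, g2, freq, "full_sibs", "half_sibs"))
```

Take two individuals who are both ab at a locus with allele frequencies 0.1 and 0.2. The sketch gives an LR of about 2.4 for half-siblings versus unrelated individuals, and about 3.5 for full versus half-siblings. Multiplying such per-locus LRs across loci assumes the loci are unlinked. That assumption no longer holds for dense marker sets.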

When it comes to continued efforts regarding the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different sets of pedigrees. The other issue that requires extensive study is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
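A back-of-the-envelope calculation makes this last point concrete. Suppose a haplotype has true frequency p and is estimated by counting in a database of N haplotypes. The relative standard error of the estimate is then √((1 − p)/(Np)), so keeping it below a target r requires N ≥ (1 − p)/(p·r²).

For illustration, the sketch also assumes that each added marker multiplies the haplotype frequency by a typical allele frequency of 0.2. This crude assumption ignores LD and is made only to show the order of magnitude. The Python sketch below tabulates the resulting database size; the allele frequency and error target are hypothetical.

```python
import math


def required_database_size(p, rel_se):
    """Database size N at which the binomial relative standard error of a counted
    frequency, sqrt((1 - p) / (N * p)), drops to rel_se (rounded up)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil((1 - p) / (p * rel_se ** 2))


if __name__ == "__main__":
    allele_freq = 0.2  # hypothetical typical allele frequency per marker
    rel_se = 0.25      # hypothetical target: 25 % relative standard error
    for k in range(1, 9):
        p = allele_freq ** k  # crude haplotype frequency that ignores LD
        print(f"{k} marker(s): p = {p:.1e}, N >= {required_database_size(p, rel_se):,}")
```

Under these assumptions, one marker needs a database of fewer than a hundred haplotypes. An eight-marker haplotype needs one of around six million, roughly five orders of magnitude more.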

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetic data and could thus have an impact when calculating the weight of evidence.

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Jobling MA and Gill P (2004b) Encoded evidence DNA in forensic analysis Nat Rev Genet 5739-751

Jobling MA Pandya A and Tyler-Smith C (1997) The Y chromosome in fo-rensic analysis and paternity testing Int J Legal Med 110118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome an evolu-tionary marker comes of age Nat Rev Genet 4598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly Res Q 4323-30

52 |

References

Karafet TM Mendez FL Meilerman MB Underhill PA Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolu-tion of the human Y chromosomal haplogroup tree Genome Res 18830-838

Karch SB (2007) Changing times DNA resequencing and the nearly normal autopsy J Forensic Leg Med 14389-397

Karlsson AO Wallerstrom T Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective Eur J Hum Genet 14963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human exter-nally visible characteristics in forensics motivations scientific challenges and ethical considerations Forensic Sci Int Genet 3154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers mathemati-cal and statistical issues Forensic Sci Int Genet 1111-114 Kruglyak L Daly MJ Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis a unified multipoint approach Am J Hum Ge-net 581347-1363 Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans Proc Natl Acad Sci U S A 842363-2367

Lao O Lu TT Nothnagel M Junge O Freitag-Wolf S Caliebe A Balas-cakova M Bertranpetit J Bindoff LA Comas D Holmlund G Kouvatsi A Macek M Mollet I Parson W Palo J Ploski R Sajantila A Tagliabracci A Gether U Werge T Rivadeneira F Hofman A Uitterlinden AG Gieger C Wichmann HE Ruther A Schreiber S Becker C Nurnberg P Nelson MR Krawczak M and Kayser M (2008) Correlation between genetic and geo-graphic structure in Europe Curr Biol 181241-1248

Lappalainen T Hannelius U Salmela E von Dobeln U Lindgren CM Hu-oponen K Savontaus ML Kere J and Lahermo P (2009) Population struc-ture in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis Ann Hum Genet 7361-73

Lee HC and Labriola J (2001) Famous Crimes Revisited Strong Books South-ington

Metzker ML (2010) Sequencing technologies - the next generation Nat Rev Genet 1131-46

Montelius K Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to Euro-pean as well as with non-European population Forensic Sci Int Genet 2e49-52

| 53

References

Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C Fondevila M Garcia-Magarinos M Rodriguez A Salas A Car-racedo A and Lareu MV (2008) Resolving relationship tests that show am-biguous STR results using autosomal SNPs as supplementary markers Forensic Sci Int Genet 2198-204

Pinto N Gusmatildeo L and Amorim A (2010) X-chromosome markers in kin-ship testing A generalisation of the IBD approach identifying situations where their contribution is crucial Forensic Sci Int Genet in press

Prinz M Carracedo A Mayr WR Morling N Parsons TJ Sajantila A Scheithauer R Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG) recommendations re-garding the role of forensic genetics for disaster victim identification (DVI) Forensic Sci Int Genet 13-12

Pulker H Lareu MV Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools Forensic Sci Int Genet 1100-104

Richards M Macaulay V Hickey E Vega E Sykes B Guida V Rengo C Sellitto D Cruciani F Kivisild T Villems R Thomas M Rychkov S Rych-kov O Rychkov Y Golge M Dimitrov D Hill E Bradley D Romano V Cali F Vona G Demaine A Papiha S Triantaphyllidis C Stefanescu G Hatina J Belledi M Di Rienzo A Novelletto A Oppenheim A Norby S Al-Zaheri N Santachiara-Benerecetti S Scozari R Torroni A and Bandelt

54 |

References

HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool Am J Hum Genet 671251-1276

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA Matai AS Laros JF Meiland HC Jasper M de Leeuw WJ de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples Forensic Sci Int Genet 3233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR am-plicons using redesigned primers Int J Legal Med 114285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference data-base (YHRD) update Forensic Sci Int Genet 183-87

Wright S (1951) The genetic structure of populations Ann Eugen 15323-354

Zvelebil M (2008) Innovating hunter-gatherers the Mesolithic in the Baltic In Mesolithic Europe (eds) Bailey G and Spikins P Cambridge University Press New York

| 55

Page 38: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

42 |

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for taking them into account when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance between the mtDNA variation found in Sweden and that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of estimates of the rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers within each of the four linkage groups were in linkage disequilibrium, and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid for the eight X-STRs.


• In Paper IV, we presented a model for the calculation of likelihood ratios that takes both linkage and linkage disequilibrium (LD) into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model handles both linkage and LD efficiently, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
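The observation in Papers III and IV that the product rule breaks down for linked X-STRs in linkage disequilibrium can be illustrated with a minimal sketch. It assumes a hypothetical table of haplotype counts for two tightly linked markers typed in males (who carry a single X chromosome, so each male contributes one haplotype). The allele labels, counts and function names are invented for illustration and are not taken from the papers. The sketch compares the frequency obtained by multiplying single-marker allele frequencies (the product rule) with the frequency counted directly, and reports the classical LD coefficient D = p_AB - p_A p_B. It does not model recombination between markers within a pedigree, which the likelihood model in Paper IV also has to handle.

```python
from collections import Counter

# Hypothetical haplotype counts for two tightly linked X-STRs (marker A, marker B),
# typed in males. Males carry one X chromosome, so each contributes one haplotype.
# All allele labels and counts are invented for illustration.
HAPLOTYPE_COUNTS = {
    ("12", "20"): 40,
    ("12", "21"): 10,
    ("13", "20"): 10,
    ("13", "21"): 40,
}


def marginal_frequencies(counts, index):
    """Allele frequencies at one marker (index 0 or 1), summed over the haplotype table."""
    total = sum(counts.values())
    marginal = Counter()
    for haplotype, n in counts.items():
        marginal[haplotype[index]] += n
    return {allele: n / total for allele, n in marginal.items()}


def haplotype_frequency(counts, haplotype):
    """Haplotype frequency counted directly in the database."""
    return counts.get(haplotype, 0) / sum(counts.values())


def product_rule_frequency(counts, haplotype):
    """Haplotype frequency under the product rule, i.e. assuming linkage equilibrium."""
    p_a = marginal_frequencies(counts, 0)[haplotype[0]]
    p_b = marginal_frequencies(counts, 1)[haplotype[1]]
    return p_a * p_b


def ld_coefficient(counts, haplotype):
    """D = p_AB - p_A * p_B; D is zero under linkage equilibrium."""
    return haplotype_frequency(counts, haplotype) - product_rule_frequency(counts, haplotype)


if __name__ == "__main__":
    for hap in HAPLOTYPE_COUNTS:
        print(hap,
              f"counted={haplotype_frequency(HAPLOTYPE_COUNTS, hap):.2f}",
              f"product rule={product_rule_frequency(HAPLOTYPE_COUNTS, hap):.2f}",
              f"D={ld_coefficient(HAPLOTYPE_COUNTS, hap):+.2f}")
```

With these invented counts every allele has frequency 0.5, so the product rule assigns 0.25 to each haplotype. The counted frequencies, however, are 0.40 for the two common haplotypes and 0.10 for the two rare ones (D = +0.15 or -0.15). Using the product rule would therefore understate the weight of a match on a rare haplotype and overstate it for a common one.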


Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses about the future of forensic genetics. Looking back at developments in this field, we can see enormous progress over the past 25 years. Much of it is thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the contribution of developments in fast computing power.

If we go back to the time just before these developments and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was therefore a major driver of the rapid development. Once the discoveries had been made, enormous sums of money and many hours of scientific work were invested in forensic genetics to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases today can be solved with existing methods. Another is that major investments have already been made in forensic laboratories, both in instrumentation and in the large national DNA-profile databases, which are built on a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). For national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set.

In general, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas are methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al. 2009), resolving mixtures (Homer et al. 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increasing number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly solved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. Nevertheless, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al. 2010). Some attempts have been made to solve such cases, as well as more distant relationships, by means of array SNP data (Skare et al. 2009, Ge et al. 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, what becomes important is the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account.
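The difficulty of separating full siblings from half siblings can be made concrete with the standard identity-by-descent (IBD) formulation of a pairwise kinship likelihood, using the Cotterman coefficients (k0, k1, k2). The sketch below is a minimal single-locus version and rests on several assumptions: Hardy-Weinberg equilibrium, no mutation, no population substructure (theta = 0) and, when loci are combined, unlinked loci in linkage equilibrium. The function names and allele frequencies are hypothetical, and this is not the software used in the thesis.

```python
# Cotterman IBD coefficients (k0, k1, k2): probabilities that two individuals
# share 0, 1 or 2 alleles identical by descent at an autosomal locus.
IBD = {
    "unrelated": (1.0, 0.0, 0.0),
    "half_siblings": (0.5, 0.5, 0.0),
    "full_siblings": (0.25, 0.5, 0.25),
}


def genotype_prob(g, freq):
    """Hardy-Weinberg probability of an unordered genotype g = (allele1, allele2)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2.0 * freq[a] * freq[b]


def prob_ibd0(g1, g2, freq):
    """No alleles shared IBD: the two genotypes are independent."""
    return genotype_prob(g1, freq) * genotype_prob(g2, freq)


def prob_ibd1(g1, g2, freq):
    """One allele of g2 copies a random allele of g1; the other is drawn from freq."""
    x, y = g2
    conditional = 0.0
    for shared in g1:  # each allele of g1 is the shared one with probability 1/2
        if x == y:
            conditional += 0.5 * (freq[x] if shared == x else 0.0)
        else:
            conditional += 0.5 * ((freq[y] if shared == x else 0.0)
                                  + (freq[x] if shared == y else 0.0))
    return genotype_prob(g1, freq) * conditional


def prob_ibd2(g1, g2, freq):
    """Both alleles shared IBD: the genotypes must be identical."""
    return genotype_prob(g1, freq) if sorted(g1) == sorted(g2) else 0.0


def pair_likelihood(g1, g2, freq, relationship):
    """P(g1, g2 | relationship) as a mixture over the IBD states."""
    k0, k1, k2 = IBD[relationship]
    return (k0 * prob_ibd0(g1, g2, freq)
            + k1 * prob_ibd1(g1, g2, freq)
            + k2 * prob_ibd2(g1, g2, freq))


def likelihood_ratio(loci, rel_num, rel_den):
    """Product of single-locus LRs; valid only for unlinked loci in linkage equilibrium."""
    lr = 1.0
    for g1, g2, freq in loci:
        lr *= pair_likelihood(g1, g2, freq, rel_num) / pair_likelihood(g1, g2, freq, rel_den)
    return lr


if __name__ == "__main__":
    freq = {"a": 0.1, "b": 0.2, "c": 0.7}  # hypothetical allele frequencies
    locus = (("a", "b"), ("a", "b"), freq)
    print("FS vs HS:", likelihood_ratio([locus], "full_siblings", "half_siblings"))
    print("HS vs UN:", likelihood_ratio([locus], "half_siblings", "unrelated"))
```

Suppose two individuals are both typed a/b at a locus where the hypothetical allele frequencies are 0.1 and 0.2. The sketch then gives likelihoods of 0.0134 for full siblings, 0.0038 for half siblings and 0.0016 for unrelated individuals. This corresponds to a likelihood ratio of only about 3.5 for full versus half siblings, even though the pair share two fairly rare alleles. Because k1 = 0.5 for both relationships, many loci are needed. Multiplying over many loci in this way is exactly where linkage, ignored by `likelihood_ratio`, starts to matter.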

As for continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies: as more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.
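The point about haplotype blocks and database size follows from elementary binomial sampling. In the minimal sketch below, a haplotype frequency p is assumed to be estimated by simple counting (x observations out of n sampled haplotypes). The relative standard error of that estimate is then sqrt((1 - p) / (n p)), and solving for n gives the database size needed to reach a target relative error r, n = (1 - p) / (p r^2). As a rough, hypothetical guide only, the sketch also assumes that each added marker multiplies the haplotype frequency by a typical allele frequency of 0.2. This deliberately ignores the linkage disequilibrium discussed in Papers III and IV and serves only to show the order of magnitude.

```python
import math


def relative_standard_error(p, n):
    """Relative standard error of the counting estimate x/n, with x ~ Binomial(n, p)."""
    return math.sqrt((1.0 - p) / (n * p))


def required_database_size(p, target_rse):
    """Smallest n for which the relative standard error is at most target_rse."""
    return math.ceil((1.0 - p) / (p * target_rse ** 2))


if __name__ == "__main__":
    # Hypothetical: each marker in the block contributes an allele of frequency 0.2,
    # so a k-marker haplotype has a frequency of roughly 0.2**k (order of magnitude only).
    allele_freq = 0.2
    for k in range(1, 6):
        p = allele_freq ** k
        n = required_database_size(p, 0.10)
        print(f"{k} markers: p = {p:.5f}, database size for 10% relative SE = {n}")
```

Under these assumptions, the required database grows roughly fivefold with each added marker. It rises from about 400 haplotypes for a single marker to about 300,000 for a five-marker block.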

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions were rather isolated until the early 1900s. We are interested to see whether this is traceable in the genetics, and could thus have an impact when producing the weight of evidence.

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts in data management and statistical inference still require a great deal of work (Metzker 2010).


Acknowledgements

All studies in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received over the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence in me and your wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all your support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo, for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about things other than work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala. My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to. Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care, and for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping to get me up in the mornings while I was writing this thesis. Thanks also to everyone I have not mentioned who has been helpful in different ways. This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595


Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis - validation and use for forensic casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA Matai AS Laros JF Meiland HC Jasper M de Leeuw WJ de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples Forensic Sci Int Genet 3233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR am-plicons using redesigned primers Int J Legal Med 114285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference data-base (YHRD) update Forensic Sci Int Genet 183-87

Wright S (1951) The genetic structure of populations Ann Eugen 15323-354

Zvelebil M (2008) Innovating hunter-gatherers the Mesolithic in the Baltic In Mesolithic Europe (eds) Bailey G and Spikins P Cambridge University Press New York

| 55

Page 39: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

Concluding remarks

Several parameters influence the assessment of the weight of evidence in a DNA investigation, and each of them can have a considerable impact on the resulting figure. This is a common feature of the four papers included in this thesis, the aim of which was to study relevant population genetic properties and models for considering them when calculating likelihoods in relationship testing.

• In Paper I, we showed how the risk of erroneous decisions in relationship testing in immigration casework was affected by parameters such as the number of markers tested, the use of relevant population allele frequencies, the use of a probabilistic model for the treatment of single genetic inconsistencies, and the consideration of alternative close relationships between the alleged father and the true father. In addition, we proposed methods for reducing the risk of erroneous decisions.

• In Paper II, we set up an mtDNA haplotype frequency database for the Swedish population. We demonstrated that the mtDNA variation in the Swedish population is high and that the homogeneity among different subregions within Sweden supports a combined Swedish population frequency database. Furthermore, the resemblance of the mtDNA variation found in Sweden to that of other European populations makes it possible to enlarge the relevant reference population, thus increasing the reliability of the estimated rarity of a given mtDNA haplotype.

• In Paper III, we studied eight X-chromosomal markers in terms of their informativeness and usefulness in relationship testing. We found that the markers located in each of the four linkage groups were in linkage disequilibrium (LD), and that the linkage within and between the linkage groups in the Swedish population highlighted the need to consider such parameters when producing the evidential weight for X-chromosomal marker investigations. Thus, for the Swedish population, the commonly used product rule is not valid when the eight X-STRs are employed.

• In Paper IV, we presented a model for the calculation of likelihood ratios that takes both linkage and linkage disequilibrium into account, and applied it to simulated cases based on DNA profiles with X-chromosomal data. We showed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relationship testing. Furthermore, we showed that our proposed model for the proper consideration of both linkage and linkage disequilibrium is efficient, and that disregarding LD and linkage can have a considerable impact on the computed likelihood ratio.
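
The point that the product rule breaks down for linked X-STRs can be illustrated with a small calculation. The Python sketch below is a generic illustration, not the model of Papers III and IV: it uses an invented sample of two-locus haplotypes (the allele labels and counts are hypothetical, not data from the papers) and compares each observed haplotype frequency with the product of the marginal allele frequencies. The difference is the usual linkage disequilibrium coefficient D = p_AB - p_A p_B.

```python
from collections import Counter

# Hypothetical two-locus haplotypes, e.g. as observed in males, who carry a single
# X chromosome. Allele labels and counts are invented for illustration only.
haplotypes = (
    [("13", "21")] * 30 + [("13", "22")] * 10 + [("14", "21")] * 10 + [("14", "22")] * 50
)

def allele_freq(haps, locus, allele):
    """Marginal frequency of one allele at one locus (index 0 or 1)."""
    return sum(1 for h in haps if h[locus] == allele) / len(haps)

def compare(haps, hap):
    """Return (observed haplotype frequency, product-rule estimate, LD coefficient D)."""
    observed = Counter(haps)[hap] / len(haps)
    product = allele_freq(haps, 0, hap[0]) * allele_freq(haps, 1, hap[1])
    return observed, product, observed - product

for hap in [("13", "21"), ("13", "22")]:
    obs, prod, d = compare(haplotypes, hap)
    print(hap, f"observed={obs:.3f}", f"product rule={prod:.3f}", f"D={d:+.3f}")
```

With these invented counts, haplotype 13-21 is observed at a frequency of 0.30, whereas the product rule gives 0.16. Under linkage equilibrium D would be close to zero; a large |D| means that multiplying allele frequencies across markers misstates the haplotype frequency and hence any likelihood ratio built on it.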

Future perspectives

It is always difficult to predict the future, as things never turn out as we imagine. Nevertheless, I will attempt to make some educated guesses concerning the future of forensic genetics. Looking back at the developments in this field, we can see enormous progress over the past 25 years, much of it thanks to the discovery of polymorphic genetic markers and thermostable polymerases, as well as the invention of PCR and capillary electrophoresis, which have made DNA analysis faster, cheaper and more accurate. We should also not forget the valuable information provided by the HUGO project and the assistance provided by ever faster computers.

If we go back to the time just before these findings and adopt a client's point of view, we can see that the chances of linking a suspect to a crime scene, or of resolving relationship issues quickly and with a high degree of accuracy, were rather small. The demand from clients to resolve these issues was therefore a major driver of the rapid development. Once the discoveries were made, enormous amounts of money and research time were invested in forensic genetics in order to meet this demand.

So is it likely that we will see similar trends over the next 25 years? I do not really think so. One reason is that many cases can be solved today with existing methods, and forensic laboratories have already made major investments in instrumentation and in large national DNA-profile databases, which consist of a given set of predefined DNA markers (e.g. CODIS (www.fbi.gov/hq/lab/codis/clickmap.htm)). In the case of national databases such as CODIS, which contains over 8,000,000 profiles, it would be very difficult to replace the existing markers with a different set of markers.

Generally, I believe that future development will become more diversified, which means less money and time spent on each topic. With regard to the use of DNA in criminal cases, perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Budowle et al 2009), resolving mixtures (Homer et al 2008) and predicting an individual's physical appearance, such as eye and skin colour (Kayser & Schneider 2009). To this should be added the constant upgrading of analysis instruments, which will make DNA profiling faster and of even higher quality.

If we turn to the use of DNA in relationship testing, which is the main topic of this thesis, methods and markers have traditionally been adapted from developments in the much larger field of DNA in criminal cases. In terms of future developments, I expect a greater focus on statistical methods to handle data from an increased number of DNA markers. However, I think that these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research. However, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half siblings, and half siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to resolve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009, Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the statistical methods required for accurate estimation of the evidential value of the DNA profiles, taking properties such as linkage and linkage disequilibrium into account, become crucial.
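
Why full siblings, half siblings and unrelated individuals are hard to separate can be seen from the standard IBD (Cotterman) coefficients k0, k1 and k2, the probabilities that a pair shares 0, 1 or 2 alleles identical by descent at a locus. The Python sketch below is a minimal single-locus illustration, assuming Hardy-Weinberg equilibrium and ignoring linkage, LD and mutation; the allele frequencies and genotypes are invented. It computes the probability of a pair's genotypes under each relationship and the likelihood ratio of full versus half siblings.

```python
from itertools import combinations_with_replacement

# Cotterman coefficients (k0, k1, k2): probabilities of sharing 0, 1 or 2 alleles IBD.
RELATIONSHIPS = {
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def prob_given_shared(g2, s, freqs):
    """P(g2 | g2 carries allele s IBD and its other allele is drawn at random)."""
    a, b = g2
    if a == b:
        return freqs[a] if a == s else 0.0
    return (freqs[b] if a == s else 0.0) + (freqs[a] if b == s else 0.0)

def pair_prob(g1, g2, freqs, k):
    """Joint probability of the two genotypes at one locus given (k0, k1, k2)."""
    k0, k1, k2 = k
    p1 = genotype_prob(g1, freqs)
    ibd0 = p1 * genotype_prob(g2, freqs)
    # Each allele of g1 is equally likely to be the one shared IBD.
    ibd1 = p1 * 0.5 * (prob_given_shared(g2, g1[0], freqs) + prob_given_shared(g2, g1[1], freqs))
    ibd2 = p1 if sorted(g1) == sorted(g2) else 0.0
    return k0 * ibd0 + k1 * ibd1 + k2 * ibd2

# Invented allele frequencies for one STR locus and an invented pair of genotypes.
freqs = {10: 0.2, 12: 0.1, 13: 0.7}
g1, g2 = (10, 12), (10, 12)

probs = {rel: pair_prob(g1, g2, freqs, k) for rel, k in RELATIONSHIPS.items()}
lr_fs_hs = probs["full siblings"] / probs["half siblings"]
print(probs, f"LR(FS vs HS) = {lr_fs_hs:.2f}")

# Sanity check: probabilities over all unordered genotype pairs sum to 1.
genotypes = list(combinations_with_replacement(sorted(freqs), 2))
total = sum(pair_prob(a, b, freqs, RELATIONSHIPS["full siblings"]) for a in genotypes for b in genotypes)
```

For this invented locus the full-sibling versus half-sibling likelihood ratio is only about 3.5 even though the two genotypes are identical, so many loci are needed. Multiplying such ratios across loci is only valid for unlinked markers in linkage equilibrium, which is why dense marker sets call for methods that handle linkage and LD.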

As regards continued work on the linked X-STRs studied in this thesis, there are mainly two things that remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other issue, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required in order to estimate its frequency with a low level of uncertainty.
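
The database-size problem can be made concrete. The Python sketch below is a generic illustration, not the estimator used in this thesis: it counts how many distinct haplotypes a block of linked markers can form, and computes a one-sided 95% Clopper-Pearson upper bound for the frequency of a haplotype observed x times among N database haplotypes. The allele counts per marker and the database sizes are invented.

```python
import math

def n_possible_haplotypes(alleles_per_marker):
    """Number of distinct haplotypes a block of linked markers can form."""
    return math.prod(alleles_per_marker)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); adequate for the small counts of rare haplotypes."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))

def upper_bound(x, n, alpha=0.05, tol=1e-12):
    """One-sided Clopper-Pearson upper bound for a haplotype seen x times among n.

    Solves P(X <= x | n, p) = alpha for p by bisection; the CDF decreases in p.
    """
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # x or fewer copies still too likely: the bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical block of three X-STRs with 10, 8 and 12 alleles.
print("possible haplotypes:", n_possible_haplotypes([10, 8, 12]))
for n in (500, 1000, 5000):
    print(n, "unobserved:", round(upper_bound(0, n), 5), "seen twice:", round(upper_bound(2, n), 5))
```

For an unobserved haplotype the bound is close to 3/N (the familiar "rule of three"), so a tenfold larger database is needed for a roughly tenfold tighter bound, while the number of possible haplotypes grows multiplicatively with each added marker.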

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and thus could have an impact when producing the weight of evidence.
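
One standard way of allowing for the kind of population substructure that regional isolation can produce is the θ (F_ST) correction of Balding and Nichols (1994), listed in the references. The Python sketch below applies their single-locus match-probability formulas. It is a general illustration for the criminal-case match setting, not a model from this thesis; corresponding corrections for relationship testing (e.g. Ayres 2000) are more involved. The allele frequencies and θ values are invented.

```python
def match_probability(p, q=None, theta=0.0):
    """Balding-Nichols theta-corrected single-locus match probability.

    p, q: allele frequencies (q=None means a homozygote with allele frequency p).
    theta: co-ancestry coefficient (F_ST) of the subpopulation.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:  # homozygote AA
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # heterozygote AB
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom

# Invented allele frequencies; theta values chosen only to show the trend.
for theta in (0.0, 0.01, 0.03):
    het = match_probability(0.1, 0.2, theta)
    hom = match_probability(0.1, theta=theta)
    print(f"theta={theta}: heterozygote LR={1 / het:.1f}, homozygote LR={1 / hom:.1f}")
```

With θ = 0 the formulas reduce to the Hardy-Weinberg values 2pq and p², and for these allele frequencies a larger θ raises the match probability and so lowers the likelihood ratio, which is the direction in which isolation-driven substructure can affect the weight of evidence.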

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000. This gives access to a huge amount of information for predictions. However, although the methods are technically suitable for producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).

Acknowledgements

Tack till [Thanks to]

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team": Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support, big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin for your endless support and for believing in me throughout my life.

My relatives Ingvar & Eivor and grandfather Olle for always being around when help is needed.

My family-in-law, the Tillmars: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and our child Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for helping me get up in the mornings while I was writing this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds Bailey G and Spikins P). Cambridge University Press, New York.

| 55

Page 40: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

Concluding remarks

bull In Paper IV we presented a model for the calculation of likelihood ratios taking both linkage and linkage disequilibrium into account and applied it on simulated cases based on DNA profiles with X-chromosomal data We revealed that X-chromosomal analysis can be useful for choosing between alternative hypotheses in relation-ship testing Furthermore we showed that our proposed model for proper consideration of both linkage and linkage disequilibrium is efficient and that disregarding LD and linkage can have a consider-able impact on the computed likelihood ratio

44 |

Future perspectives

It is always difficult to predict the future as things never turn out as we imag-ine Nevertheless I will attempt to make some intelligent guesses concerning the future of forensic genetics If we look back at the developments in this field we can see enormous progress over the past 25 years much of which is thanks to the discovery of polymorphic genetic markers thermo stable poly-merases as well as the invention of PCR and capillary electrophoresis making DNA analysis faster cheaper and more accurate We should also not forget the valuable information provided by the HUGO-project and the assistant from the developments of fast computer power

If we go back to the time just before the above-mentioned findings and adopt a clientrsquos point of view we can see that the chances of linking a suspect to the crime scene or solving relationship issues quickly and with a high degree of accuracy were rather small Thus the demand from clients to resolve these issues was a major factor behind the high degree of development Once the discoveries were made an incredible amount of dollars and scientific hours were invested in forensic genetics in order to meet the demands

So is it likely that we will see similar trends over the next 25 years I do not really think so One reason is that today many cases can be solved with existing methods and major investments have already been made in forensic laborato-ries regarding instrumentation and the large national DNA-profile databases which consist of a given number of predefined DNA markers (eg CODIS (wwwfbigovhqlabcodisclickmaphtm)) In the case of the national data-bases which like CODIS consists of over 8000000 profiles it would be very difficult to replace the existing markers with a different set up of markers

Generally I believe that future development will become more diversified which means less money and time spent on each topic With regard to the use of DNA in criminal cases perhaps the most important areas concern methods for obtaining complete DNA profiles from difficult biological samples (Bu-dowle et al 2009) resolving mixtures (Homer et al 2008) and predicting an individualrsquos physical appearance such as eye and skin colour (Kayser amp Schnei-der 2009) To this should be added the constant upgrading of analysis instru-ments which will make DNA-profiling faster and of an even higher quality

If we turn to the use of DNA in relationship testing which is the main topic of this thesis methods and markers have traditionally been adapted from de-velopments in the much larger field of DNA in criminal cases In terms of fu-

| 45

Future perspectives

ture developments I feel that there will be a greater focus on statistical meth-ods to handle data from an increased amount of DNA markers However I think that these developments will in general be less compared to the last 25 years Take for example paternity testing which constitutes the vast majority of all relationship testing cases Today such cases are mostly solved with a suf-ficiently high probability and certainty thus the community will not continue to invest the same level of time and money in research However relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods for example separating full-siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al 2010) Some attempts have been made to solve similar cases as well as more distant relationships by means of array SNP data (Skare et al 2009 Ge et al 2010) The similarity between these studies is the simultaneous use of a large number of markers What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles taking proper-ties such as linkage and linkage disequilibrium into account

When it comes to the continuum efforts regarding the linked X-STRs stud-ied in this thesis there are mainly two things that remain to be investigated One is to transform the model presented in Paper IV into user-friendly soft-ware which can be applied to various sets of DNA marker systems and differ-ent sets of pedigrees The other issue that requires extensive study is the estima-tion of haplotype frequencies As more markers are included in a haplotype block much larger population databases are required in order to obtain an es-timate of its frequency with a low level of uncertainty

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data have shown that some regions remained rather isolated until the early 1900s, and we are interested to see whether this is traceable in the genetics and thus could have an impact on the weight of evidence.
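One common way population substructure enters the weight of evidence is through a coancestry coefficient θ (F_ST; see Holsinger and Weir 2009) in the match-probability formulas of Balding and Nichols (1994), both in the reference list. The sketch below implements those conditional genotype match probabilities. The θ values and allele frequencies are hypothetical and say nothing about what θ would be appropriate for northern Sweden.

```python
"""Theta-corrected genotype match probabilities (Balding and Nichols 1994 form).

theta is the coancestry coefficient (F_ST); theta = 0 recovers the
Hardy-Weinberg values p^2 and 2pq. All values used below are hypothetical.
"""
from typing import Dict, Tuple


def match_probability(genotype: Tuple[str, str], freqs: Dict[str, float], theta: float) -> float:
    """P(a second person has this genotype | a first person has it) in a subdivided population."""
    a, b = genotype
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if a == b:
        p = freqs[a]
        return (2.0 * theta + (1.0 - theta) * p) * (3.0 * theta + (1.0 - theta) * p) / denom
    pa, pb = freqs[a], freqs[b]
    return 2.0 * (theta + (1.0 - theta) * pa) * (theta + (1.0 - theta) * pb) / denom


if __name__ == "__main__":
    freqs = {"a": 0.1, "b": 0.2}  # hypothetical allele frequencies
    for theta in (0.0, 0.01, 0.03):
        het = match_probability(("a", "b"), freqs, theta)
        hom = match_probability(("a", "a"), freqs, theta)
        print(f"theta={theta}: heterozygote {het:.5f}, homozygote {hom:.5f}")
```

A larger θ raises the match probability and therefore lowers the corresponding likelihood ratio, so a more isolated subpopulation would, under this model, yield more conservative evidential values.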

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al. 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.
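In disaster victim identification (DVI), a central statistical task is weighing a post-mortem profile against reference samples from relatives. The simplest case, in which both putative parents of a victim have been typed, can be sketched as a likelihood ratio comparing "child of these two references" with "unrelated member of the population". The sketch assumes Hardy-Weinberg equilibrium, unlinked loci, no mutation, no silent alleles and no allelic drop-out; the last assumption in particular rarely holds for degraded post-mortem samples. The function names and data are hypothetical.

```python
"""Likelihood ratio for a victim being the child of two typed reference parents.

H1: the victim is the biological child of the two references.
H2: the victim is an unrelated member of the population.
Assumes HWE, unlinked loci, no mutation, no silent alleles and no drop-out.
"""
from itertools import product
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def transmission_prob(child: Genotype, mother: Genotype, father: Genotype) -> float:
    """Mendelian P(child genotype | parental genotypes); each allele pair has prob 1/4."""
    target = sorted(child)
    return sum((0.25 for m, f in product(mother, father) if sorted((m, f)) == target), 0.0)


def hwe_prob(g: Genotype, freqs: Dict[str, float]) -> float:
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]


def parentage_lr(victim: Dict[str, Genotype], mother: Dict[str, Genotype],
                 father: Dict[str, Genotype], freq_table: Dict[str, Dict[str, float]]) -> float:
    """Product over loci of P(victim | parents) / P(victim | population)."""
    lr = 1.0
    for locus, g in victim.items():
        lr *= transmission_prob(g, mother[locus], father[locus]) / hwe_prob(g, freq_table[locus])
    return lr


if __name__ == "__main__":
    freqs = {"L1": {"a": 0.1, "b": 0.2, "c": 0.7}}  # hypothetical allele frequencies
    victim, mother, father = {"L1": ("a", "b")}, {"L1": ("a", "a")}, {"L1": ("b", "c")}
    print(parentage_lr(victim, mother, father, freqs))  # about 12.5
```

Here a single Mendelian inconsistency sets the LR to zero; casework models typically add mutation (and, for degraded samples, drop-out) to avoid such false exclusions.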

Finally, one should not forget the incredible progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream efforts for data management and statistical inference still require a great deal of work (Metzker 2010).

Acknowledgements

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My tutors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge within the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise within forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small, and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, the head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.

As a scientist it is valuable to be able to think about other things besides work. I would like to thank Magnus and Erik for always being such good friends, and Leonard, Claes, Ragnar, Sara and Evelin for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law Tillmar: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and caring; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767.

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press.

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115.

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140.

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74.

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42.

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, pp 21-53.

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180.

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM: Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202.

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton.

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302.

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286.

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350.

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35.

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265.

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146.

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595.

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049.

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press.

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225.

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445.

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102.

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press.

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148.

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231.

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203.

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118.

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50.

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Suppl Ser 2:680-681.

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509.

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis--validation and use for forensic casework. Forensic Science Review 11:22-50.

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79.

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167.

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York.

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492.

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73.

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York.

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751.

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124.

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612.

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838.

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397.

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970.

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161.

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114.

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248.

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73.

Lee HC and Labriola J (2001) Famous crimes revisited. Strong Books, Southington.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52.

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Suppl Ser 2:344-346.

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York.

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press.

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98.

Ott J (1999) Analysis of human genetic linkage, 3rd edn. Johns Hopkins University Press, Baltimore.

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92.

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204.

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press.

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12.

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276.

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291.

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382.

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126.

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm.

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99.

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74.

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20.

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131.

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241.

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287.

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87.

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354.

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York.

If we turn to the use of DNA in relationship testing which is the main topic of this thesis methods and markers have traditionally been adapted from de-velopments in the much larger field of DNA in criminal cases In terms of fu-

| 45

Future perspectives

ture developments I feel that there will be a greater focus on statistical meth-ods to handle data from an increased amount of DNA markers However I think that these developments will in general be less compared to the last 25 years Take for example paternity testing which constitutes the vast majority of all relationship testing cases Today such cases are mostly solved with a suf-ficiently high probability and certainty thus the community will not continue to invest the same level of time and money in research However relationship questions still remain that cannot be resolved in a satisfactory manner with existing methods for example separating full-siblings from half-siblings and half-siblings from unrelated individuals (Nothnagel et al 2010) Some attempts have been made to solve similar cases as well as more distant relationships by means of array SNP data (Skare et al 2009 Ge et al 2010) The similarity between these studies is the simultaneous use of a large number of markers What becomes important in such cases are the statistical methods required for accurate estimation of the evidential value of the DNA profiles taking proper-ties such as linkage and linkage disequilibrium into account

When it comes to the continuum efforts regarding the linked X-STRs stud-ied in this thesis there are mainly two things that remain to be investigated One is to transform the model presented in Paper IV into user-friendly soft-ware which can be applied to various sets of DNA marker systems and differ-ent sets of pedigrees The other issue that requires extensive study is the estima-tion of haplotype frequencies As more markers are included in a haplotype block much larger population databases are required in order to obtain an es-timate of its frequency with a low level of uncertainty

Furthermore we will continue to explore the genetic diversity of the Swed-ish population with a special focus on northern Sweden Historical data have shown that there are regions that have been rather isolated until early 1900s and we are interested to see if this is traceable in the genetics thus could have an impact when producing the weight of evidence

Another area that needs further improvement concerns the use of DNA for disaster victim identification (Prinz et al 2007) This involves post-mortem DNA analysis data management and methods for statistical interpretation

Finally one should not forget the incredible progress made with the so-called next generation sequencing methods (Metzker 2010) It is now possible to sequence a complete human genome for under $100000 This would allow access to a huge amount of information for predictions However although the methods are technically suitable for producing such large data sets the down-stream efforts for data management and statistical interference still require a great deal of work (Metzker 2010)

46 |

Acknowledgements

Tack till All studies performed in this thesis were conducted at National Board of Fo-rensic Medicine Department of Forensic Genetics and Forensic toxicology Scientific work is never a one-man show and I am grateful for all help support and encouragement I have received through the years Especially I would like to thank My tutors Professor Bertil Lindblom Associate professor Gunilla Holmlund and Associate professor Petter Mostad for sharing your deep knowledge within the field Thanks for your guidance encouragements and always having your doors open for discussions Also Bertil especially for your confidence and wide expertise within forensic genetics Gunilla who introduced me to the field of forensic genetics for your positive spirit Petter especially for all your support on mathematical and statistical issues The staff in the ldquoRG-teamrdquo Helena Kerstin Birgitta Ann-Britt Kersti Johnny Anna-Lena Ulla Irene Margareta Agnetha Pernilla and Louise for all support big and small and for making Artillerigatan 12 a fun place to work Johan Ahlner the head of the Department of Forensic Genetics and Forensic Toxicology for being an inspiring boss My fellows in the ldquoJippo-gruppenrdquo for fun moments and all other people at RMV-Linkoumlping who has been help-ful in different ways Thore Egeland Anders Goumltherstroumlm Thomas Wallerstroumlm Helena Jankovic Malmstroumlm Michael Coble and Jukka Palo for valuable scientific collaborations and discussions which I hope will continue As a researcher you need to discuss ideas big or small and as a PhD-student it is always nice to have a ldquobollplankrdquo for practical issues I thank Jenny Ve-lander Daniel Kling Gunnel Nilsson and Anna-Lena Zachrisson for in-spiring discussions and practical tips

| 47

Acknowledgements

As a scientist it is valuable to be able to think about other stuffs beside the work I would like to thank Magnus and Erik for always being such good friends Leonard Claes Ragnar Sara and Evelin for the years in Uppsala My father Jan-Aringke mother Inga-Lill sister Frida and brother Martin for your endless support and for believing in me during my life My relatives Ingvar amp Eivor and grandfather Olle for always being around when help is needed My family in-law Tillmar It is always nice to have such a welcoming place to go to Finally my beloved wife Ida and child Signe Ida for your constant support love and caring Thanks for your patient when I work at non-office hours Signe our wonderful daughter for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis Thanks to all others who I not mentioned but has been helpful in different ways This work was supported by grants from National Board of Forensic Medicine and Center of Forensic Science Linkoumlping University

48 |

References

Abecasis GR Cherny SS Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees Nat Genet 3097-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage dis-equilibrium pedigree analysis with clustered markers Am J Hum Genet 77754-767

Alacs EA Georges A Fitzsimmons NN and Robertson J (2010) DNA detec-tive a review of molecular approaches to wildlife forensics Forensic Sci Med Pathol in press

Ayres KL (2000) Relatedness testing in subdivided populations Forensic Sci Int 114107-115 Balding DJ and Nichols RA (1994) DNA profile match probability calculation how to allow for population stratification relatedness database selection and single bands Forensic Sci Int 64125-140 Becker D Rodig H Augustin C Edelmann J Gotz F Hering S Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit Fo-rensic Sci Int Genet 269-74

Blankholm HP (2008) Southern Scandinavia In Mesolithic Europe (eds) Bai-ley G and Spikins P Cambridge University Press New York

Borsting C Rockenbauer E and Morling N (2009) Validation of a single nu-cleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard Foren-sic Sci Int Genet 434-42

Botstein D White RL Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms Am J Hum Genet 32314-331

| 49

References

Brenner CH Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes Validation and Other Studies Proceedings for The International Symposium on Human Identification 1989 Promega Corporation 21-53 Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities Forensic Sci Int 157172-180

Buckleton J Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B Moretti TR Baumstark AL Defenbaugh DA Keys KM Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahami-ans Jamaicans and Trinidadians J Forensic Sci 1999441277-86 J Forensic Sci 46198-202

Buckleton JS Triggs CM and Walsh SJ (2005) Forensic DNA evidence inter-pretation CRC Press Boca Raton

Budowle B Garofano P Hellman A Ketchum M Kanthaswamy S Parson W van Haeringen W Fain S and Broad T (2005) Recommendations for ani-mal DNA forensic and identity testing Int J Legal Med 119295-302

Budowle B Moretti TR Baumstark AL Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahamians Jamaicans and Trinidadians J Forensic Sci 441277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses future molecular biology directions Biotechniques 46339-340 342-350

Budowle B Wilson MR DiZinno JA Stauffer C Fasano MA Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII popula-tion data Forensic Sci Int 10323-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing J Forensic Sci 51253-265

Coble MD Just RS OCallaghan JE Letmanyi IH Peterson CT Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians Int J Legal Med 118137-146

Dawid AP Mortera J Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers Scand J Statist 29577-595

50 |

References

Desmarais D Zhong Y Chakraborty R Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA) J Forensic Sci 431046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes Forensic Sci Int Genet in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers Forensic Sci Int Genet 2219-225

Ellegren H (2004) Microsatellites simple sequences with complex evolution Nat Rev Genet 5435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data Hum Hered 21523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system Heredity 595-102

Ge J Budowle B Planz JV and Chakraborty R (2010) Haplotype block a new type of forensic DNA markers Int J Legal Med in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting Forensic Sci Int 35145-148

Gjertson DW Brenner CH Baur MP Carracedo A Guidet F Luque JA Lessig R Mayr WR Pascali VL Prinz M Schneider PM and Morling N (2007) ISFG Recommendations on biostatistics in paternity testing Forensic Sci Int Genet 1223-231

Gomes I Prinz M Pereira R Meyers C Mikulasovich RS Amorim A Car-racedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex Int J Legal Med 121198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers Hum Hered 49112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics Forensic Sci Int 12943-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics forensic Sci Int Genet Supp 2680-681


Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA sequence analysis - validation and use for forensic casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30


Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden - a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52


Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP - a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more - length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetical structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York


Future perspectives

ture developments, I expect a greater focus on statistical methods for handling data from an increasing number of DNA markers. However, I think these developments will in general be smaller than those of the last 25 years. Take, for example, paternity testing, which constitutes the vast majority of all relationship testing cases. Today such cases are mostly resolved with sufficiently high probability and certainty, so the community will not continue to invest the same level of time and money in research on them. Nevertheless, some relationship questions still cannot be resolved satisfactorily with existing methods, for example separating full siblings from half-siblings, and half-siblings from unrelated individuals (Nothnagel et al 2010). Some attempts have been made to solve similar cases, as well as more distant relationships, by means of array SNP data (Skare et al 2009, Ge et al 2010). What these studies have in common is the simultaneous use of a large number of markers. In such cases, the statistical methods required for accurate estimation of the evidential value of the DNA profiles become important, and they must take properties such as linkage and linkage disequilibrium into account.
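To illustrate why full siblings and half-siblings are hard to separate, the sketch below computes a pairwise likelihood ratio from the standard IBD (kappa) probabilities: (1/4, 1/2, 1/4) for full siblings, (1/2, 1/2, 0) for half-siblings and (1, 0, 0) for unrelated individuals. It is a minimal sketch, not the model used in this thesis. It assumes Hardy-Weinberg equilibrium, no mutation, no population substructure and unlinked loci, so that per-locus ratios can simply be multiplied. That last assumption is exactly the one that fails for linked markers. The function names, locus name and allele frequencies are hypothetical.

```python
"""Pairwise kinship likelihood ratios from IBD (kappa) probabilities.

Hypothetical sketch: assumes Hardy-Weinberg equilibrium, no mutation,
no population substructure and independent (unlinked) loci.
"""

# IBD probabilities (k0, k1, k2) for standard pairwise relationships.
KAPPA = {
    "full_siblings": (0.25, 0.5, 0.25),
    "half_siblings": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype g = (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_given_shared(g2, shared, freqs):
    """P(g2 | one allele of g2 is `shared`, the other drawn from the population)."""
    c, d = g2
    if c == d:
        return freqs[c] if shared == c else 0.0
    p = 0.0
    if shared == c:
        p += freqs[d]
    if shared == d:
        p += freqs[c]
    return p


def joint_prob(g1, g2, kappa, freqs):
    """P(g1, g2 | relationship) = k0*P0 + k1*P1 + k2*P2 at a single locus."""
    k0, k1, k2 = kappa
    p_g1 = genotype_prob(g1, freqs)
    p0 = p_g1 * genotype_prob(g2, freqs)  # no alleles identical by descent
    # One allele IBD: each allele of g1 is the shared one with probability 1/2.
    p1 = p_g1 * 0.5 * (prob_given_shared(g2, g1[0], freqs)
                       + prob_given_shared(g2, g1[1], freqs))
    p2 = p_g1 if sorted(g1) == sorted(g2) else 0.0  # both alleles IBD
    return k0 * p0 + k1 * p1 + k2 * p2


def likelihood_ratio(profile1, profile2, rel_num, rel_den, freq_table):
    """Multi-locus LR as a product of per-locus ratios (valid only for unlinked loci)."""
    lr = 1.0
    for locus, freqs in freq_table.items():
        g1, g2 = profile1[locus], profile2[locus]
        lr *= (joint_prob(g1, g2, KAPPA[rel_num], freqs)
               / joint_prob(g1, g2, KAPPA[rel_den], freqs))
    return lr


if __name__ == "__main__":
    # Hypothetical allele frequencies for one hypothetical locus.
    freq_table = {"LOCUS_1": {12: 0.1, 13: 0.2, 14: 0.3, 15: 0.4}}
    person1 = {"LOCUS_1": (12, 13)}
    person2 = {"LOCUS_1": (12, 13)}
    lr = likelihood_ratio(person1, person2, "full_siblings", "half_siblings", freq_table)
    print(f"LR full vs half siblings: {lr:.2f}")
```

With these hypothetical frequencies, a shared 12/13 genotype gives 0.0134 under full siblings and 0.0038 under half-siblings. The resulting ratio of about 3.5 is modest, which is why many loci (or many more markers) are needed to separate the two hypotheses.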

When it comes to continued work on the linked X-STRs studied in this thesis, two main issues remain to be investigated. One is to turn the model presented in Paper IV into user-friendly software that can be applied to various DNA marker systems and different pedigrees. The other, which requires extensive study, is the estimation of haplotype frequencies. As more markers are included in a haplotype block, much larger population databases are required to estimate its frequency with a low level of uncertainty.
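The database-size problem can be made concrete with a small sketch. The number of possible haplotypes in a block grows as the product of the allele counts of its markers. The uncertainty of a simple counting estimate (observed count divided by database size), by contrast, shrinks only roughly with the square root of the database size. The sketch uses a Wilson score interval as one standard binomial confidence interval. The choice of interval, the allele counts and the database sizes are illustrative assumptions, and none of this is the estimation method of Paper IV.

```python
"""Illustrative sketch: haplotype diversity versus database size.

Assumes a simple counting estimate with a binomial (Wilson score) interval;
the allele counts and database sizes below are hypothetical.
"""
import math


def n_possible_haplotypes(alleles_per_marker):
    """Number of distinct haplotypes a block of linked markers can form."""
    return math.prod(alleles_per_marker)


def wilson_interval(count, n, z=1.96):
    """Approximate 95% Wilson score interval for a frequency estimated as count/n."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("require n > 0 and 0 <= count <= n")
    p_hat = count / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical block of three markers with 10, 12 and 8 alleles.
    print("possible haplotypes:", n_possible_haplotypes([10, 12, 8]))
    # An unobserved haplotype (count 0) in databases of increasing size.
    for n in (100, 1000, 10000):
        low, high = wilson_interval(0, n)
        print(f"n={n:>5}: frequency in [{low:.5f}, {high:.5f}]")
```

For a haplotype that was never observed, the upper bound reduces to z²/(n + z²). That is about 0.037 with 100 haplotypes in the database and about 0.0004 with 10,000. The hypothetical 960-haplotype block therefore has far more categories than a database of a few hundred haplotypes can populate.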

Furthermore, we will continue to explore the genetic diversity of the Swedish population, with a special focus on northern Sweden. Historical data show that some regions remained rather isolated until the early 1900s. We are interested in whether this isolation can be traced in the genetic data, since such population structure could affect the calculation of the weight of evidence.
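One standard way in which population substructure enters the weight of evidence is the θ-correction of Balding and Nichols (1994). Here θ (F_ST) measures how much more alike two people from the same subpopulation are than two people drawn at random. The minimal sketch below implements the usual single-locus conditional match probability formulas and multiplies them across loci, assuming independent loci. The θ values, locus names and allele frequencies are hypothetical illustrations, not estimates for any Swedish region.

```python
"""Balding-Nichols theta-corrected match probabilities (illustrative sketch).

Assumes independent loci; theta, locus names and allele frequencies are hypothetical.
"""


def match_probability(genotype, freqs, theta=0.0):
    """Conditional probability that another, unrelated member of the same
    subpopulation has `genotype`, given that one person has it.

    Homozygote aa:   (2t + (1-t)pa)(3t + (1-t)pa) / ((1+t)(1+2t))
    Heterozygote ab: 2(t + (1-t)pa)(t + (1-t)pb) / ((1+t)(1+2t))
    With theta = 0 these reduce to pa^2 and 2*pa*pb (Hardy-Weinberg).
    """
    a, b = genotype
    pa = freqs[a]
    denom = (1 + theta) * (1 + 2 * theta)
    if a == b:
        return (2 * theta + (1 - theta) * pa) * (3 * theta + (1 - theta) * pa) / denom
    pb = freqs[b]
    return 2 * (theta + (1 - theta) * pa) * (theta + (1 - theta) * pb) / denom


def profile_match_probability(profile, freq_table, theta=0.0):
    """Multiply single-locus match probabilities over (assumed independent) loci."""
    result = 1.0
    for locus, genotype in profile.items():
        result *= match_probability(genotype, freq_table[locus], theta)
    return result


if __name__ == "__main__":
    freq_table = {"LOCUS_1": {12: 0.1, 13: 0.2, 14: 0.3, 15: 0.4}}
    profile = {"LOCUS_1": (12, 13)}
    for theta in (0.0, 0.01, 0.03):
        mp = profile_match_probability(profile, freq_table, theta)
        print(f"theta={theta:.2f}: match probability {mp:.4f}, 1/MP = {1 / mp:.1f}")
```

With these hypothetical frequencies, the 12/13 match probability rises from 0.04 at θ = 0 to about 0.044 at θ = 0.01 and about 0.052 at θ = 0.03. Ignoring substructure therefore overstates how rare the profile is.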

Another area that needs further improvement is the use of DNA for disaster victim identification (Prinz et al 2007). This involves post-mortem DNA analysis, data management and methods for statistical interpretation.

Finally, one should not forget the remarkable progress made with so-called next-generation sequencing methods (Metzker 2010). It is now possible to sequence a complete human genome for under $100,000, which would give access to a huge amount of information for predictions. However, although the methods are technically capable of producing such large data sets, the downstream work on data management and statistical inference still requires a great deal of effort (Metzker 2010).


Acknowledgements

Thanks to

All studies performed in this thesis were conducted at the National Board of Forensic Medicine, Department of Forensic Genetics and Forensic Toxicology. Scientific work is never a one-man show, and I am grateful for all the help, support and encouragement I have received through the years. In particular, I would like to thank:

My supervisors, Professor Bertil Lindblom, Associate Professor Gunilla Holmlund and Associate Professor Petter Mostad, for sharing your deep knowledge of the field. Thank you for your guidance and encouragement, and for always having your doors open for discussions. Bertil, especially for your confidence and wide expertise in forensic genetics; Gunilla, who introduced me to the field of forensic genetics, for your positive spirit; and Petter, especially for all your support on mathematical and statistical issues.

The staff in the "RG-team", Helena, Kerstin, Birgitta, Ann-Britt, Kersti, Johnny, Anna-Lena, Ulla, Irene, Margareta, Agnetha, Pernilla and Louise, for all support big and small and for making Artillerigatan 12 a fun place to work.

Johan Ahlner, head of the Department of Forensic Genetics and Forensic Toxicology, for being an inspiring boss.

My fellows in the "Jippo-gruppen" for fun moments, and all other people at RMV-Linköping who have been helpful in different ways.

Thore Egeland, Anders Götherström, Thomas Wallerström, Helena Jankovic Malmström, Michael Coble and Jukka Palo for valuable scientific collaborations and discussions, which I hope will continue.

As a researcher you need to discuss ideas, big or small, and as a PhD student it is always nice to have a sounding board ("bollplank") for practical issues. I thank Jenny Velander, Daniel Kling, Gunnel Nilsson and Anna-Lena Zachrisson for inspiring discussions and practical tips.


As a scientist, it is valuable to be able to think about other things besides work. I would like to thank:

Magnus and Erik, for always being such good friends. Leonard, Claes, Ragnar, Sara and Evelin, for the years in Uppsala.

My father Jan-Åke, mother Inga-Lill, sister Frida and brother Martin, for your endless support and for believing in me throughout my life. My relatives Ingvar & Eivor and grandfather Olle, for always being around when help is needed. My family-in-law, the Tillmar family: it is always nice to have such a welcoming place to go to.

Finally, my beloved wife Ida and child Signe. Ida, for your constant support, love and care; thank you for your patience when I work outside office hours. Signe, our wonderful daughter, for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis.

Thanks to all others whom I have not mentioned but who have been helpful in different ways.

This work was supported by grants from the National Board of Forensic Medicine and the Center of Forensic Science, Linköping University.


References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin - rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331


Brenner CH and Morris JW (1990) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification 1989. Promega Corporation, 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium - a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595

50 |

References

Desmarais D Zhong Y Chakraborty R Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA) J Forensic Sci 431046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes Forensic Sci Int Genet in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers Forensic Sci Int Genet 2219-225

Ellegren H (2004) Microsatellites simple sequences with complex evolution Nat Rev Genet 5435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data Hum Hered 21523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system Heredity 595-102

Ge J Budowle B Planz JV and Chakraborty R (2010) Haplotype block a new type of forensic DNA markers Int J Legal Med in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting Forensic Sci Int 35145-148

Gjertson DW Brenner CH Baur MP Carracedo A Guidet F Luque JA Lessig R Mayr WR Pascali VL Prinz M Schneider PM and Morling N (2007) ISFG Recommendations on biostatistics in paternity testing Forensic Sci Int Genet 1223-231

Gomes I Prinz M Pereira R Meyers C Mikulasovich RS Amorim A Car-racedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex Int J Legal Med 121198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers Hum Hered 49112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics Forensic Sci Int 12943-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics forensic Sci Int Genet Supp 2680-681

| 51

References

Hammer MF Blackmer F Garrigan D Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome se-quence variation Genetics 1641495-1509

Hardy GH (1908) Mendelian proportion in a mixed population Science 2849-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analy-sismdashValidation and Use for Forensic Casework Forensic Science Review 1122-50

Holmlund G Nilsson H Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden Forensic Sci Int 16066-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured popu-lations defining estimating and interpreting F(ST) Nat Rev Genet 10639-650

Homer N Szelinger S Redman M Duggan D Tembe W Muehling J Pear-son JV Stephan DA Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays PLoS Genet 4e1000167

Hummel K Gerchow J and Essen-Megraveoller E (1981) Biomathematical eviden-ce of paternity festschrift for Erik Essen-Megraveoller Springer-Verlag Berlin New York

Hundertmark T Hering S Edelmann J Augustin C Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22 Int J Legal Med 122489-492

Jeffreys AJ Wilson V and Thein SL (1985) Hypervariable minisatellite re-gions in human DNA Nature 31467-73

Jobling M Hurles M Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics origins peoples amp disease Garland Science New York

Jobling MA and Gill P (2004b) Encoded evidence DNA in forensic analysis Nat Rev Genet 5739-751

Jobling MA Pandya A and Tyler-Smith C (1997) The Y chromosome in fo-rensic analysis and paternity testing Int J Legal Med 110118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome an evolu-tionary marker comes of age Nat Rev Genet 4598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly Res Q 4323-30

52 |

References

Karafet TM Mendez FL Meilerman MB Underhill PA Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolu-tion of the human Y chromosomal haplogroup tree Genome Res 18830-838

Karch SB (2007) Changing times DNA resequencing and the nearly normal autopsy J Forensic Leg Med 14389-397

Karlsson AO Wallerstrom T Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective Eur J Hum Genet 14963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human exter-nally visible characteristics in forensics motivations scientific challenges and ethical considerations Forensic Sci Int Genet 3154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers mathemati-cal and statistical issues Forensic Sci Int Genet 1111-114 Kruglyak L Daly MJ Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis a unified multipoint approach Am J Hum Ge-net 581347-1363 Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans Proc Natl Acad Sci U S A 842363-2367

Lao O Lu TT Nothnagel M Junge O Freitag-Wolf S Caliebe A Balas-cakova M Bertranpetit J Bindoff LA Comas D Holmlund G Kouvatsi A Macek M Mollet I Parson W Palo J Ploski R Sajantila A Tagliabracci A Gether U Werge T Rivadeneira F Hofman A Uitterlinden AG Gieger C Wichmann HE Ruther A Schreiber S Becker C Nurnberg P Nelson MR Krawczak M and Kayser M (2008) Correlation between genetic and geo-graphic structure in Europe Curr Biol 181241-1248

Lappalainen T Hannelius U Salmela E von Dobeln U Lindgren CM Hu-oponen K Savontaus ML Kere J and Lahermo P (2009) Population struc-ture in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis Ann Hum Genet 7361-73

Lee HC and Labriola J (2001) Famous Crimes Revisited Strong Books South-ington

Metzker ML (2010) Sequencing technologies - the next generation Nat Rev Genet 1131-46

Montelius K Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to Euro-pean as well as with non-European population Forensic Sci Int Genet 2e49-52

| 53

References

Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C Fondevila M Garcia-Magarinos M Rodriguez A Salas A Car-racedo A and Lareu MV (2008) Resolving relationship tests that show am-biguous STR results using autosomal SNPs as supplementary markers Forensic Sci Int Genet 2198-204

Pinto N Gusmatildeo L and Amorim A (2010) X-chromosome markers in kin-ship testing A generalisation of the IBD approach identifying situations where their contribution is crucial Forensic Sci Int Genet in press

Prinz M Carracedo A Mayr WR Morling N Parsons TJ Sajantila A Scheithauer R Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG) recommendations re-garding the role of forensic genetics for disaster victim identification (DVI) Forensic Sci Int Genet 13-12

Pulker H Lareu MV Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools Forensic Sci Int Genet 1100-104

Richards M Macaulay V Hickey E Vega E Sykes B Guida V Rengo C Sellitto D Cruciani F Kivisild T Villems R Thomas M Rychkov S Rych-kov O Rychkov Y Golge M Dimitrov D Hill E Bradley D Romano V Cali F Vona G Demaine A Papiha S Triantaphyllidis C Stefanescu G Hatina J Belledi M Di Rienzo A Novelletto A Oppenheim A Norby S Al-Zaheri N Santachiara-Benerecetti S Scozari R Torroni A and Bandelt

54 |

References

HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool Am J Hum Genet 671251-1276

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA Matai AS Laros JF Meiland HC Jasper M de Leeuw WJ de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples Forensic Sci Int Genet 3233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR am-plicons using redesigned primers Int J Legal Med 114285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference data-base (YHRD) update Forensic Sci Int Genet 183-87

Wright S (1951) The genetic structure of populations Ann Eugen 15323-354

Zvelebil M (2008) Innovating hunter-gatherers the Mesolithic in the Baltic In Mesolithic Europe (eds) Bailey G and Spikins P Cambridge University Press New York

| 55

Page 43: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

Acknowledgements

Tack till All studies performed in this thesis were conducted at National Board of Fo-rensic Medicine Department of Forensic Genetics and Forensic toxicology Scientific work is never a one-man show and I am grateful for all help support and encouragement I have received through the years Especially I would like to thank My tutors Professor Bertil Lindblom Associate professor Gunilla Holmlund and Associate professor Petter Mostad for sharing your deep knowledge within the field Thanks for your guidance encouragements and always having your doors open for discussions Also Bertil especially for your confidence and wide expertise within forensic genetics Gunilla who introduced me to the field of forensic genetics for your positive spirit Petter especially for all your support on mathematical and statistical issues The staff in the ldquoRG-teamrdquo Helena Kerstin Birgitta Ann-Britt Kersti Johnny Anna-Lena Ulla Irene Margareta Agnetha Pernilla and Louise for all support big and small and for making Artillerigatan 12 a fun place to work Johan Ahlner the head of the Department of Forensic Genetics and Forensic Toxicology for being an inspiring boss My fellows in the ldquoJippo-gruppenrdquo for fun moments and all other people at RMV-Linkoumlping who has been help-ful in different ways Thore Egeland Anders Goumltherstroumlm Thomas Wallerstroumlm Helena Jankovic Malmstroumlm Michael Coble and Jukka Palo for valuable scientific collaborations and discussions which I hope will continue As a researcher you need to discuss ideas big or small and as a PhD-student it is always nice to have a ldquobollplankrdquo for practical issues I thank Jenny Ve-lander Daniel Kling Gunnel Nilsson and Anna-Lena Zachrisson for in-spiring discussions and practical tips

| 47

Acknowledgements

As a scientist it is valuable to be able to think about other stuffs beside the work I would like to thank Magnus and Erik for always being such good friends Leonard Claes Ragnar Sara and Evelin for the years in Uppsala My father Jan-Aringke mother Inga-Lill sister Frida and brother Martin for your endless support and for believing in me during my life My relatives Ingvar amp Eivor and grandfather Olle for always being around when help is needed My family in-law Tillmar It is always nice to have such a welcoming place to go to Finally my beloved wife Ida and child Signe Ida for your constant support love and caring Thanks for your patient when I work at non-office hours Signe our wonderful daughter for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis Thanks to all others who I not mentioned but has been helpful in different ways This work was supported by grants from National Board of Forensic Medicine and Center of Forensic Science Linkoumlping University

48 |

References

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754-767

Alacs EA, Georges A, Fitzsimmons NN and Robertson J (2010) DNA detective: a review of molecular approaches to wildlife forensics. Forensic Sci Med Pathol, in press

Ayres KL (2000) Relatedness testing in subdivided populations. Forensic Sci Int 114:107-115

Balding DJ and Nichols RA (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 64:125-140

Becker D, Rodig H, Augustin C, Edelmann J, Gotz F, Hering S, Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit. Forensic Sci Int Genet 2:69-74

Blankholm HP (2008) Southern Scandinavia. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York

Borsting C, Rockenbauer E and Morling N (2009) Validation of a single nucleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard. Forensic Sci Int Genet 4:34-42

Botstein D, White RL, Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331

Brenner CH and Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes: Validation and Other Studies. In: Proceedings for The International Symposium on Human Identification 1989. Promega Corporation, 21-53

Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities. Forensic Sci Int 157:172-180

Buckleton J, Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA, Keys KM. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 1999;44:1277-86. J Forensic Sci 46:198-202

Buckleton JS, Triggs CM and Walsh SJ (2005) Forensic DNA evidence interpretation. CRC Press, Boca Raton

Budowle B, Garofano P, Hellman A, Ketchum M, Kanthaswamy S, Parson W, van Haeringen W, Fain S and Broad T (2005) Recommendations for animal DNA forensic and identity testing. Int J Legal Med 119:295-302

Budowle B, Moretti TR, Baumstark AL, Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans and Trinidadians. J Forensic Sci 44:1277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 46:339-340, 342-350

Budowle B, Wilson MR, DiZinno JA, Stauffer C, Fasano MA, Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII population data. Forensic Sci Int 103:23-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing. J Forensic Sci 51:253-265

Coble MD, Just RS, O'Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med 118:137-146

Dawid AP, Mortera J, Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers. Scand J Statist 29:577-595

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the International Society for Forensic Genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the "nearly normal autopsy". J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria [A thousand years of immigration: a Swedish cultural history]. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Mesolithic Europe (eds) Bailey G and Spikins P. Cambridge University Press, New York

| 55

Page 44: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

Acknowledgements

As a scientist it is valuable to be able to think about other stuffs beside the work I would like to thank Magnus and Erik for always being such good friends Leonard Claes Ragnar Sara and Evelin for the years in Uppsala My father Jan-Aringke mother Inga-Lill sister Frida and brother Martin for your endless support and for believing in me during my life My relatives Ingvar amp Eivor and grandfather Olle for always being around when help is needed My family in-law Tillmar It is always nice to have such a welcoming place to go to Finally my beloved wife Ida and child Signe Ida for your constant support love and caring Thanks for your patient when I work at non-office hours Signe our wonderful daughter for being constantly (almost) happy and for your help getting me up in the mornings during the writing of this thesis Thanks to all others who I not mentioned but has been helpful in different ways This work was supported by grants from National Board of Forensic Medicine and Center of Forensic Science Linkoumlping University

48 |

References

Abecasis GR Cherny SS Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees Nat Genet 3097-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage dis-equilibrium pedigree analysis with clustered markers Am J Hum Genet 77754-767

Alacs EA Georges A Fitzsimmons NN and Robertson J (2010) DNA detec-tive a review of molecular approaches to wildlife forensics Forensic Sci Med Pathol in press

Ayres KL (2000) Relatedness testing in subdivided populations Forensic Sci Int 114107-115 Balding DJ and Nichols RA (1994) DNA profile match probability calculation how to allow for population stratification relatedness database selection and single bands Forensic Sci Int 64125-140 Becker D Rodig H Augustin C Edelmann J Gotz F Hering S Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit Fo-rensic Sci Int Genet 269-74

Blankholm HP (2008) Southern Scandinavia In Mesolithic Europe (eds) Bai-ley G and Spikins P Cambridge University Press New York

Borsting C Rockenbauer E and Morling N (2009) Validation of a single nu-cleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard Foren-sic Sci Int Genet 434-42

Botstein D White RL Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms Am J Hum Genet 32314-331

| 49

References

Brenner CH Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes Validation and Other Studies Proceedings for The International Symposium on Human Identification 1989 Promega Corporation 21-53 Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities Forensic Sci Int 157172-180

Buckleton J Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B Moretti TR Baumstark AL Defenbaugh DA Keys KM Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahami-ans Jamaicans and Trinidadians J Forensic Sci 1999441277-86 J Forensic Sci 46198-202

Buckleton JS Triggs CM and Walsh SJ (2005) Forensic DNA evidence inter-pretation CRC Press Boca Raton

Budowle B Garofano P Hellman A Ketchum M Kanthaswamy S Parson W van Haeringen W Fain S and Broad T (2005) Recommendations for ani-mal DNA forensic and identity testing Int J Legal Med 119295-302

Budowle B Moretti TR Baumstark AL Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahamians Jamaicans and Trinidadians J Forensic Sci 441277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses future molecular biology directions Biotechniques 46339-340 342-350

Budowle B Wilson MR DiZinno JA Stauffer C Fasano MA Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII popula-tion data Forensic Sci Int 10323-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing J Forensic Sci 51253-265

Coble MD Just RS OCallaghan JE Letmanyi IH Peterson CT Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians Int J Legal Med 118137-146

Dawid AP Mortera J Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers Scand J Statist 29577-595

50 |

References

Desmarais D Zhong Y Chakraborty R Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA) J Forensic Sci 431046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes Forensic Sci Int Genet in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers Forensic Sci Int Genet 2219-225

Ellegren H (2004) Microsatellites simple sequences with complex evolution Nat Rev Genet 5435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data Hum Hered 21523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system Heredity 595-102

Ge J Budowle B Planz JV and Chakraborty R (2010) Haplotype block a new type of forensic DNA markers Int J Legal Med in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting Forensic Sci Int 35145-148

Gjertson DW Brenner CH Baur MP Carracedo A Guidet F Luque JA Lessig R Mayr WR Pascali VL Prinz M Schneider PM and Morling N (2007) ISFG Recommendations on biostatistics in paternity testing Forensic Sci Int Genet 1223-231

Gomes I Prinz M Pereira R Meyers C Mikulasovich RS Amorim A Car-racedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex Int J Legal Med 121198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers Hum Hered 49112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics Forensic Sci Int 12943-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics forensic Sci Int Genet Supp 2680-681

| 51

References

Hammer MF Blackmer F Garrigan D Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome se-quence variation Genetics 1641495-1509

Hardy GH (1908) Mendelian proportion in a mixed population Science 2849-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analy-sismdashValidation and Use for Forensic Casework Forensic Science Review 1122-50

Holmlund G Nilsson H Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden Forensic Sci Int 16066-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured popu-lations defining estimating and interpreting F(ST) Nat Rev Genet 10639-650

Homer N Szelinger S Redman M Duggan D Tembe W Muehling J Pear-son JV Stephan DA Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays PLoS Genet 4e1000167

Hummel K Gerchow J and Essen-Megraveoller E (1981) Biomathematical eviden-ce of paternity festschrift for Erik Essen-Megraveoller Springer-Verlag Berlin New York

Hundertmark T Hering S Edelmann J Augustin C Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22 Int J Legal Med 122489-492

Jeffreys AJ Wilson V and Thein SL (1985) Hypervariable minisatellite re-gions in human DNA Nature 31467-73

Jobling M Hurles M Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics origins peoples amp disease Garland Science New York

Jobling MA and Gill P (2004b) Encoded evidence DNA in forensic analysis Nat Rev Genet 5739-751

Jobling MA Pandya A and Tyler-Smith C (1997) The Y chromosome in fo-rensic analysis and paternity testing Int J Legal Med 110118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome an evolu-tionary marker comes of age Nat Rev Genet 4598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly Res Q 4323-30

52 |

References

Karafet TM Mendez FL Meilerman MB Underhill PA Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolu-tion of the human Y chromosomal haplogroup tree Genome Res 18830-838

Karch SB (2007) Changing times DNA resequencing and the nearly normal autopsy J Forensic Leg Med 14389-397

Karlsson AO Wallerstrom T Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective Eur J Hum Genet 14963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human exter-nally visible characteristics in forensics motivations scientific challenges and ethical considerations Forensic Sci Int Genet 3154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers mathemati-cal and statistical issues Forensic Sci Int Genet 1111-114 Kruglyak L Daly MJ Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis a unified multipoint approach Am J Hum Ge-net 581347-1363 Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans Proc Natl Acad Sci U S A 842363-2367

Lao O Lu TT Nothnagel M Junge O Freitag-Wolf S Caliebe A Balas-cakova M Bertranpetit J Bindoff LA Comas D Holmlund G Kouvatsi A Macek M Mollet I Parson W Palo J Ploski R Sajantila A Tagliabracci A Gether U Werge T Rivadeneira F Hofman A Uitterlinden AG Gieger C Wichmann HE Ruther A Schreiber S Becker C Nurnberg P Nelson MR Krawczak M and Kayser M (2008) Correlation between genetic and geo-graphic structure in Europe Curr Biol 181241-1248

Lappalainen T Hannelius U Salmela E von Dobeln U Lindgren CM Hu-oponen K Savontaus ML Kere J and Lahermo P (2009) Population struc-ture in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis Ann Hum Genet 7361-73

Lee HC and Labriola J (2001) Famous Crimes Revisited Strong Books South-ington

Metzker ML (2010) Sequencing technologies - the next generation Nat Rev Genet 1131-46

Montelius K Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to Euro-pean as well as with non-European population Forensic Sci Int Genet 2e49-52

| 53

References

Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C Fondevila M Garcia-Magarinos M Rodriguez A Salas A Car-racedo A and Lareu MV (2008) Resolving relationship tests that show am-biguous STR results using autosomal SNPs as supplementary markers Forensic Sci Int Genet 2198-204

Pinto N Gusmatildeo L and Amorim A (2010) X-chromosome markers in kin-ship testing A generalisation of the IBD approach identifying situations where their contribution is crucial Forensic Sci Int Genet in press

Prinz M Carracedo A Mayr WR Morling N Parsons TJ Sajantila A Scheithauer R Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG) recommendations re-garding the role of forensic genetics for disaster victim identification (DVI) Forensic Sci Int Genet 13-12

Pulker H Lareu MV Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools Forensic Sci Int Genet 1100-104

Richards M Macaulay V Hickey E Vega E Sykes B Guida V Rengo C Sellitto D Cruciani F Kivisild T Villems R Thomas M Rychkov S Rych-kov O Rychkov Y Golge M Dimitrov D Hill E Bradley D Romano V Cali F Vona G Demaine A Papiha S Triantaphyllidis C Stefanescu G Hatina J Belledi M Di Rienzo A Novelletto A Oppenheim A Norby S Al-Zaheri N Santachiara-Benerecetti S Scozari R Torroni A and Bandelt

54 |

References

HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool Am J Hum Genet 671251-1276

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA Matai AS Laros JF Meiland HC Jasper M de Leeuw WJ de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples Forensic Sci Int Genet 3233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR am-plicons using redesigned primers Int J Legal Med 114285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference data-base (YHRD) update Forensic Sci Int Genet 183-87

Wright S (1951) The genetic structure of populations Ann Eugen 15323-354

Zvelebil M (2008) Innovating hunter-gatherers the Mesolithic in the Baltic In Mesolithic Europe (eds) Bailey G and Spikins P Cambridge University Press New York

| 55

Page 45: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

References

Abecasis GR Cherny SS Cookson WO and Cardon LR (2002) Merlin--rapid analysis of dense genetic maps using sparse gene flow trees Nat Genet 3097-101

Abecasis GR and Wigginton JE (2005) Handling marker-marker linkage dis-equilibrium pedigree analysis with clustered markers Am J Hum Genet 77754-767

Alacs EA Georges A Fitzsimmons NN and Robertson J (2010) DNA detec-tive a review of molecular approaches to wildlife forensics Forensic Sci Med Pathol in press

Ayres KL (2000) Relatedness testing in subdivided populations Forensic Sci Int 114107-115 Balding DJ and Nichols RA (1994) DNA profile match probability calculation how to allow for population stratification relatedness database selection and single bands Forensic Sci Int 64125-140 Becker D Rodig H Augustin C Edelmann J Gotz F Hering S Szibor R and Brabetz W (2008) Population genetic evaluation of eight X-chromosomal short tandem repeat loci using Mentype Argus X-8 PCR amplification kit Fo-rensic Sci Int Genet 269-74

Blankholm HP (2008) Southern Scandinavia In Mesolithic Europe (eds) Bai-ley G and Spikins P Cambridge University Press New York

Borsting C Rockenbauer E and Morling N (2009) Validation of a single nu-cleotide polymorphism (SNP) typing assay with 49 SNPs for forensic genetic testing in a laboratory accredited according to the ISO 17025 standard Foren-sic Sci Int Genet 434-42

Botstein D White RL Skolnick M and Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms Am J Hum Genet 32314-331

| 49

References

Brenner CH Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes Validation and Other Studies Proceedings for The International Symposium on Human Identification 1989 Promega Corporation 21-53 Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities Forensic Sci Int 157172-180

Buckleton J Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B Moretti TR Baumstark AL Defenbaugh DA Keys KM Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahami-ans Jamaicans and Trinidadians J Forensic Sci 1999441277-86 J Forensic Sci 46198-202

Buckleton JS Triggs CM and Walsh SJ (2005) Forensic DNA evidence inter-pretation CRC Press Boca Raton

Budowle B Garofano P Hellman A Ketchum M Kanthaswamy S Parson W van Haeringen W Fain S and Broad T (2005) Recommendations for ani-mal DNA forensic and identity testing Int J Legal Med 119295-302

Budowle B Moretti TR Baumstark AL Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahamians Jamaicans and Trinidadians J Forensic Sci 441277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses future molecular biology directions Biotechniques 46339-340 342-350

Budowle B Wilson MR DiZinno JA Stauffer C Fasano MA Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII popula-tion data Forensic Sci Int 10323-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing J Forensic Sci 51253-265

Coble MD Just RS OCallaghan JE Letmanyi IH Peterson CT Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians Int J Legal Med 118137-146

Dawid AP Mortera J Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers Scand J Statist 29577-595

50 |

References

Desmarais D, Zhong Y, Chakraborty R, Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J Forensic Sci 43:1046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes. Forensic Sci Int Genet, in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2:219-225

Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system. Heredity 5:95-102

Ge J, Budowle B, Planz JV and Chakraborty R (2010) Haplotype block: a new type of forensic DNA markers. Int J Legal Med, in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting. Forensic Sci Int 35:145-148

Gjertson DW, Brenner CH, Baur MP, Carracedo A, Guidet F, Luque JA, Lessig R, Mayr WR, Pascali VL, Prinz M, Schneider PM and Morling N (2007) ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223-231

Gomes I, Prinz M, Pereira R, Meyers C, Mikulasovich RS, Amorim A, Carracedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex. Int J Legal Med 121:198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers. Hum Hered 49:112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics. Forensic Sci Int 129:43-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics. Forensic Sci Int Genet Supp 2:680-681

Hammer MF, Blackmer F, Garrigan D, Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:1495-1509

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework. Forensic Science Review 11:22-50

Holmlund G, Nilsson H, Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden. Forensic Sci Int 160:66-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet 10:639-650

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167

Hummel K, Gerchow J and Essen-Möller E (1981) Biomathematical evidence of paternity: festschrift for Erik Essen-Möller. Springer-Verlag, Berlin, New York

Hundertmark T, Hering S, Edelmann J, Augustin C, Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22. Int J Legal Med 122:489-492

Jeffreys AJ, Wilson V and Thein SL (1985) Hypervariable minisatellite regions in human DNA. Nature 314:67-73

Jobling M, Hurles M, Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics: origins, peoples & disease. Garland Science, New York

Jobling MA and Gill P (2004b) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739-751

Jobling MA, Pandya A and Tyler-Smith C (1997) The Y chromosome in forensic analysis and paternity testing. Int J Legal Med 110:118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly. Res Q 43:23-30

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830-838

Karch SB (2007) Changing times: DNA resequencing and the nearly normal autopsy. J Forensic Leg Med 14:389-397

Karlsson AO, Wallerstrom T, Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective. Eur J Hum Genet 14:963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3:154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers: mathematical and statistical issues. Forensic Sci Int Genet 1:111-114

Kruglyak L, Daly MJ, Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363

Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84:2363-2367

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruther A, Schreiber S, Becker C, Nurnberg P, Nelson MR, Krawczak M and Kayser M (2008) Correlation between genetic and geographic structure in Europe. Curr Biol 18:1241-1248

Lappalainen T, Hannelius U, Salmela E, von Dobeln U, Lindgren CM, Huoponen K, Savontaus ML, Kere J and Lahermo P (2009) Population structure in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis. Ann Hum Genet 73:61-73

Lee HC and Labriola J (2001) Famous Crimes Revisited. Strong Books, Southington

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46

Montelius K, Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European as well as with non-European population. Forensic Sci Int Genet 2:e49-52

Montelius K, Tillmar AO, Kumlin J and Lindblom B (2009) Swedish population data on the SNPforID consortium autosomal SNP-multiplex. Forensic Sci Int Genet Supp 2:344-346

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Nothnagel M, Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med, in press

Ohno Y, Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles. Forensic Sci Int 19:93-98

Ott J (1999) Analysis of human genetic linkage, 3rd edition. Johns Hopkins University Press, Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database. Forensic Sci Int Genet 1:88-92

Phillips C, Fondevila M, Garcia-Magarinos M, Rodriguez A, Salas A, Carracedo A and Lareu MV (2008) Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers. Forensic Sci Int Genet 2:198-204

Pinto N, Gusmão L and Amorim A (2010) X-chromosome markers in kinship testing: a generalisation of the IBD approach identifying situations where their contribution is crucial. Forensic Sci Int Genet, in press

Prinz M, Carracedo A, Mayr WR, Morling N, Parsons TJ, Sajantila A, Scheithauer R, Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genet 1:3-12

Pulker H, Lareu MV, Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools. Forensic Sci Int Genet 1:100-104

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov S, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Cali F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Norby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozari R, Torroni A and Bandelt HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251-1276

Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R, de Knijff P, Jobling MA, Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116:279-291

Skare O, Sheehan N and Egeland T (2009) Identification of distant family relationships. Bioinformatics 25:2376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics: a review. Methods Mol Biol 297:107-126

Svanberg I and Tydén M (2005) Tusen år av invandring: en svensk kulturhistoria. Dialogos, Stockholm

Szibor R (2007) X-chromosomal markers: past, present and future. Forensic Sci Int Genet 1:93-99

Szibor R, Krawczak M, Hering S, Edelmann J, Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes. Int J Legal Med 117:67-74

Tillmar AO, Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population. Forensic Sci Int Genet 4:e19-20

Watson JD and Crick FH (1953) The structure of DNA. Cold Spring Harb Symp Quant Biol 18:123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York

| 55

Page 46: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

References

Brenner CH Morris JW (1990) Paternity Index Calculations in Single Locus Hypervariable DNA Probes Validation and Other Studies Proceedings for The International Symposium on Human Identification 1989 Promega Corporation 21-53 Brenner CH (2006) Some mathematical problems in the DNA identification of victims in the 2004 tsunami and similar mass fatalities Forensic Sci Int 157172-180

Buckleton J Triggs CM and Curran JM (2001) Detection of deviations from genetic equilibrium--a commentary on Budowle B Moretti TR Baumstark AL Defenbaugh DA Keys KM Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahami-ans Jamaicans and Trinidadians J Forensic Sci 1999441277-86 J Forensic Sci 46198-202

Buckleton JS Triggs CM and Walsh SJ (2005) Forensic DNA evidence inter-pretation CRC Press Boca Raton

Budowle B Garofano P Hellman A Ketchum M Kanthaswamy S Parson W van Haeringen W Fain S and Broad T (2005) Recommendations for ani-mal DNA forensic and identity testing Int J Legal Med 119295-302

Budowle B Moretti TR Baumstark AL Defenbaugh DA and Keys KM (1999a) Population data on the thirteen CODIS core short tandem repeat loci in African Americans US Caucasians Hispanics Bahamians Jamaicans and Trinidadians J Forensic Sci 441277-1286

Budowle B and van Daal A (2009) Extracting evidence from forensic DNA analyses future molecular biology directions Biotechniques 46339-340 342-350

Budowle B Wilson MR DiZinno JA Stauffer C Fasano MA Holland MM and Monson KL (1999b) Mitochondrial DNA regions HVI and HVII popula-tion data Forensic Sci Int 10323-35

Butler JM (2006) Genetics and genomics of core short tandem repeat loci used in human identity testing J Forensic Sci 51253-265

Coble MD Just RS OCallaghan JE Letmanyi IH Peterson CT Irwin JA and Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians Int J Legal Med 118137-146

Dawid AP Mortera J Pascali VL and van Boxel DW (2002) Probabilistic expert systems for forensic inference from genetic markers Scand J Statist 29577-595

50 |

References

Desmarais D Zhong Y Chakraborty R Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA) J Forensic Sci 431046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes Forensic Sci Int Genet in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers Forensic Sci Int Genet 2219-225

Ellegren H (2004) Microsatellites simple sequences with complex evolution Nat Rev Genet 5435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data Hum Hered 21523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system Heredity 595-102

Ge J Budowle B Planz JV and Chakraborty R (2010) Haplotype block a new type of forensic DNA markers Int J Legal Med in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting Forensic Sci Int 35145-148

Gjertson DW Brenner CH Baur MP Carracedo A Guidet F Luque JA Lessig R Mayr WR Pascali VL Prinz M Schneider PM and Morling N (2007) ISFG Recommendations on biostatistics in paternity testing Forensic Sci Int Genet 1223-231

Gomes I Prinz M Pereira R Meyers C Mikulasovich RS Amorim A Car-racedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex Int J Legal Med 121198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers Hum Hered 49112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics Forensic Sci Int 12943-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics forensic Sci Int Genet Supp 2680-681

| 51

References

Hammer MF Blackmer F Garrigan D Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome se-quence variation Genetics 1641495-1509

Hardy GH (1908) Mendelian proportion in a mixed population Science 2849-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analy-sismdashValidation and Use for Forensic Casework Forensic Science Review 1122-50

Holmlund G Nilsson H Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden Forensic Sci Int 16066-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured popu-lations defining estimating and interpreting F(ST) Nat Rev Genet 10639-650

Homer N Szelinger S Redman M Duggan D Tembe W Muehling J Pear-son JV Stephan DA Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays PLoS Genet 4e1000167

Hummel K Gerchow J and Essen-Megraveoller E (1981) Biomathematical eviden-ce of paternity festschrift for Erik Essen-Megraveoller Springer-Verlag Berlin New York

Hundertmark T Hering S Edelmann J Augustin C Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22 Int J Legal Med 122489-492

Jeffreys AJ Wilson V and Thein SL (1985) Hypervariable minisatellite re-gions in human DNA Nature 31467-73

Jobling M Hurles M Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics origins peoples amp disease Garland Science New York

Jobling MA and Gill P (2004b) Encoded evidence DNA in forensic analysis Nat Rev Genet 5739-751

Jobling MA Pandya A and Tyler-Smith C (1997) The Y chromosome in fo-rensic analysis and paternity testing Int J Legal Med 110118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome an evolu-tionary marker comes of age Nat Rev Genet 4598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly Res Q 4323-30

52 |

References

Karafet TM Mendez FL Meilerman MB Underhill PA Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolu-tion of the human Y chromosomal haplogroup tree Genome Res 18830-838

Karch SB (2007) Changing times DNA resequencing and the nearly normal autopsy J Forensic Leg Med 14389-397

Karlsson AO Wallerstrom T Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective Eur J Hum Genet 14963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human exter-nally visible characteristics in forensics motivations scientific challenges and ethical considerations Forensic Sci Int Genet 3154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers mathemati-cal and statistical issues Forensic Sci Int Genet 1111-114 Kruglyak L Daly MJ Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis a unified multipoint approach Am J Hum Ge-net 581347-1363 Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans Proc Natl Acad Sci U S A 842363-2367

Lao O Lu TT Nothnagel M Junge O Freitag-Wolf S Caliebe A Balas-cakova M Bertranpetit J Bindoff LA Comas D Holmlund G Kouvatsi A Macek M Mollet I Parson W Palo J Ploski R Sajantila A Tagliabracci A Gether U Werge T Rivadeneira F Hofman A Uitterlinden AG Gieger C Wichmann HE Ruther A Schreiber S Becker C Nurnberg P Nelson MR Krawczak M and Kayser M (2008) Correlation between genetic and geo-graphic structure in Europe Curr Biol 181241-1248

Lappalainen T Hannelius U Salmela E von Dobeln U Lindgren CM Hu-oponen K Savontaus ML Kere J and Lahermo P (2009) Population struc-ture in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis Ann Hum Genet 7361-73

Lee HC and Labriola J (2001) Famous Crimes Revisited Strong Books South-ington

Metzker ML (2010) Sequencing technologies - the next generation Nat Rev Genet 1131-46

Montelius K Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to Euro-pean as well as with non-European population Forensic Sci Int Genet 2e49-52

| 53

References

Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C Fondevila M Garcia-Magarinos M Rodriguez A Salas A Car-racedo A and Lareu MV (2008) Resolving relationship tests that show am-biguous STR results using autosomal SNPs as supplementary markers Forensic Sci Int Genet 2198-204

Pinto N Gusmatildeo L and Amorim A (2010) X-chromosome markers in kin-ship testing A generalisation of the IBD approach identifying situations where their contribution is crucial Forensic Sci Int Genet in press

Prinz M Carracedo A Mayr WR Morling N Parsons TJ Sajantila A Scheithauer R Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG) recommendations re-garding the role of forensic genetics for disaster victim identification (DVI) Forensic Sci Int Genet 13-12

Pulker H Lareu MV Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools Forensic Sci Int Genet 1100-104

Richards M Macaulay V Hickey E Vega E Sykes B Guida V Rengo C Sellitto D Cruciani F Kivisild T Villems R Thomas M Rychkov S Rych-kov O Rychkov Y Golge M Dimitrov D Hill E Bradley D Romano V Cali F Vona G Demaine A Papiha S Triantaphyllidis C Stefanescu G Hatina J Belledi M Di Rienzo A Novelletto A Oppenheim A Norby S Al-Zaheri N Santachiara-Benerecetti S Scozari R Torroni A and Bandelt

54 |

References

HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool Am J Hum Genet 671251-1276

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA Matai AS Laros JF Meiland HC Jasper M de Leeuw WJ de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples Forensic Sci Int Genet 3233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR am-plicons using redesigned primers Int J Legal Med 114285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference data-base (YHRD) update Forensic Sci Int Genet 183-87

Wright S (1951) The genetic structure of populations Ann Eugen 15323-354

Zvelebil M (2008) Innovating hunter-gatherers the Mesolithic in the Baltic In Mesolithic Europe (eds) Bailey G and Spikins P Cambridge University Press New York

| 55

Page 47: Populations and Statistics in Forensic Geneticsliu.diva-portal.org/smash/get/diva2:309703/FULLTEXT01.pdf · Populations and Statistics in Forensic Genetics . Andreas Tillmar . Department

References

Desmarais D Zhong Y Chakraborty R Perreault C and Busque L (1998) Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA) J Forensic Sci 431046-1049

Diegoli TM and Coble MD (2010) Development and characterization of two mini-X chromosomal short tandem repeat multiplexes Forensic Sci Int Genet in press

Egeland T and Sheehan N (2008) On identification problems requiring linked autosomal markers Forensic Sci Int Genet 2219-225

Ellegren H (2004) Microsatellites simple sequences with complex evolution Nat Rev Genet 5435-445

Elston RC and Stewart J (1971) A general model for the genetic analysis of pedigree data Hum Hered 21523-542

Fisher RA (1951) Standard calculations for evaluating a blood-group system Heredity 595-102

Ge J Budowle B Planz JV and Chakraborty R (2010) Haplotype block a new type of forensic DNA markers Int J Legal Med in press

Gill P and Werrett DJ (1987) Exclusion of a man charged with murder by DNA fingerprinting Forensic Sci Int 35145-148

Gjertson DW Brenner CH Baur MP Carracedo A Guidet F Luque JA Lessig R Mayr WR Pascali VL Prinz M Schneider PM and Morling N (2007) ISFG Recommendations on biostatistics in paternity testing Forensic Sci Int Genet 1223-231

Gomes I Prinz M Pereira R Meyers C Mikulasovich RS Amorim A Car-racedo A and Gusmao L (2007) Genetic analysis of three US population groups using an X-chromosomal STR decaplex Int J Legal Med 121198-203

Guo X and Elston RC (1999) Linkage information content of polymorphic genetic markers Hum Hered 49112-118

Hallenberg C and Morling N (2002) A report of the 2000 and 2001 paternity testing workshops of the English speaking working group of the international society for forensic genetics Forensic Sci Int 12943-50

Hallenberg C and Morling N (2009) Results of the 2009 Paternity Testing Workshop of the English Speaking Working Group of the International Society for Forensic Genetics forensic Sci Int Genet Supp 2680-681

| 51

References

Hammer MF Blackmer F Garrigan D Nachman MW and Wilder JA (2003) Human population structure and its effects on sampling Y chromosome se-quence variation Genetics 1641495-1509

Hardy GH (1908) Mendelian proportion in a mixed population Science 2849-50

Holland MM and Parsons TJ (1999) Mitochondrial DNA Sequence Analy-sismdashValidation and Use for Forensic Casework Forensic Science Review 1122-50

Holmlund G Nilsson H Karlsson A and Lindblom B (2006) Y-chromosome STR haplotypes in Sweden Forensic Sci Int 16066-79

Holsinger KE and Weir BS (2009) Genetics in geographically structured popu-lations defining estimating and interpreting F(ST) Nat Rev Genet 10639-650

Homer N Szelinger S Redman M Duggan D Tembe W Muehling J Pear-son JV Stephan DA Nelson SF and Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays PLoS Genet 4e1000167

Hummel K Gerchow J and Essen-Megraveoller E (1981) Biomathematical eviden-ce of paternity festschrift for Erik Essen-Megraveoller Springer-Verlag Berlin New York

Hundertmark T Hering S Edelmann J Augustin C Plate I and Szibor R (2008) The STR cluster DXS10148-DXS8378-DXS10135 provides a powerful tool for X-chromosomal haplotyping at Xp22 Int J Legal Med 122489-492

Jeffreys AJ Wilson V and Thein SL (1985) Hypervariable minisatellite re-gions in human DNA Nature 31467-73

Jobling M Hurles M Tyler-Smith C and NetLibrary Inc (2004a) Human evolutionary genetics origins peoples amp disease Garland Science New York

Jobling MA and Gill P (2004b) Encoded evidence DNA in forensic analysis Nat Rev Genet 5739-751

Jobling MA Pandya A and Tyler-Smith C (1997) The Y chromosome in fo-rensic analysis and paternity testing Int J Legal Med 110118-124

Jobling MA and Tyler-Smith C (2003) The human Y chromosome an evolu-tionary marker comes of age Nat Rev Genet 4598-612

Jones BJ and Brewer JK (1972) An analysis of the power of statistical tests reported in the Research Quarterly Res Q 4323-30

52 |

References

Karafet TM Mendez FL Meilerman MB Underhill PA Zegura SL and Hammer MF (2008) New binary polymorphisms reshape and increase resolu-tion of the human Y chromosomal haplogroup tree Genome Res 18830-838

Karch SB (2007) Changing times DNA resequencing and the nearly normal autopsy J Forensic Leg Med 14389-397

Karlsson AO Wallerstrom T Gotherstrom A and Holmlund G (2006) Y-chromosome diversity in Sweden - a long-time perspective Eur J Hum Genet 14963-970

Kayser M and Schneider PM (2009) DNA-based prediction of human exter-nally visible characteristics in forensics motivations scientific challenges and ethical considerations Forensic Sci Int Genet 3154-161

Krawczak M (2007) Kinship testing with X-chromosomal markers mathemati-cal and statistical issues Forensic Sci Int Genet 1111-114 Kruglyak L Daly MJ Reeve-Daly MP and Lander ES (1996) Parametric and nonparametric linkage analysis a unified multipoint approach Am J Hum Ge-net 581347-1363 Lander ES and Green P (1987) Construction of multilocus genetic linkage maps in humans Proc Natl Acad Sci U S A 842363-2367

Lao O Lu TT Nothnagel M Junge O Freitag-Wolf S Caliebe A Balas-cakova M Bertranpetit J Bindoff LA Comas D Holmlund G Kouvatsi A Macek M Mollet I Parson W Palo J Ploski R Sajantila A Tagliabracci A Gether U Werge T Rivadeneira F Hofman A Uitterlinden AG Gieger C Wichmann HE Ruther A Schreiber S Becker C Nurnberg P Nelson MR Krawczak M and Kayser M (2008) Correlation between genetic and geo-graphic structure in Europe Curr Biol 181241-1248

Lappalainen T Hannelius U Salmela E von Dobeln U Lindgren CM Hu-oponen K Savontaus ML Kere J and Lahermo P (2009) Population struc-ture in contemporary Sweden--a Y-chromosomal and mitochondrial DNA analysis Ann Hum Genet 7361-73

Lee HC and Labriola J (2001) Famous Crimes Revisited Strong Books South-ington

Metzker ML (2010) Sequencing technologies - the next generation Nat Rev Genet 1131-46

Montelius K Karlsson AO and Holmlund G (2008) STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to Euro-pean as well as with non-European population Forensic Sci Int Genet 2e49-52

| 53

References

Montelius K Tillmar AO Kumlin J and Lindblom B (2009) Swedish popula-tion data on the SNPforID consortium autosomal SNP-multiplex Forensic Sci Int Genet Supp 2344-346

Nei M (1987) Molecular evolutionary genetics Columbia University Press New York

Nothnagel M Schmidtke J and Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci Int J Legal Med in press

Ohno Y Sebetan IM and Akaishi S (1982) A simple method for calculating the probability of excluding paternity with any number of codominant alleles Forensic Sci Int 1993-98

Ott J (1999) Analysis of human genetic linkage 3rd edition Johns Hopkins University Press Baltimore

Parson W and Dur A (2007) EMPOP--a forensic mtDNA database Forensic Sci Int Genet 188-92

Phillips C Fondevila M Garcia-Magarinos M Rodriguez A Salas A Car-racedo A and Lareu MV (2008) Resolving relationship tests that show am-biguous STR results using autosomal SNPs as supplementary markers Forensic Sci Int Genet 2198-204

Pinto N Gusmatildeo L and Amorim A (2010) X-chromosome markers in kin-ship testing A generalisation of the IBD approach identifying situations where their contribution is crucial Forensic Sci Int Genet in press

Prinz M Carracedo A Mayr WR Morling N Parsons TJ Sajantila A Scheithauer R Schmitter H and Schneider PM (2007) DNA Commission of the International Society for Forensic Genetics (ISFG) recommendations re-garding the role of forensic genetics for disaster victim identification (DVI) Forensic Sci Int Genet 13-12

Pulker H Lareu MV Phillips C and Carracedo A (2007) Finding genes that underlie physical traits of forensic interest using genetic tools Forensic Sci Int Genet 1100-104

Richards M Macaulay V Hickey E Vega E Sykes B Guida V Rengo C Sellitto D Cruciani F Kivisild T Villems R Thomas M Rychkov S Rych-kov O Rychkov Y Golge M Dimitrov D Hill E Bradley D Romano V Cali F Vona G Demaine A Papiha S Triantaphyllidis C Stefanescu G Hatina J Belledi M Di Rienzo A Novelletto A Oppenheim A Norby S Al-Zaheri N Santachiara-Benerecetti S Scozari R Torroni A and Bandelt

54 |

References

HJ (2000) Tracing European founder lineages in the Near Eastern mtDNA pool Am J Hum Genet 671251-1276

Roewer L Croucher PJ Willuweit S Lu TT Kayser M Lessig R de Knijff P Jobling MA Tyler-Smith C and Krawczak M (2005) Signature of recent historical events in the European Y-chromosomal STR haplotype distribution Hum Genet 116279-291

Skare O Sheehan N and Egeland T (2009) Identification of distant family relationships Bioinformatics 252376-2382

Sobrino B and Carracedo A (2005) SNP typing in forensic genetics a review Methods Mol Biol 297107-126

Svanberg I and Tydeacuten M (2005) Tusen aringr av invandring en svensk kulturhi-storia Dialogos Stockholm

Szibor R (2007) X-chromosomal markers past present and future Forensic Sci Int Genet 193-99

Szibor R Krawczak M Hering S Edelmann J Kuhlisch E and Krause D (2003) Use of X-linked markers for forensic purposes Int J Legal Med 11767-74

Tillmar AO Backstrom G and Montelius K (2009) Genetic variation of 15 autosomal STR loci in a Somali population Forensic Sci Int Genet 4e19-20

Watson JD and Crick FH (1953) The structure of DNA Cold Spring Harb Symp Quant Biol 18123-131

Westen AA, Matai AS, Laros JF, Meiland HC, Jasper M, de Leeuw WJ, de Knijff P and Sijen T (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3:233-241

Wiegand P and Kleiber M (2001) Less is more--length reduction of STR amplicons using redesigned primers. Int J Legal Med 114:285-287

Willuweit S and Roewer L (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int Genet 1:83-87

Wright S (1951) The genetic structure of populations. Ann Eugen 15:323-354

Zvelebil M (2008) Innovating hunter-gatherers: the Mesolithic in the Baltic. In: Bailey G and Spikins P (eds) Mesolithic Europe. Cambridge University Press, New York
